CN108804428A - Method, system and related apparatus for correcting term mistranslations in translations - Google Patents
- Publication number
- CN108804428A (application CN201810600694.0A)
- Authority
- CN
- China
- Prior art keywords
- translation
- text
- back-translation
- original
- translate
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F40/00—Handling natural language data
- G06F40/40—Processing or translation of natural language
- G06F40/58—Use of machine translation, e.g. for multi-lingual retrieval, for server-side translation for client devices or for real-time translation
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F40/00—Handling natural language data
- G06F40/40—Processing or translation of natural language
- G06F40/55—Rule-based translation
Abstract
This application discloses a method for correcting term mistranslations in a translation. The method obtains the original translation of a target term in a first translation, obtains candidate translations corresponding to the target term from a training set, and sets all candidate translations as pseudo-translations. It then replaces the original translation in the first translation with each pseudo-translation in turn to obtain N second translations, and back-translates the first translation and all second translations to obtain N+1 back-translated texts. Finally, it compares the source text with all back-translated texts to determine the translation accuracy of the first translation, and corrects the original translation of the target term according to that accuracy. The method can correct mistranslations of domain terms in machine translation without relying on large amounts of in-domain resources. Also disclosed are a system for correcting term mistranslations in a translation, a computer-readable storage medium, and a device for correcting term mistranslations in a translation, which have the same beneficial effects.
Description
Technical field
The present invention relates to the field of machine translation. In particular, it relates to a method and system for correcting term mistranslations in a translation, a computer-readable storage medium, and a device for correcting term mistranslations in a translation.
Background art
Machine translation technology uses computing devices such as computers to translate an original text in one natural language (the source language) into a translation in another natural language (the target language). Because a machine performs the translation, it can process large volumes of text in a relatively short time compared with human translation.
Texts that contain many technical terms from a specific domain are often mistranslated by general-purpose machine translation. The training corpus of a general machine translation system either lacks translations of these domain terms or contains them so rarely that their translation probabilities are low.

Prior-art methods for correcting terms in machine-translated text work as follows. Each word in the output translation is treated as an object to be classified, and lexical, syntactic and other features are constructed. A suitable classification model, such as a maximum-entropy classifier, a random forest or a bidirectional LSTM, then labels each word to decide whether it is correct, and any mistranslated term is corrected. However, these methods depend on large amounts of in-domain resources. For texts from unknown domains, language resources are scarce, which limits the generality of such methods.
Therefore, those skilled in the art currently need a way to correct domain-term mistranslations in machine translation without relying on large amounts of in-domain resources.
Summary of the invention
The purpose of the present application is to provide a method and system for correcting term mistranslations in a translation, a computer-readable storage medium, and a device for correcting term mistranslations in a translation. These can correct domain-term mistranslations in machine translation without relying on large amounts of in-domain resources.
To solve the above technical problem, the present application provides a method for correcting term mistranslations in a translation. The method comprises:
obtaining the original translation of a target term in a first translation, obtaining candidate translations corresponding to the target term from a training set, and setting all candidate translations as pseudo-translations, where the first translation is obtained by translating a source text;
replacing the original translation in the first translation with each pseudo-translation in turn to obtain N second translations, and back-translating the first translation and all second translations to obtain N+1 back-translated texts;
comparing the source text with all back-translated texts to determine the translation accuracy of the first translation, and correcting the original translation of the target term according to the translation accuracy.
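The following Python sketch shows how the three steps fit together in one function. It is a minimal sketch under stated assumptions, not the application's implementation:
- `back_translate` and `score` are hypothetical callables. They stand in for a machine translation system and for whatever source/back-translation comparison is used.
- The substitution is a plain first-occurrence string replacement.

```python
from typing import Callable, List, Tuple


def correct_term(
    source: str,
    first_translation: str,
    original: str,
    candidates: List[str],
    back_translate: Callable[[str], str],  # hypothetical MT back-translation hook
    score: Callable[[str, str], float],    # hypothetical: higher = closer to source
) -> Tuple[str, List[str]]:
    """Return (best translation, the N+1 back-translated texts)."""
    # Pseudo-translations: candidate translations other than the original one.
    pseudo = [c for c in candidates if c != original]
    # N second translations: substitute each pseudo-translation once.
    seconds = [first_translation.replace(original, p, 1) for p in pseudo]
    translations = [first_translation] + seconds
    # N + 1 back-translations, each compared with the source text.
    back = [back_translate(t) for t in translations]
    scores = [score(source, b) for b in back]
    # max() returns the first maximum, so ties keep the first translation.
    best = max(range(len(translations)), key=lambda i: scores[i])
    return translations[best], back
```

Putting the first translation at index 0 means a tie never replaces it, which matches the idea of correcting only when another candidate is clearly better.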
Optionally, obtaining the candidate translations corresponding to the target term from the training set and setting all candidate translations as pseudo-translations comprises:
computing the hash value of the target term, partitioning the phrase table built from the training set into hash blocks, looking up the candidate translations corresponding to the target term, and setting all candidate translations as the pseudo-translations.
Optionally, obtaining the original translation of the target term in the first translation comprises:
obtaining word alignment information for the first translation, and determining the original translation of the target term from the word alignment information.
Optionally, comparing the source text with all back-translated texts to determine the translation accuracy of the first translation, and correcting the original translation of the target term according to the translation accuracy, comprises:
computing the language-model probability score of every back-translated text with a language model, and determining the maximum language-model probability score;
mapping each back-translated text to a first feature vector and the source text to a second feature vector, taking the cosine similarity between each back-translated text's first feature vector and the second feature vector as the semantic similarity between that back-translated text and the source text, and determining the maximum semantic similarity;
judging whether the difference between the language-model probability score of the back-translated text of the first translation and the maximum language-model probability score is less than or equal to a preset value, to obtain a first judgment result;
judging whether the semantic similarity between the back-translated text of the first translation and the source text equals the maximum semantic similarity, to obtain a second judgment result;
judging whether the first judgment result and the second judgment result are both "no";
if both are "no", determining that the original translation is wrong, and correcting the original translation of the target term.
Optionally, correcting the original translation of the target term comprises:
replacing the first translation with the second translation corresponding to the maximum semantic similarity.
Optionally, correcting the original translation of the target term comprises:
representing all back-translated texts and the source text as dependency trees, and computing, for each back-translated text, the minimum edit cost of transforming its dependency tree into the dependency tree of the source text;
selecting, as the optimal translation, the second translation whose back-translated text has the smallest minimum edit cost;
or selecting, as the optimal translation, the second translation whose back-translated text has the largest difference between its semantic similarity and its minimum edit cost;
replacing the first translation with the optimal translation.
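The minimum edit cost between two dependency trees can be computed as an ordered tree edit distance. The sketch below is one way to do it, under these assumptions:
- Inserting, deleting and relabeling a node each cost 1.
- The trees come from some dependency parser, which is not shown.
- Each node is a hypothetical `(label, children)` tuple. The label could be a word, or a word plus its dependency relation.

```python
from functools import lru_cache
from typing import Tuple

# Hypothetical representation: a tree is (label, children); a forest is a tuple of trees.
Tree = Tuple[str, tuple]


def tree_edit_distance(t1: Tree, t2: Tree) -> int:
    """Ordered tree edit distance with unit insert/delete/relabel costs.

    Uses the classic forest recursion on rightmost roots, memoised. This is
    adequate for sentence-sized trees; Zhang-Shasha is faster on large trees.
    """

    @lru_cache(maxsize=None)
    def dist(f1: tuple, f2: tuple) -> int:
        if not f1 and not f2:
            return 0
        if not f1:
            _, kids = f2[-1]
            return dist(f1, f2[:-1] + kids) + 1  # insert rightmost root of f2
        if not f2:
            _, kids = f1[-1]
            return dist(f1[:-1] + kids, f2) + 1  # delete rightmost root of f1
        (l1, k1), (l2, k2) = f1[-1], f2[-1]
        return min(
            dist(f1[:-1] + k1, f2) + 1,  # delete v, promoting its children
            dist(f1, f2[:-1] + k2) + 1,  # insert w, promoting its children
            # match v with w: their subtrees, the remaining forests, and relabel cost
            dist(k1, k2) + dist(f1[:-1], f2[:-1]) + (l1 != l2),
        )

    return dist((t1,), (t2,))
```

For example, `("a", ())` and `("b", (("c", ()),))` are at distance 2: one relabel plus one insertion.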
The present application also provides a system for correcting term mistranslations in a translation. The system comprises:
a synonym acquisition module, configured to obtain the original translation of a target term in a first translation, obtain candidate translations corresponding to the target term from a training set, and set all candidate translations as pseudo-translations, where the first translation is obtained by translating a source text;
a decoding module, configured to replace the original translation in the first translation with each pseudo-translation in turn to obtain N second translations, and to back-translate the first translation and all second translations to obtain N+1 back-translated texts;
a correction module, configured to compare the source text with all back-translated texts to determine the translation accuracy of the first translation, and to correct the original translation of the target term according to the translation accuracy.
Optionally, the synonym acquisition module comprises:
an original-translation determination unit, configured to obtain word alignment information for the first translation and determine the original translation of the target term from the word alignment information;
a pseudo-translation determination unit, configured to compute the hash value of the target term, partition the phrase table built from the training set into hash blocks, look up the candidate translations corresponding to the target term, and set all candidate translations as the pseudo-translations.
The present application also provides a computer-readable storage medium storing a computer program. When executed, the computer program implements the steps of the above method for correcting term mistranslations in a translation.
The present application also provides a device for correcting term mistranslations in a translation, comprising a memory and a processor. The memory stores a computer program, and the processor implements the steps of the above method for correcting term mistranslations in a translation when it calls the computer program in the memory.
The present invention provides a method for correcting term mistranslations in a translation. The method comprises:
obtaining the original translation of a target term in a first translation, obtaining candidate translations corresponding to the target term from a training set, and setting all candidate translations as pseudo-translations, where the first translation is obtained by translating a source text;
replacing the original translation in the first translation with each pseudo-translation in turn to obtain N second translations, and back-translating the first translation and all second translations to obtain N+1 back-translated texts;
comparing the source text with all back-translated texts to determine the translation accuracy of the first translation, and correcting the original translation of the target term according to the translation accuracy.
Translation converts a source text in a first language into a translation in a second language that expresses the same meaning. Performing the reverse operation on the second-language translation yields a text in the first language, called a back-translated text; this reverse process is called back-translation. If the second-language translation contains no mistranslation, the back-translated text will remain highly consistent with the source text.

However, a word in one language may have several candidate translations with different meanings in another language. Back-translating a candidate translation on its own cannot show how accurate it is, because even a wrong candidate may back-translate to the correct source word. The candidate translation is therefore placed in the complete sentence, with its context, before back-translation. Comparing the resulting back-translated text with the source text shows how accurately a given word in the translation was rendered. The candidate whose back-translated text is most accurate is then selected as the correct translation.

On this basis, the present invention turns the identification of term mistranslations into a comparison between back-translated texts and the source text:
1. The pseudo-translations corresponding to the target term are looked up and substituted into the text to obtain N second translations.
2. The first translation and the second translations are back-translated to obtain multiple back-translated texts.
3. The source text is compared with all back-translated texts to determine the translation accuracy of the first translation.
4. The original translation of the target term is corrected according to that accuracy.

This scheme can correct domain-term mistranslations in machine translation without relying on large amounts of in-domain resources. The present application also provides a system for correcting term mistranslations in a translation, a computer-readable storage medium, and a device for correcting term mistranslations in a translation. These have the same beneficial effects, which are not repeated here.
Description of the drawings
To illustrate the embodiments of the present application more clearly, the drawings used in the embodiments are briefly introduced below. The drawings described below show only some embodiments of the present application; those of ordinary skill in the art can derive other drawings from them without creative effort.
Fig. 1 is a flowchart of a method for correcting term mistranslations in a translation provided by an embodiment of the present application;
Fig. 2 is a flowchart of another method for correcting term mistranslations in a translation provided by an embodiment of the present application;
Fig. 3 is a schematic structural diagram of a system for correcting term mistranslations in a translation provided by an embodiment of the present application.
Detailed description of the embodiments
To make the purposes, technical solutions and advantages of the embodiments of the present application clearer, the technical solutions in these embodiments are described clearly and completely below with reference to the drawings. The described embodiments are only some, not all, of the embodiments of the present application. All other embodiments obtained by those of ordinary skill in the art from these embodiments without creative effort fall within the protection scope of the present application.
Fig. 1 is a flowchart of a method for correcting term mistranslations in a translation provided by an embodiment of the present application.
Specific steps may include:
S101: Obtain the original translation of the target term in the first translation, obtain candidate translations corresponding to the target term from the training set, and set all candidate translations as pseudo-translations;
This embodiment assumes that a source text in the first language has already been translated into a first translation in the second language. It describes how the mistranslated text of a term in that translation is corrected. The purpose of this step is to obtain the other candidate translations of the target term in the first translation, i.e., the pseudo-translations. In this embodiment, the source text, the back-translated texts and the target term are in the first language. The original translation, the pseudo-translations, the first translation and the second translations are in the second language.
There are many ways to obtain the original translation of the target term in the first translation. For example, word alignment information can match the position of the target term in the source text with its position in the first translation. Alternatively, the original translation can be identified by comparing how rare each word in the first translation is. The original translation in this step is the second-language wording that carries the same meaning as the first-language target term. It may be a single word or a combination of several words.
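Below is a minimal sketch of the word-alignment route, under these assumptions:
- The alignment is available as `(source_index, target_index)` pairs from some external word aligner, which is not shown.
- The term occupies a known span of source tokens.
- The helper name `original_translation_span` is hypothetical.

```python
from typing import Iterable, List, Tuple


def original_translation_span(
    target_tokens: List[str],
    alignment: Iterable[Tuple[int, int]],  # (source_index, target_index) pairs
    term_start: int,
    term_end: int,
) -> str:
    """Return the target-side text aligned to source tokens [term_start, term_end).

    The original translation is taken as the contiguous target span from the
    leftmost to the rightmost target token aligned to the term; "" if unaligned.
    """
    aligned = sorted(t for s, t in alignment if term_start <= s < term_end)
    if not aligned:
        return ""
    return " ".join(target_tokens[aligned[0] : aligned[-1] + 1])
```

Taking the full span between the outermost aligned tokens covers multi-word translations even if the aligner misses an inner word. That is a design choice assumed here, not something the application specifies.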
A word in one language usually has several translations in another language. Obtaining pseudo-translations lets this embodiment evaluate the relative accuracy of the translation. Among all translations of the target term in the target language, it selects the word most semantically consistent with the source text. For example, A may be translated as a1, while A also has two other translations, a2 and a3. If a2 expresses the meaning of A best in the context of the source text, a2 should replace a1.

In this step, candidate translations are obtained from the training set and set as pseudo-translations as follows:
1. Compute the hash value of the target term.
2. Partition the phrase table built from the training set into hash blocks.
3. Look up the candidate translations corresponding to the target term.
4. Set all candidate translations as the pseudo-translations.
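The hash-partitioned lookup can be sketched as follows. This is an assumed design, not the one specified by the application:
- MD5 (via Python's `hashlib`) serves as a stable hash, so bucket numbers do not change between runs.
- The phrase table is a list of `(source phrase, target phrase)` pairs.
- The original translation is filtered out of the candidates.

In a large system each bucket could be stored separately, so that a lookup only needs to load one bucket.

```python
import hashlib
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple


def bucket_of(phrase: str, num_buckets: int) -> int:
    # Stable hash: Python's built-in hash() of str is salted per process.
    digest = hashlib.md5(phrase.encode("utf-8")).hexdigest()
    return int(digest, 16) % num_buckets


def build_buckets(
    entries: Iterable[Tuple[str, str]], num_buckets: int
) -> List[Dict[str, List[str]]]:
    """Partition (source phrase, target phrase) pairs into hash buckets."""
    buckets: List[Dict[str, List[str]]] = [defaultdict(list) for _ in range(num_buckets)]
    for src, tgt in entries:
        buckets[bucket_of(src, num_buckets)][src].append(tgt)
    return buckets


def pseudo_translations(
    term: str, original: str, buckets: List[Dict[str, List[str]]]
) -> List[str]:
    """Look up candidate translations of `term` in its bucket, excluding the original."""
    bucket = buckets[bucket_of(term, len(buckets))]
    # dict.get does not trigger defaultdict's factory, so unknown terms add nothing.
    return [t for t in bucket.get(term, []) if t != original]
```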
The following concrete example illustrates the process of this embodiment. The source text and the back-translated texts are in the first language and are shown here in English renderings.
Source text: When you want to connect to some computers, a bus network is more suitable.
First translation: When you want to connect some computers, bus network is more suitable.
The target term is "bus network" (the source-language term), and its original translation is "bus network".
Another translation of the target term is found in the training set: "public traffic network" (i.e., the pseudo-translation).
Replacing the original translation with the pseudo-translation gives the second translation: When you want to connect some computers, public traffic network is more suitable.
The first translation and the second translation are back-translated:
The back-translated text of the first translation is: When you want to connect to a computer, the bus network is more appropriate.
The back-translated text of the second translation is: When you want to connect to some computers, a public transportation network is more suitable.
Compared with the source text, the back-translated text of the first translation is the more similar in meaning. The original translation "bus network" in the first translation is therefore correct.
This embodiment assumes that a training set of terms and their translations is available. Pseudo-translations with the same meaning as the original translation can be looked up in it.

The source text may be taken from a test set built in advance, as follows:
1. Obtain Chinese and English abstracts and keywords with a web crawler.
2. From the Chinese and English abstracts, select the sentence samples that contain the keywords.
3. Build a test set containing all such sentence samples.

Specifically, a web crawler can collect Chinese and English abstracts and keywords from journals. The abstracts are first split into sentences by detecting sentence boundaries. Then, for each Chinese keyword, the sentence in which it occurs in the Chinese abstract is located. The corresponding keyword is then searched for in the English sentence with the same index. The search window can extend two sentences before and after that index, because most authors do not translate sentence by sentence when they write the English abstract from the Chinese one. The sentence pairs obtained this way can be regarded as mutual translations, and the keywords in them can be regarded as terms.
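The keyword-window search can be sketched as below, under these assumptions:
- The abstracts have already been split into sentence lists.
- Each journal keyword comes as a (Chinese, English) pair.
- "Two sentences before and after" is implemented as a window of ±2 positions.

The function name is hypothetical.

```python
from typing import List, Tuple


def find_term_sentence_pairs(
    zh_sents: List[str],
    en_sents: List[str],
    keywords: List[Tuple[str, str]],  # (Chinese keyword, English keyword) pairs
    window: int = 2,
) -> List[Tuple[str, str, str, str]]:
    """Collect (zh sentence, en sentence, zh keyword, en keyword) samples.

    For each Chinese sentence containing a Chinese keyword, search the English
    sentences within +/- `window` of the same index for the English keyword.
    """
    samples = []
    for zh_kw, en_kw in keywords:
        for i, zh in enumerate(zh_sents):
            if zh_kw not in zh:
                continue
            lo, hi = max(0, i - window), min(len(en_sents), i + window + 1)
            for j in range(lo, hi):
                if en_kw.lower() in en_sents[j].lower():
                    samples.append((zh, en_sents[j], zh_kw, en_kw))
                    break  # keep the first English match in the window
    return samples
```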
S102: Replace the original translation in the first translation with each pseudo-translation in turn to obtain N second translations, and back-translate the first translation and all second translations to obtain N+1 back-translated texts;
Comparing only the back-translation of the original translation with the source text would give no reference point for the comparison, so the quality of that back-translated text could not be evaluated accurately. This step therefore builds on S101: it substitutes each pseudo-translation for the original translation to obtain N second translations. N is a positive integer equal to the number of pseudo-translations. Although back-translation is the reverse process of translation, it does not exactly invert the translation result. As a result, even the back-translated text of the first translation will not be fully identical to the source text.
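If the word alignment gives the position of the original translation, the substitution can work on token positions instead of string search. Other occurrences of the same words elsewhere in the sentence are then left untouched. A minimal sketch; the helper name and the whitespace tokenization are assumptions:

```python
from typing import List


def second_translations(
    tokens: List[str], start: int, end: int, pseudo: List[str]
) -> List[str]:
    """Replace tokens[start:end] (the original translation) with each pseudo-translation.

    Returns one second translation per pseudo-translation, so len(result) == N.
    """
    return [" ".join(tokens[:start] + p.split() + tokens[end:]) for p in pseudo]
```

The first translation followed by these N outputs gives the N+1 texts to back-translate.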
S103: Compare the source text with all back-translated texts to determine the translation accuracy of the first translation, and correct the original translation of the target term according to the translation accuracy.
Language is a complex symbol system in which vocabulary is combined according to a grammar. It comprises a phonological system, a lexical system and a grammatical system. Within one language, all sentences follow the same grammatical rules and use the same lexical system. S102 has already back-translated the first translation and all second translations. Comparing the source text with each back-translated text therefore shows how accurately the corresponding first or second translation was translated.
By default, this step determines the translation accuracy of the first translation and corrects the original translation as follows:
Step 1: Compare the source text with all back-translated texts to determine the translation accuracy of the first translation;
Step 2: Judge whether the translation accuracy meets a preset standard. If so, end the process; if not, go to Step 3;
Step 3: Correct the original translation of the target term.
As explained above, each candidate is back-translated in its full sentence context and the back-translated text is compared with the source text. This shows how accurately the term was translated, and the candidate whose back-translated text is most accurate is chosen as the correct translation. On this basis, this embodiment turns the identification of term mistranslations into a comparison between back-translated texts and the source text:
1. The pseudo-translations of the target term are substituted into the text to obtain N second translations.
2. The first and second translations are back-translated.
3. The source text is compared with all back-translated texts to determine the translation accuracy of the first translation.
4. The original translation of the target term is corrected accordingly.

This embodiment can thus correct domain-term mistranslations in machine translation without relying on large amounts of in-domain resources.
Fig. 2 is a flowchart of another method for correcting term mistranslations in a translation provided by an embodiment of the present application. This embodiment elaborates on S103 of the previous embodiment. The other steps are essentially the same as in the previous embodiment, which can be referred to, and are not described again here.
Specific steps may include:
S201: Obtain word alignment information for the first translation, and determine the original translation of the target term from the word alignment information;
S202: Compute the hash value of the target term, partition the phrase table built from the training set into hash blocks, look up the candidate translations corresponding to the target term, and set all candidate translations as the pseudo-translations;
S203: Replace the original translation in the first translation with each pseudo-translation in turn to obtain N second translations, and back-translate the first translation and all second translations to obtain N+1 back-translated texts;
S204: Compute the language-model probability score of every back-translated text with a language model, and determine the maximum language-model probability score;
Specifically, an n-gram language model is trained on a large-scale monolingual English corpus. Suppose a back-translated text is tokenized as $w_1, w_2, \dots, w_n$. Its language-model probability score is $\log p(w_1, w_2, \dots, w_n)$, where $p$ is the probability that the sentence is fluent, natural language. Under the n-gram assumption,
$$\log p(w_1, \dots, w_n) = \sum_{i=1}^{n} \log p(w_i \mid w_{i-m}, \dots, w_{i-1}),$$
where $m$ is the number of preceding context words. When a 5-gram language model is used, $m$ is 4.
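The scoring formula can be made concrete with a toy count-based n-gram model with add-k smoothing. This is only a sketch, with these assumptions:
- The application does not specify the smoothing method; add-k is used here for simplicity.
- A production system would more likely use a dedicated toolkit trained on a large monolingual corpus.
- The score includes an end-of-sentence term, a common convention.

The class name is hypothetical.

```python
import math
from collections import Counter
from typing import Iterable, List


class AddKNgramLM:
    """Toy n-gram LM with add-k smoothing; context length m = order - 1."""

    def __init__(self, corpus: Iterable[List[str]], order: int = 5, k: float = 1.0):
        self.order, self.k = order, k
        self.ngrams: Counter = Counter()
        self.contexts: Counter = Counter()
        self.vocab = {"</s>"}
        for sent in corpus:
            self.vocab.update(sent)
            padded = ["<s>"] * (order - 1) + sent + ["</s>"]
            for i in range(order - 1, len(padded)):
                ctx = tuple(padded[i - order + 1 : i])  # the m previous words
                self.ngrams[ctx + (padded[i],)] += 1
                self.contexts[ctx] += 1

    def logprob(self, sent: List[str]) -> float:
        """log p(w1..wn) = sum_i log p(w_i | w_{i-m}..w_{i-1}), plus an </s> term."""
        v = len(self.vocab)
        padded = ["<s>"] * (self.order - 1) + sent + ["</s>"]
        total = 0.0
        for i in range(self.order - 1, len(padded)):
            ctx = tuple(padded[i - self.order + 1 : i])
            num = self.ngrams[ctx + (padded[i],)] + self.k
            den = self.contexts[ctx] + self.k * v
            total += math.log(num / den)
        return total
```

With `order=5`, each context holds the 4 preceding words, matching $m = 4$ above.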
S205: Map each back-translated text to a first feature vector and the source text to a second feature vector. Take the cosine similarity between each back-translated text's first feature vector and the second feature vector as the semantic similarity between that back-translated text and the source text, and determine the maximum semantic similarity;
The process in S205 works as follows.

Given three consecutive sentences $(s_{i-1}, s_i, s_{i+1})$, let $w_i^t$ denote the $t$-th word of sentence $s_i$ and $x_i^t$ its word vector. The following equations encode sentence $s_i$, where $x_t$ stands for $x_i^t$ and $h_t$ is the encoder hidden state after the $t$-th word:
$$r_t = \sigma(W_r x_t + U_r h_{t-1})$$
$$z_t = \sigma(W_z x_t + U_z h_{t-1})$$
$$\bar{h}_t = \tanh\left(W x_t + U (r_t \odot h_{t-1})\right)$$
$$h_t = z_t \odot h_{t-1} + (1 - z_t) \odot \bar{h}_t$$
The symbols are:
- $z_t$ is the update gate. It controls how much of the previous state is carried into the current state; the larger its value, the more previous-state information is carried over.
- $r_t$ is the reset gate. It controls how much of the previous state is ignored; the smaller its value, the more is ignored.
- $h_t$ is the hidden state at each time step and $\bar{h}_t$ is the current memory content.
- $\odot$ is the Hadamard product, i.e., element-wise multiplication.

The decoder is a neural language model conditioned on the encoder output $h_i$ (the last hidden state obtained for $s_i$). Its computation resembles the encoder's, except that it introduces matrices $C_z$, $C_r$ and $C$. These let the sentence vector bias the computation of the update gate, the reset gate and the hidden state. Two decoders are needed, one for each neighbouring sentence $s_{i-1}$ and $s_{i+1}$.

Take sentence $s_{i+1}$ as an example. Let $h_{i+1}^t$ be the decoder hidden state at time $t$, and $x_{t-1}$ the word vector of the preceding word $w_{i+1}^{t-1}$. With decoder weights $W^d$, $U^d$ and their gate-specific counterparts, the decoding of $s_{i+1}$ is:
$$r_t = \sigma(W_r^d x_{t-1} + U_r^d h_{i+1}^{t-1} + C_r h_i)$$
$$z_t = \sigma(W_z^d x_{t-1} + U_z^d h_{i+1}^{t-1} + C_z h_i)$$
$$\bar{h}_t = \tanh\left(W^d x_{t-1} + U^d (r_t \odot h_{i+1}^{t-1}) + C h_i\right)$$
$$h_{i+1}^t = z_t \odot h_{i+1}^{t-1} + (1 - z_t) \odot \bar{h}_t$$
The probability of the $t$-th word is:
$$P(w_{i+1}^t \mid w_{i+1}^{<t}, h_i) \propto \exp\left(v_{w_{i+1}^t} h_{i+1}^t\right)$$
where $v_w$ is the row of the output vocabulary matrix corresponding to word $w$.

Given a tuple $(s_{i-1}, s_i, s_{i+1})$, the objective is conditioned on the hidden state of the current sentence. It maximizes the sum of the log-probabilities of the previous and the next sentence:
$$\sum_t \log P(w_{i+1}^t \mid w_{i+1}^{<t}, h_i) + \sum_t \log P(w_{i-1}^t \mid w_{i-1}^{<t}, h_i)$$
The total training objective is the sum of this objective over all training samples.
After pre-training, the back-translated texts and the source text are fed into the model separately. The hidden state at the last word of each sentence represents the whole sentence. The cosine similarity of the two vectors is then computed; the larger the value, the closer the two sentences are in meaning.
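The encoder equations and the cosine comparison can be sketched in NumPy as follows. Assumptions:
- The weights are random stand-ins; a real encoder would be pre-trained with the skip-thought objective.
- The caller supplies the word vectors.
- The update rule follows the convention described above, in which a larger update gate keeps more of the previous state.

The class and function names are hypothetical.

```python
import numpy as np


def sigmoid(x: np.ndarray) -> np.ndarray:
    return 1.0 / (1.0 + np.exp(-x))


class GRUEncoder:
    """Untrained GRU sentence encoder; the last hidden state is the sentence vector."""

    def __init__(self, emb_dim: int, hid_dim: int, seed: int = 0):
        rng = np.random.default_rng(seed)
        shape_w, shape_u = (hid_dim, emb_dim), (hid_dim, hid_dim)
        self.Wr, self.Wz, self.W = (rng.normal(0.0, 0.1, shape_w) for _ in range(3))
        self.Ur, self.Uz, self.U = (rng.normal(0.0, 0.1, shape_u) for _ in range(3))
        self.hid_dim = hid_dim

    def encode(self, xs: np.ndarray) -> np.ndarray:
        """xs: (T, emb_dim) word vectors; returns the final hidden state h_T."""
        h = np.zeros(self.hid_dim)
        for x in xs:
            r = sigmoid(self.Wr @ x + self.Ur @ h)          # reset gate r_t
            z = sigmoid(self.Wz @ x + self.Uz @ h)          # update gate z_t
            h_bar = np.tanh(self.W @ x + self.U @ (r * h))  # current memory content
            h = z * h + (1.0 - z) * h_bar                   # larger z keeps more of h_{t-1}
        return h


def cosine(a: np.ndarray, b: np.ndarray) -> float:
    """Cosine similarity between two sentence vectors."""
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))
```

The semantic similarity of a back-translated text is then `cosine(enc.encode(back_vectors), enc.encode(source_vectors))`, and the largest value over the N+1 texts is the maximum semantic similarity.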
S206: Judge whether the difference between the language-model probability score of the back-translated text of the first translation and the maximum language-model probability score is less than or equal to the preset value, obtain the first judgment result, and go to S208;
S207: Judge whether the semantic similarity between the back-translated text of the first translation and the source text equals the maximum semantic similarity, obtain the second judgment result, and go to S208;
S208: Judge whether the first judgment result and the second judgment result are both "no". If so, go to S209; otherwise, end the process.
Specifically, S209 is entered when two conditions hold: the difference between the language model probability score of the back-translation text corresponding to the first translation and the maximum language model probability score is greater than the preset value (preferably 0.015), and the semantic similarity between the back-translation text corresponding to the first translation and the source text is not the maximum semantic similarity. If both conditions are met, the original translation text of the target term in the first translation is mistranslated and must be corrected.
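The S206 to S208 decision can be sketched as below. This is a minimal illustration, not the patent's implementation: it assumes the language model scores and cosine similarities have already been computed for all N+1 back-translations, with index 0 holding the back-translation of the first translation. `Candidate` and `is_mistranslated` are hypothetical names; 0.015 is the preferred preset value stated above.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Candidate:
    """One back-translation: index 0 comes from the first translation, 1..N from second translations."""
    lm_score: float      # language model probability score of the back-translation
    semantic_sim: float  # cosine similarity between the back-translation and the source text


def is_mistranslated(cands: List[Candidate], threshold: float = 0.015) -> bool:
    """Return True when S208 would route to S209 (both judgment results are 'no')."""
    first = cands[0]
    max_lm = max(c.lm_score for c in cands)
    max_sim = max(c.semantic_sim for c in cands)
    # S206: is the first translation's score within `threshold` of the best score?
    first_result = (max_lm - first.lm_score) <= threshold
    # S207: does the first translation's back-translation have the highest similarity?
    second_result = first.semantic_sim == max_sim
    # S208: the original term translation is judged wrong only if both results are 'no'
    return not first_result and not second_result
```

Requiring both checks to fail keeps the method conservative: a first translation that is either fluent enough or semantically closest is left untouched.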
S209: Judge that the original translation text is translated incorrectly, and correct the original translation text of the target term.
This step presupposes that the original translation text has already been determined to be a mistranslation, and corrects it. The original translation text of the target term can be corrected in many ways; three preferred correction methods are given below:
Correction method one: replace the first translation with the second translation corresponding to the maximum semantic similarity.
Correction method two: represent all back-translation texts and the source text as dependency trees, and calculate the minimum edit cost of converting the dependency tree of each back-translation text into the dependency tree of the source text; select the second translation corresponding to the back-translation text with the lowest minimum edit cost as the optimal translation, and replace the first translation with the optimal translation.
Correction method three: represent all back-translation texts and the source text as dependency trees, and calculate the minimum edit cost of converting the dependency tree of each back-translation text into the dependency tree of the source text; select the second translation corresponding to the back-translation text with the largest difference between semantic similarity and minimum edit cost as the optimal translation, and replace the first translation with the optimal translation.
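The three correction methods reduce to choosing one index among the N second translations. The sketch below assumes the similarities and minimum edit costs are already aligned with the second translations; `pick_replacement` is a hypothetical name. For method three it takes "similarity minus edit cost" literally; the text does not say whether the two quantities are normalized to a common scale, so that remains an open assumption.

```python
from typing import Sequence


def pick_replacement(
    second_translations: Sequence[str],
    similarities: Sequence[float],
    edit_costs: Sequence[float],
    method: int = 1,
) -> str:
    """Choose the second translation that replaces the first translation.

    Entry k of each sequence describes the k-th second translation and the
    back-translation derived from it.
    """
    if not (len(second_translations) == len(similarities) == len(edit_costs)):
        raise ValueError("inputs must be aligned")
    idx = range(len(second_translations))
    if method == 1:    # method one: highest semantic similarity
        best = max(idx, key=lambda k: similarities[k])
    elif method == 2:  # method two: lowest minimum tree-edit cost
        best = min(idx, key=lambda k: edit_costs[k])
    elif method == 3:  # method three: largest (similarity - edit cost)
        best = max(idx, key=lambda k: similarities[k] - edit_costs[k])
    else:
        raise ValueError("method must be 1, 2 or 3")
    return second_translations[best]
```

For example, with similarities `[0.9, 0.2, 0.8]` and edit costs `[0.5, 0.1, 0.3]`, the three methods select the first, second and third candidate respectively.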
It should be noted that the minimum edit cost of converting the dependency tree of each back-translation text into the dependency tree of the source text, referred to above, is calculated as follows:
Specifically, the Stanford natural language processing tools are used to parse the dependency trees of the back-translation text and the source text. In a dependency tree, each node is represented by three fields: the lemma (word root), the part-of-speech tag, and the dependency relation to its parent node. The basic edit operations are redefined as nine types. The first six (INS_LEAF, INS_SUBTREE, INS, DEL_LEAF, DEL_SUBTREE, DEL) insert or delete a leaf node, a whole subtree, or any single node that is neither a leaf node nor part of such a subtree. The last three (REN_POS, REN_DEP, REN_POS_DEP) rename the part-of-speech tag, the dependency relation, or both.
Initially, the cost of every basic edit operation is set to 1.0, so inserting or deleting an entire node costs 3 (all three fields are inserted or deleted). Renaming a part-of-speech tag or a relation type is allowed if and only if the source node and the target node have the same lemma. Two nodes are said to be aligned by the tree edit model if they are identical and therefore do not appear in the edit script, or if one is renamed into the other because they share the same lemma. In addition, the renaming cost of stop words is revised to 2.5, regardless of whether the two stop words have the same part-of-speech tag or relation type. The reason is that stop words tend to have fixed part-of-speech tags and dependency relations, so compared with renaming content words, they can be aligned at a lower cost.
If a term in the translation output by the translation system is wrong, then when that translation is back-translated, the syntactic structure of the back-translation text is likely to differ from that of the source text. On this basis, the tree edit distance between the dependency trees of the back-translation text and the source text is calculated using the method above: the smaller the distance, the more similar the syntactic structures of the two sentences.
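A minimal sketch of this cost model follows. It is an illustration under stated assumptions, not the patent's tree edit model: it uses the classical recursive edit distance for ordered forests (with memoization) instead of the nine-operation scheme, so INS_LEAF, INS_SUBTREE and INS all cost the same 3.0 per node; it assumes a rename costs 1.0 per changed field (POS tag or relation); and `STOP_WORDS` is a hypothetical stop-word list. Dependency parsing is not shown: a tree is written as `(node, children)` with `node = (lemma, pos, dep)`.

```python
from functools import lru_cache

INS_DEL_COST = 3.0          # insert/delete a whole node: 3 fields x 1.0
FIELD_RENAME_COST = 1.0     # assumption: renaming one field (POS or relation) costs 1.0
STOPWORD_RENAME_COST = 2.5  # stop words may be renamed into each other at this cost
STOP_WORDS = frozenset({"the", "a", "an", "of", "to", "in", "and", "is"})  # hypothetical list


def rename_cost(u, v):
    """Cost of turning node u=(lemma, pos, dep) into node v; inf if not allowed."""
    if u == v:
        return 0.0
    if u[0] in STOP_WORDS and v[0] in STOP_WORDS:
        return STOPWORD_RENAME_COST
    if u[0] == v[0]:  # POS/relation renaming only between nodes with the same lemma
        return FIELD_RENAME_COST * ((u[1] != v[1]) + (u[2] != v[2]))
    return float("inf")


def forest_size(forest):
    """Number of nodes in a forest (a tuple of (node, children) trees)."""
    return sum(1 + forest_size(children) for _, children in forest)


@lru_cache(maxsize=None)
def forest_dist(f, g):
    """Minimum cost of editing ordered forest f into ordered forest g."""
    if not f and not g:
        return 0.0
    if not f:
        return INS_DEL_COST * forest_size(g)
    if not g:
        return INS_DEL_COST * forest_size(f)
    (v, v_kids), (w, w_kids) = f[-1], g[-1]  # rightmost roots
    best = min(
        forest_dist(f[:-1] + v_kids, g) + INS_DEL_COST,  # delete v; its children move up
        forest_dist(f, g[:-1] + w_kids) + INS_DEL_COST,  # insert w
    )
    ren = rename_cost(v, w)
    if ren != float("inf"):  # map v onto w, then match children and the remaining forests
        best = min(best, ren + forest_dist(v_kids, w_kids) + forest_dist(f[:-1], g[:-1]))
    return best


def tree_edit_cost(t1, t2):
    """Minimum edit cost of converting dependency tree t1 into t2."""
    return forest_dist((t1,), (t2,))
```

Forbidding renames between different content lemmas (the `inf` branch) forces such nodes to be deleted and re-inserted at cost 6, which is what lets a wrong term in the back-translation raise the distance.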
To verify the effectiveness of the proposed method for identifying mistranslated domain terms, the accuracy rate PR is chosen as the evaluation metric, defined as follows:

$$PR = \frac{\#\,\text{of correctly translated terms}}{\text{Total}\,\#\,\text{of terms}}$$

where the numerator, #of correctly translated terms, is the number of correctly translated terms in all source texts (all texts in the test set), and the denominator, Total#of terms, is the total number of terms in all source texts.
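As a worked instance of the PR formula, the sketch below counts a term occurrence as correctly translated when the system's translation exactly matches a reference translation. Exact string matching and the aligned-list input are assumptions; the text does not say how correctness is judged. `term_accuracy` is a hypothetical name.

```python
from typing import Sequence


def term_accuracy(system_terms: Sequence[str], reference_terms: Sequence[str]) -> float:
    """PR = (# of correctly translated terms) / (total # of terms).

    Position k pairs the system's translation of the k-th term occurrence in
    the test set with its reference translation.
    """
    if len(system_terms) != len(reference_terms):
        raise ValueError("term lists must be aligned")
    if not reference_terms:
        raise ValueError("no terms to evaluate")
    correct = sum(s == r for s, r in zip(system_terms, reference_terms))
    return correct / len(reference_terms)
```

With four terms of which three are translated as in the reference, PR is 3/4 = 0.75.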
Refer to Fig. 3, which is a schematic structural diagram of a system for correcting term mistranslation in translation provided by an embodiment of the present application.
The correction system may include:
A synonym acquisition module 100, configured to obtain the original translation text of the target term in the first translation, obtain candidate translations corresponding to the target term from the training set, and set all candidate translations as pseudo-translation texts, wherein the first translation is obtained by translating the source text;
A decoding module 200, configured to replace the original translation text in the first translation with each pseudo-translation text in turn to obtain N second translations, and to perform a back-translation operation on the first translation and all second translations to obtain N+1 back-translation texts;
A correction module 300, configured to compare the source text with all back-translation texts to determine the translation accuracy of the first translation, and to correct the original translation text of the target term according to the translation accuracy.
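The work of the decoding module can be sketched as token-span substitution followed by N+1 calls to a back-translation system. This sketch assumes the first translation is tokenized, that the original translation text occupies the half-open span `[start, end)`, and that `back_translate` is a caller-supplied function (a stand-in for whatever MT system is used; none is specified here). Both function names are hypothetical.

```python
from typing import Callable, List, Sequence, Tuple


def build_second_translations(
    first_tokens: Sequence[str],
    span: Tuple[int, int],
    pseudo_translations: Sequence[Sequence[str]],
) -> List[List[str]]:
    """Replace first_tokens[start:end] (the original term translation) with each
    pseudo-translation, giving N second translations."""
    start, end = span
    return [
        list(first_tokens[:start]) + list(p) + list(first_tokens[end:])
        for p in pseudo_translations
    ]


def back_translate_all(
    first_tokens: Sequence[str],
    second: Sequence[Sequence[str]],
    back_translate: Callable[[Sequence[str]], str],
) -> List[str]:
    """Back-translate the first translation and every second translation (N + 1 outputs).

    Index 0 is always the back-translation of the first translation.
    """
    return [back_translate(t) for t in [first_tokens, *second]]
```

Keeping the first translation at index 0 lets the later comparison steps identify "the back-translation text corresponding to the first translation" by position.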
Further, the synonym acquisition module 100 includes:
An original translation text determination unit, configured to obtain word alignment information of the first translation and determine, according to the word alignment information, the original translation text in which the target term is located;
A pseudo-translation text determination unit, configured to calculate the hash value of the target term, partition the phrase table built from the training set into hash blocks, look up the candidate translations corresponding to the target term, and set all candidate translations as the pseudo-translation texts.
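Both units can be sketched briefly. The alignment helper assumes word alignments are given as `(source_index, target_index)` pairs and takes the smallest target span covering every word aligned to the term. The phrase-table class assumes "Hash piecemeal" means bucketing entries by a hash of the source phrase so that a lookup scans a single bucket; the bucket count and the choice of `zlib.crc32` (stable across Python runs, unlike the built-in `hash` on strings) are assumptions. All names are hypothetical.

```python
import zlib
from collections import defaultdict
from typing import Dict, Iterable, List, Optional, Tuple


def original_translation_span(
    alignment: Iterable[Tuple[int, int]], term_start: int, term_end: int
) -> Optional[Tuple[int, int]]:
    """Target-side span [start, end) aligned to source tokens [term_start, term_end)."""
    tgt = [t for s, t in alignment if term_start <= s < term_end]
    if not tgt:
        return None  # the term is unaligned (e.g. dropped by the MT system)
    return min(tgt), max(tgt) + 1


def _bucket(phrase: str, num_buckets: int) -> int:
    """Stable hash bucket for a source phrase."""
    return zlib.crc32(phrase.encode("utf-8")) % num_buckets


class HashedPhraseTable:
    """Phrase table split into hash buckets so that a lookup scans only one bucket."""

    def __init__(self, entries: Iterable[Tuple[str, str]], num_buckets: int = 1024):
        self.num_buckets = num_buckets
        self.buckets: Dict[int, Dict[str, List[str]]] = defaultdict(lambda: defaultdict(list))
        for src, tgt in entries:
            self.buckets[_bucket(src, num_buckets)][src].append(tgt)

    def candidates(self, term: str) -> List[str]:
        """All candidate translations (pseudo-translation texts) recorded for `term`."""
        bucket = self.buckets.get(_bucket(term, self.num_buckets), {})
        return list(bucket.get(term, []))
```

Using `.get` on the outer and inner dictionaries avoids creating empty buckets for terms that are not in the table.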
Further, the correction module 300 includes:
A language model probability score calculation unit, configured to calculate the language model probability scores of all back-translation texts using a language model, and to determine the maximum language model probability score;
A semantic similarity calculation unit, configured to map each back-translation text to a first feature vector and the source text to a second feature vector, take the cosine similarity between the first feature vector of each back-translation text and the second feature vector as the semantic similarity between that back-translation text and the source text, and determine the maximum semantic similarity;
A first evaluation unit, configured to determine whether the difference between the language model probability score of the back-translation text corresponding to the first translation and the maximum language model probability score is less than or equal to the preset value, to obtain a first judgment result;
A second evaluation unit, configured to determine whether the semantic similarity between the back-translation text corresponding to the first translation and the source text is the maximum semantic similarity, to obtain a second judgment result;
A judging unit, configured to determine whether the first judgment result and the second judgment result are both "no";
A text correction unit, configured to, when the first judgment result and the second judgment result are both "no", judge that the original translation text is translated incorrectly and correct the original translation text of the target term.
Further, the text correction unit includes:
A first correction subunit, configured to replace the first translation with the second translation corresponding to the maximum semantic similarity;
An edit cost calculation subunit, configured to represent all back-translation texts and the source text as dependency trees, and to calculate the minimum edit cost of converting the dependency tree of each back-translation text into the dependency tree of the source text;
Or, a second correction subunit, configured to select the second translation corresponding to the back-translation text with the lowest minimum edit cost as the optimal translation, and to replace the first translation with the optimal translation;
Or, a third correction subunit, configured to select the second translation corresponding to the back-translation text with the largest difference between semantic similarity and minimum edit cost as the optimal translation, and to replace the first translation with the optimal translation.
Since the system embodiments correspond one-to-one with the method embodiments, refer to the description of the method embodiments for the system embodiments; details are not repeated here.
The present application further provides a computer-readable storage medium storing a computer program which, when executed, can implement the steps provided by the above embodiments. The storage medium may include various media capable of storing program code, such as a USB flash drive, a removable hard disk, a read-only memory (Read-Only Memory, ROM), a random access memory (Random Access Memory, RAM), a magnetic disk or an optical disc.
The present application further provides a device for correcting term mistranslation in translation, which may include a memory and a processor. The memory stores a computer program, and when the processor calls the computer program in the memory, the steps provided by the above embodiments can be implemented. The device may of course also include components such as various network interfaces and a power supply.
The embodiments in this specification are described progressively; each embodiment focuses on its differences from the others, and identical or similar parts of the embodiments may be cross-referenced. Since the system disclosed in the embodiments corresponds to the method disclosed in the embodiments, its description is relatively brief; for relevant details, refer to the description of the method. It should be pointed out that those of ordinary skill in the art may make several improvements and modifications to the present application without departing from its principles, and such improvements and modifications also fall within the protection scope of the claims of the present application.
It should also be noted that, in this specification, relational terms such as first and second are used only to distinguish one entity or operation from another, and do not necessarily require or imply any actual relationship or order between these entities or operations. Moreover, the terms "include", "comprise" or any other variant thereof are intended to cover non-exclusive inclusion, so that a process, method, article or device that includes a series of elements includes not only those elements but also other elements not explicitly listed, or elements inherent to such a process, method, article or device. Without further limitation, an element defined by the phrase "including a ..." does not exclude the presence of other identical elements in the process, method, article or device that includes the element.
Claims (10)
1. A method for correcting term mistranslation in translation, characterized by comprising:
obtaining the original translation text of a target term in a first translation, obtaining candidate translations corresponding to the target term from a training set, and setting all candidate translations as pseudo-translation texts, wherein the first translation is obtained by translating a source text;
replacing the original translation text in the first translation with each pseudo-translation text respectively to obtain N second translations, and performing a back-translation operation on the first translation and all second translations to obtain N+1 back-translation texts;
comparing the source text with all back-translation texts to determine the translation accuracy of the first translation, and correcting the original translation text of the target term according to the translation accuracy.
2. The correction method according to claim 1, characterized in that obtaining candidate translations corresponding to the target term from the training set and setting all candidate translations as pseudo-translation texts comprises:
calculating a hash value of the target term, partitioning the phrase table built from the training set into hash blocks, looking up the candidate translations corresponding to the target term, and setting all candidate translations as the pseudo-translation texts.
3. The correction method according to claim 1, characterized in that obtaining the original translation text of the target term in the first translation comprises:
obtaining word alignment information of the first translation, and determining, according to the word alignment information, the original translation text in which the target term is located.
4. The correction method according to claim 1, characterized in that comparing the source text with all back-translation texts to determine the translation accuracy of the back-translation texts, and correcting the original translation text of the target term according to the translation accuracy, comprises:
calculating language model probability scores of all back-translation texts using a language model, and determining the maximum language model probability score;
mapping each back-translation text to a first feature vector and the source text to a second feature vector, setting the cosine similarity between the first feature vector of each back-translation text and the second feature vector as the semantic similarity between that back-translation text and the source text, and determining the maximum semantic similarity;
determining whether the difference between the language model probability score of the back-translation text corresponding to the first translation and the maximum language model probability score is less than or equal to a preset value, to obtain a first judgment result;
determining whether the semantic similarity between the back-translation text corresponding to the first translation and the source text is the maximum semantic similarity, to obtain a second judgment result;
determining whether the first judgment result and the second judgment result are both "no";
if both are "no", judging that the original translation text is translated incorrectly, and correcting the original translation text of the target term.
5. The correction method according to claim 4, characterized in that correcting the original translation text of the target term comprises:
replacing the first translation with the second translation corresponding to the maximum semantic similarity.
6. The correction method according to claim 4, characterized in that correcting the original translation text of the target term comprises:
representing all back-translation texts and the source text as dependency trees, and calculating the minimum edit cost of converting the dependency tree of each back-translation text into the dependency tree of the source text;
selecting the second translation corresponding to the back-translation text with the lowest minimum edit cost as an optimal translation;
or, selecting the second translation corresponding to the back-translation text with the largest difference between semantic similarity and minimum edit cost as the optimal translation;
replacing the first translation with the optimal translation.
7. A system for correcting term mistranslation in translation, characterized by comprising:
a synonym acquisition module, configured to obtain the original translation text of a target term in a first translation, obtain candidate translations corresponding to the target term from a training set, and set all candidate translations as pseudo-translation texts, wherein the first translation is obtained by translating a source text;
a decoding module, configured to replace the original translation text in the first translation with each pseudo-translation text respectively to obtain N second translations, and to perform a back-translation operation on the first translation and all second translations to obtain N+1 back-translation texts;
a correction module, configured to compare the source text with all back-translation texts to determine the translation accuracy of the first translation, and to correct the original translation text of the target term according to the translation accuracy.
8. The correction system according to claim 7, characterized in that the synonym acquisition module comprises:
an original translation text determination unit, configured to obtain word alignment information of the first translation and determine, according to the word alignment information, the original translation text in which the target term is located;
a pseudo-translation text determination unit, configured to calculate a hash value of the target term, partition the phrase table built from the training set into hash blocks, look up the candidate translations corresponding to the target term, and set all candidate translations as the pseudo-translation texts.
9. A computer-readable storage medium, characterized in that a computer program is stored on the computer-readable storage medium, and when the computer program is executed by a processor, the steps of the method for correcting term mistranslation in translation according to any one of claims 1 to 6 are implemented.
10. A device for correcting term mistranslation in translation, characterized by comprising:
a memory, configured to store a computer program;
a processor, configured to implement, when executing the computer program, the steps of the method for correcting term mistranslation in translation according to any one of claims 1 to 6.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201810600694.0A CN108804428A (en) | 2018-06-12 | 2018-06-12 | Correcting method, system and the relevant apparatus of term mistranslation in a kind of translation |
Publications (1)
Publication Number | Publication Date |
---|---|
CN108804428A true CN108804428A (en) | 2018-11-13 |
Family
ID=64085477
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201810600694.0A Pending CN108804428A (en) | 2018-06-12 | 2018-06-12 | Correcting method, system and the relevant apparatus of term mistranslation in a kind of translation |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN108804428A (en) |
Citations (2)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US20140288915A1 (en) * | 2013-03-19 | 2014-09-25 | Educational Testing Service | Round-Trip Translation for Automated Grammatical Error Correction |
CN106202059A (en) * | 2015-05-25 | 2016-12-07 | 松下电器(美国)知识产权公司 | Machine translation method and machine translation apparatus |
-
2018
- 2018-06-12 CN CN201810600694.0A patent/CN108804428A/en active Pending
Non-Patent Citations (1)
Title |
---|
MENGYI LIU ET AL.: "Terminology Translation Error Identification and Correction", 《CHINESE NATION CONFERENCE ON SOCIAL MEDIA PROCESSING》 * |
Cited By (29)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN109858042A (en) * | 2018-11-20 | 2019-06-07 | 科大讯飞股份有限公司 | A kind of determination method and device of translation quality |
CN109582977A (en) * | 2018-11-20 | 2019-04-05 | 科大讯飞股份有限公司 | A kind of interactive text interpretation method and device |
CN109582977B (en) * | 2018-11-20 | 2023-06-02 | 科大讯飞股份有限公司 | Interactive text translation method and device |
CN109858042B (en) * | 2018-11-20 | 2024-02-20 | 科大讯飞股份有限公司 | Translation quality determining method and device |
CN111385612A (en) * | 2018-12-28 | 2020-07-07 | 深圳Tcl数字技术有限公司 | Television playing method based on hearing-impaired people, smart television and storage medium |
CN111898387A (en) * | 2019-05-06 | 2020-11-06 | 阿里巴巴集团控股有限公司 | Translation method and device, storage medium and computer equipment |
CN110807338A (en) * | 2019-11-08 | 2020-02-18 | 北京中献电子技术开发有限公司 | English-Chinese machine translation term consistency self-correcting system and method |
CN110807338B (en) * | 2019-11-08 | 2022-03-04 | 北京中献电子技术开发有限公司 | English-Chinese machine translation term consistency self-correcting system and method |
WO2021092730A1 (en) * | 2019-11-11 | 2021-05-20 | 深圳市欢太科技有限公司 | Digest generation method and apparatus, electronic device, and storage medium |
CN111191468B (en) * | 2019-12-17 | 2023-08-25 | 语联网(武汉)信息技术有限公司 | Term replacement method and device |
CN111191468A (en) * | 2019-12-17 | 2020-05-22 | 语联网(武汉)信息技术有限公司 | Term replacement method and device |
CN111597826B (en) * | 2020-05-15 | 2021-10-01 | 苏州七星天专利运营管理有限责任公司 | Method for processing terms in auxiliary translation |
CN111597826A (en) * | 2020-05-15 | 2020-08-28 | 苏州七星天专利运营管理有限责任公司 | Method for processing terms in auxiliary translation |
CN111680526A (en) * | 2020-06-09 | 2020-09-18 | 语联网(武汉)信息技术有限公司 | Human-computer interaction translation system and method based on reverse translation result comparison |
CN111652006B (en) * | 2020-06-09 | 2021-02-09 | 北京中科凡语科技有限公司 | Computer-aided translation method and device |
CN111680524A (en) * | 2020-06-09 | 2020-09-18 | 语联网(武汉)信息技术有限公司 | Human-machine feedback translation method and system based on reverse matrix analysis |
CN111652006A (en) * | 2020-06-09 | 2020-09-11 | 北京中科凡语科技有限公司 | Computer-aided translation method and device |
CN111680526B (en) * | 2020-06-09 | 2023-09-08 | 语联网(武汉)信息技术有限公司 | Man-machine interactive translation system and method based on comparison of reverse translation results |
CN111916050A (en) * | 2020-08-03 | 2020-11-10 | 北京字节跳动网络技术有限公司 | Speech synthesis method, speech synthesis device, storage medium and electronic equipment |
CN112084301B (en) * | 2020-08-11 | 2023-12-15 | 网易有道信息技术(北京)有限公司 | Training method and device for text correction model, text correction method and device |
CN112084301A (en) * | 2020-08-11 | 2020-12-15 | 网易有道信息技术(北京)有限公司 | Training method and device of text correction model and text correction method and device |
CN112667208A (en) * | 2020-12-22 | 2021-04-16 | 深圳壹账通智能科技有限公司 | Translation error recognition method and device, computer equipment and readable storage medium |
CN112528683A (en) * | 2020-12-23 | 2021-03-19 | 深圳市爱科云通科技有限公司 | Text translation correction method, device, system, server and readable storage medium |
CN112765968A (en) * | 2021-01-05 | 2021-05-07 | 网易有道信息技术(北京)有限公司 | Grammar error correction method and training method and product for grammar error correction model |
CN113051937A (en) * | 2021-03-19 | 2021-06-29 | 北京大米科技有限公司 | Machine error correction method, device, electronic equipment and readable storage medium |
CN115688703A (en) * | 2022-10-31 | 2023-02-03 | 国网山东省电力公司烟台供电公司 | Specific field text error correction method, storage medium and device |
CN115688703B (en) * | 2022-10-31 | 2024-03-12 | 国网山东省电力公司烟台供电公司 | Text error correction method, storage medium and device in specific field |
CN115455981A (en) * | 2022-11-11 | 2022-12-09 | 合肥智能语音创新发展有限公司 | Semantic understanding method, device, equipment and storage medium for multi-language sentences |
CN115455981B (en) * | 2022-11-11 | 2024-03-19 | 合肥智能语音创新发展有限公司 | Semantic understanding method, device and equipment for multilingual sentences and storage medium |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
| PB01 | Publication | |
| SE01 | Entry into force of request for substantive examination | |
| RJ01 | Rejection of invention patent application after publication | Application publication date: 20181113 |