CN104462072B - Input method and device for computer-aided translation - Google Patents


Publication number
CN104462072B
Authority
CN
China
Prior art keywords
translation
phrase
candidate
input
input method
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN201410678005.XA
Other languages
Chinese (zh)
Other versions
CN104462072A (en)
Inventor
宗成庆
黄国平
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Institute of Automation of Chinese Academy of Science
Original Assignee
Institute of Automation of Chinese Academy of Science
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Institute of Automation of Chinese Academy of Science
Priority to CN201410678005.XA
Publication of CN104462072A
Application granted
Publication of CN104462072B
Legal status: Active
Anticipated expiration

Landscapes

  • Machine Translation (AREA)

Abstract

The invention provides an input method for computer-aided translation, comprising: Step S1: performing word segmentation on a source language sentence; Step S2: obtaining the machine translation candidate list corresponding to the segmented source sentence, together with the best machine translation, and obtaining n-gram hint phrases; Step S3: responding to a keystroke that selects an n-gram hint phrase, or receiving an input keystroke sequence, and obtaining input method phrase candidates; Step S4: after the user selects an n-gram hint phrase or an input method phrase candidate by keystroke, obtaining new n-gram hint phrases and repeating Step S3 until the user has finished entering the translation of the source sentence. The invention also provides an input device for computer-aided translation, comprising a word segmentation module, a translation module, a first generation module, a second generation module, and an input device interface. The invention makes full use of machine translation knowledge: it raises the keystroke saving rate by at least 11.04% and substantially improves the efficiency of human translation.

Description

Input method and device for computer-aided translation
Technical field
The present invention relates to the field of natural language processing, and more particularly to an input method and device for computer-aided translation.
Background
Machine translation is the use of computers to convert text between languages. The language being translated is called the source language, and the language translated into is called the target language. Machine translation is the process of converting from the source language to the target language.
Computer-aided translation improves translators' working efficiency by exploiting large amounts of repeated or similar sentences and segments. Unlike machine translation, it does not rely on automatic translation by the computer; the whole translation process is completed with human participation. Computer-aided translation automates parts of the laborious manual translation process and can greatly improve translation efficiency and translation quality.
In recent years, many researchers have tried to use machine translation knowledge to further improve the efficiency of computer-aided translation. The current research focus is post-editing, that is, editing the output of a machine translation system to produce a high-quality translation. However, because current machine translation rarely produces output that people find reasonably satisfactory, translators have little motivation to revise machine output carefully, so post-editing has not been widely adopted. In addition, some scholars have proposed translation assistance based on interactive machine translation (see, for example, Sergio Barrachina et al., "Statistical Approaches to Computer-Assisted Translation", Computational Linguistics, 35(1), pp. 3-28, 2009), which gives up the simplicity of fully automatic translation in exchange for higher-quality output. The basic idea is that, starting from the current system output, the user points out some errors and supplies the correct translation, which is sent back to the translation system for re-decoding; this is repeated until the result meets the user's requirements. However, interactive translation severely disrupts the human translation workflow and is just as time-consuming and laborious, so such systems are mainly used when the user has limited or almost no knowledge of the target language. The main users of computer-aided translation are professional translators, so interactive translation is almost never adopted by commercial translation systems. Between 1997 and 2005, Guy Lapalme and Philippe Langlais implemented the TransType translation system on an interactive translation framework; it offers real-time suggestions for the continuation of the translation as the user types. But this requires the translator to translate from left to right, with the machine translation updating its result according to the part already entered so as to give suggestions that are as accurate as possible. The upgraded TransType2 supports three language pairs (English → Spanish, English → French, and English → German), but because it is hard to reconcile with the human translation workflow, TransType2's interaction mode has not been adopted by other systems. How to use machine translation knowledge to further improve translation efficiency and translation quality is therefore a problem in urgent need of a solution.
Summary of the invention
In view of the above technical problem, the main object of the present invention is to propose an input method and device for computer-aided translation that make full use of machine translation knowledge during input to improve translation efficiency and translation quality.
To achieve this object, as one aspect of the present invention, the invention provides an input method for computer-aided translation, comprising the following steps:
Step S1: Perform word segmentation on the source language sentence;
Step S2: Using a machine translation engine, obtain the machine translation candidate list corresponding to the segmented source sentence, and output the highest-scoring candidate to the input device interface as the best machine translation; generate N n-gram hint phrases from the first N words of the best machine translation, output them to the input device interface, and wait for the user's keystroke selection;
Step S3: Respond to the n-gram hint phrase selected by the user's keystroke, or receive the user's input keystroke sequence; using a log-linear model, compute over the machine translation candidate list and the input keystroke sequence to generate M input method phrase candidates, output them to the input device interface, and wait for the user's keystroke selection;
Step S4: Respond to the input method phrase candidate selected by the user's keystroke, or receive the user's input keystroke sequence, and determine whether the user has finished entering the translation of the source sentence. If so, end; otherwise, generate N n-gram hint phrases from the already-entered part of the translation and the machine translation candidate list, output them to the input device interface, wait for the user's keystroke selection, and return to Step S3;
where N and M are positive integers.
The n-gram hint phrases are as follows: the first hint phrase is a unigram containing a single word; the second hint phrase is a bigram containing two words, namely the word of the first hint phrase followed by a second hint word, so that the first hint phrase is a prefix of the second; and so on, with all words of the (N-1)-th hint phrase forming a prefix of the N-th hint phrase, which is an N-gram containing N words. N is a preset positive integer with a default value of 4.
Step S3 further comprises the following steps:
Step S31: Segment the input keystroke sequence to obtain the segmented input keystroke sequence. The segmented sequence consists of coding units separated by a separator character, and each coding unit is either the complete character input method code of the corresponding character or a prefix of that code;
Step S32: Initialize the input method phrase candidate list to empty, and perform the following computation in turn for each coding unit in the segmented input keystroke sequence:
According to the coding rules of the character input method, compute the target word candidate set for the coding unit;
Using a decoding algorithm, compute over the target word candidate set, the input method phrase candidate list, and the machine translation candidate list to obtain a new input method phrase candidate list;
Using the log-linear model, score every input method phrase candidate in the new input method phrase candidate list and sort the candidates in descending order of score; if the length of the new list exceeds the preset threshold M, keep only the M highest-scoring candidates. The number of target word candidates contained in each input method phrase candidate equals the number of coding units decoded so far, and their order is consistent with the order of the decoded coding units;
Replace the input method phrase candidate list with the new input method phrase candidate list.
The features used by the log-linear model include:
(1) the typing model probability;
(2) the language model probability;
(3) the occurrence probability of the words in the input method phrase candidate;
(4) the occurrence probability of the input method phrase candidate;
(5) a binary feature indicating whether the words in the input method phrase candidate appear in a machine translation candidate;
(6) a binary feature indicating whether the input method phrase candidate appears in a machine translation candidate;
(7) a binary feature indicating whether the input method phrase candidate appears in the user's term base.
Step S33: After all coding units in the segmented input keystroke sequence have been processed, the input method phrase candidate list has length M and is sorted in descending order of score, where M is a preset positive integer with a default value of 5.
Step S4 further comprises the following steps:
Step S41: After responding to the user's keystroke selection of an n-gram hint phrase or an input method phrase candidate, perform word segmentation on the already-entered part of the translation to obtain the segmented entered part;
Step S42: If the best machine translation contains the last word of the segmented entered part, apply a maximum prefix matching algorithm to the best machine translation and the segmented entered part to generate N n-gram hint phrases;
Step S43: If the best machine translation does not contain the last word of the segmented entered part, select from the machine translation candidate list all candidates that contain that last word to obtain a second-best machine translation candidate list, and take its highest-scoring candidate as the second-best machine translation; apply a prefix matching algorithm to the second-best machine translation and the segmented entered part to generate N n-gram hint phrases.
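Steps S41 to S43 can be illustrated with a minimal Python sketch. It assumes the candidate list is a list of (score, words) pairs and aligns on the first occurrence of the last entered word, which is a simplification of the prefix matching named in the text; the function name `hints_after` and the sample data are hypothetical, and English glosses stand in for the Chinese words.

```python
def hints_after(typed_words, candidates, n=4):
    """Return up to n prefix-nested hint phrases (as word lists).

    candidates: list of (score, words) pairs, in any order.
    """
    ranked = sorted(candidates, key=lambda c: c[0], reverse=True)
    if not typed_words:
        # Nothing entered yet: hints come from the start of the best candidate.
        best = ranked[0][1]
        return [best[:k] for k in range(1, min(n, len(best)) + 1)]
    last = typed_words[-1]
    # Walking the ranked list covers both cases: the best candidate is tried
    # first (S42); otherwise the highest-scoring candidate that contains the
    # last entered word is used (S43).
    for _, words in ranked:
        if last in words:
            start = words.index(last) + 1
            tail = words[start:start + n]
            return [tail[:k] for k in range(1, len(tail) + 1)]
    return []  # no candidate contains the last entered word


# Illustrative data only.
CANDIDATES = [
    (0.9, ["China", "considers", "changing", "ability", "official", "welfare", "system"]),
    (0.5, ["China", "considers", "reform", "civil", "servants", "welfare", "system"]),
]
print(hints_after(["China", "considers"], CANDIDATES))
print(hints_after(["China", "considers", "reform"], CANDIDATES))
```

The second call falls through to the lower-scoring candidate because the best one does not contain "reform", which mirrors the fallback in Step S43.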
As another aspect of the present invention, the invention also provides an input device for computer-aided translation, comprising a word segmentation module, a translation module, a first generation module, a second generation module, and an input device interface, wherein:
the word segmentation module segments the source language sentence and the already-entered part of the translation, and outputs the segmented source sentence and the segmented entered part;
the translation module is connected to the word segmentation module; using a machine translation engine, it obtains the machine translation candidate list corresponding to the segmented source sentence and outputs the highest-scoring candidate to the input device interface as the best machine translation;
the first generation module is connected to the translation module and the input device interface; using a log-linear model, it computes over the machine translation candidate list and the input keystroke sequence, generates M input method phrase candidates, and outputs them to the input device interface;
the second generation module is connected to the translation module and the input device interface; it computes over the already-entered part of the translation and the machine translation candidate list, generates N n-gram hint phrases, and outputs them to the input device interface;
the input device interface displays the best machine translation, the input method phrase candidates, and the n-gram hint phrases, and receives the user's keystroke selection commands and input keystroke sequences to enter the translation of the source sentence.
As yet another aspect of the present invention, the invention also provides an input device for computer-aided translation, comprising:
means for performing word segmentation on the source language sentence;
means for obtaining, using a machine translation engine, the target-language machine translation candidate list corresponding to the segmented source sentence, generating a phrase candidate list from the highest-scoring machine translation candidate, and outputting it to the input device interface;
means for dynamically adjusting the phrase candidate list in real time, after receiving the keystroke sequence entered by the user, using a log-linear model together with the machine translation candidate list, and outputting it to the input device interface;
means for responding to the user's keystroke selections until the user has finished translating the source sentence.
The input device further comprises:
means for obtaining N-gram hints from the machine translation candidate list after the user has entered a phrase; and
means for displaying the N-gram hints in the input method interface for the user to select.
As the above technical solution shows, the method and device of the present invention have the following beneficial effects:
(1) Because the input method directly affects translation efficiency, integrating machine translation knowledge into the character input method as an input method for computer-aided translation overcomes the limitations of existing interaction modes (such as post-editing and interactive machine translation), so that a more efficient input method further improves translators' efficiency and translation quality without degrading the user experience;
(2) The invention makes effective use of machine translation knowledge: when a computer-aided translation tool that includes machine translation is used, it automatically and effectively reduces the number of keystrokes without disturbing the normal translation workflow. An English-to-Chinese political news translation experiment shows that, measured by keystroke count alone, the invention raises the keystroke saving rate by at least 11.04% relative to the Google Pinyin input method, which corresponds to an efficiency gain of at least 11.04%. If the effect of the machine translation output in helping translators compose the final translation faster is also counted, the efficiency gain is even more pronounced.
Brief description of the drawings
Fig. 1 is an overall framework diagram of the input method and device for computer-aided translation of the present invention;
Fig. 2 is a refined overall framework diagram of the input method and device for computer-aided translation of the present invention;
Fig. 3 is a schematic diagram of the method and device of the invention embedded in a computer-aided translation platform;
Fig. 4 is a schematic comparison of the input keystroke sequences with n-gram hint phrases disabled and enabled;
Fig. 5 is an example of decoding an input keystroke sequence using machine translation knowledge.
Detailed description of the embodiments
To make the objects, technical solutions, and advantages of the present invention clearer, the invention is described in further detail below with reference to specific embodiments and the accompanying drawings.
All code of the invention is implemented in Java and Apache Flex: the back end is written in Java and runs in a Tomcat container, the input method front end is written in Apache Flex, and the development platforms are Ubuntu 12.04 and Windows 7. These are not limitations of the invention; the code does not use any platform-specific programming, so the system can also run on other operating system versions. This input method targets computer-aided translation and is fused with a character input method; it is not a general-purpose character input method. The specific computer-aided translation software, machine translation engine, and character input method are not restricted. The character input method may be any of various character input methods, such as a Wubi input method or a pinyin input method.
The basic idea of the invention is to make appropriate use of machine translation knowledge and to propose an input method for computer-aided translation that improves translators' translation quality and efficiency. The system framework of the invention is shown in Fig. 1. In Fig. 1: the word segmentation module receives the source language sentence and outputs the segmented source sentence to the translation module; the word segmentation module also receives the already-entered part of the human translation and outputs the segmented entered part to the second generation module; the translation module is connected to the word segmentation module and the second generation module, and outputs the machine translation candidate list corresponding to the segmented source sentence to the first generation module; the first generation module is connected to the translation module and the input device interface, receives the user's input keystroke sequence and the machine translation candidate list, and generates and outputs input method phrase candidates to the input device interface; the second generation module is connected to the word segmentation module and the translation module, receives the segmented entered part of the human translation and the machine translation candidate list, and generates and outputs n-gram hint phrases to the input device interface; the input device interface interacts directly with the user: it displays the best machine translation, the input method phrase candidates, and the n-gram hint phrases, and receives the user's keystroke selection commands and input keystroke sequences to enter the translation of the source sentence.
Fig. 3 shows an example of the invention (assuming the character input method is a pinyin input method) embedded in computer-aided translation software. Fig. 3 is divided into two regions, A on the left and B on the right. Region A shows the machine translation candidate list for the user's reference; the user can set how many candidates are displayed. Region B is the main functional area of the invention. When the user has just started entering the translation, or whenever n-gram hint phrases are available, the user can select the corresponding hint with the Enter key or the number keys 5 to 8, as shown in region B1. In region B2, when no n-gram hint phrase is available, machine translation still helps the user through the invention: words that appear in the machine translation candidate list are preferentially given higher scores, so that, for example, "welfare" for the input "fl" is ranked first, which spares the user the trouble of picking the word. The invention therefore speeds up translation not only explicitly, through n-gram hint phrases, but also implicitly, by adjusting the order of input method phrase candidates in real time. Unlike other machine translation interaction methods, if the machine translation in region A is hidden, so that the user ignores the machine translation result entirely, the invention can still help the user translate more efficiently.
The present invention proposes an input method for computer-aided translation. Below, a pinyin input method stands in for the character input method and English-to-Chinese translation serves as the embodiment; the principle and implementation of the invention are explained in detail using the following example.
Suppose the source language sentence S is:
China mulls change to officials' welfare system
One of the machine translation candidates, MT:
China considers to change ability official's benefit system
The corresponding human translation HT:
China considers reform civil servants' welfare system
1. Perform word segmentation on the source language sentence and on the already-entered part of the translation. The implementation is as follows:
There are many ways to segment English and Chinese text. In this embodiment of the invention we use the open-source segmentation tool Urheen to segment both English and Chinese. Urheen can also segment other languages, such as Japanese, and can be downloaded free of charge from the following address:
http://www.openpr.org.cn/index.php/zh/NLP-Toolkit-For-Natural-Language-Processing/68-Urheen-A-Chinese/English-Lexical-Analysis-Toolkit/View-details.html
In this example, the machine translation candidates and the human translation are segmented automatically, with adjacent words separated by a space.
2. Using a machine translation engine, obtain the machine translation candidate list corresponding to the segmented source sentence, and output the highest-scoring candidate to the input device interface as the best machine translation; generate N n-gram hint phrases from the first N words of the best machine translation, output them to the input device interface, and wait for the user's keystroke selection.
(1) Obtain the machine translation candidate list.
Once step 1 has produced the segmented source sentence, the machine translation candidate list, that is, the n-best list, can be obtained from the machine translation engine. The highest-scoring candidate in the n-best list is taken as the best machine translation and output to the input device interface for reference while the system waits for the user to enter the human translation. The machine translation engine can be any translation engine, for example the well-known open-source engine Moses, which can be downloaded free of charge from the following address:
http://www.statmt.org/moses/?n=Moses.Releases
Moses has fairly complete documentation, and a translation server can easily be deployed by following it.
(2) Generate N n-gram hint phrases from the first N words of the best machine translation.
Each of the N n-gram hint phrases consists of consecutive words: the first hint phrase is a unigram containing a single word; the second hint phrase is a bigram containing two words, namely the word of the first hint phrase followed by a second hint word, so that the first hint phrase is a prefix of the second; and so on, with all words of the (N-1)-th hint phrase forming a prefix of the N-th hint phrase, which is an N-gram containing N words. N is a preset positive integer; in this embodiment its default value is 4, and it can be customized. In the example, the 4 n-gram hint phrases generated from the first N words of the best machine translation are "China", "China considers", "China considers changing", and "China considers changing ability". After they are output to the input device interface, the 4 hint phrases and their sequence numbers are: 5. China, 6. China considers, 7. China considers changing, 8. China considers changing ability. The user selects a hint phrase by pressing the number key that matches its sequence number, for example pressing "6" to select "China considers".
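The construction of the hint phrases can be sketched in a few lines of Python. This is a minimal illustration rather than the patented implementation: the function name `ngram_hints`, the `first_key` parameter, and the English glosses used as tokens are assumptions; the numbering from 5 follows the example above.

```python
def ngram_hints(words, n=4, first_key=5):
    """Return (key, phrase) pairs where the k-th phrase is the first k words.

    Each phrase is a prefix of the next one. Glosses are joined with spaces
    here; Chinese output would normally be joined without them.
    """
    hints = []
    for k in range(1, min(n, len(words)) + 1):
        hints.append((first_key + k - 1, " ".join(words[:k])))
    return hints


# English glosses of the segmented best machine translation (illustrative).
best_mt = ["China", "considers", "changing", "ability", "official", "welfare", "system"]
for key, phrase in ngram_hints(best_mt):
    print(key, phrase)
```

If the best machine translation has fewer than N words, the sketch simply returns fewer hints.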
3. Respond to the user's keystroke selection of an n-gram hint phrase, or receive the user's input keystroke sequence; using a log-linear model, compute over the machine translation candidate list and the input keystroke sequence to generate M input method phrase candidates, output them to the input device interface, and wait for the user's keystroke selection.
In this example the character input method is a pinyin input method, so the input keystroke sequence, that is, the character input method code entered by the user, is a pinyin string; for example, "China considers" corresponds to "zhongguokaolv".
Step S31: Segment the input keystroke sequence to obtain the segmented input keystroke sequence. The segmented sequence consists of coding units separated by a separator character, and each coding unit is either the complete character input method code of the corresponding character or a prefix of that code.
The pinyin string is segmented by Chinese character, with "'" as the separator character. For example, the pinyin string "zhongguokaolv" is segmented into "zhong'guo'kao'lv", and the pinyin string "zgkl" is segmented into "z'g'k'l". The segmentation algorithm uses maximum prefix matching on a trie (for details see D. E. Knuth, "The Art of Computer Programming", vol. 1, pp. 295-304; "Sorting and Searching", Fundamental Algorithms, vol. III, pp. 481-505, Addison-Wesley, Reading, Mass., 1973).
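A minimal Python sketch of trie-based maximum prefix matching follows. It assumes a tiny four-syllable table, whereas a real pinyin table is much larger and contains ambiguous splits that plain greedy matching does not resolve; the names `build_trie` and `segment_keys` are hypothetical.

```python
def build_trie(syllables):
    """Build a nested-dict trie from a list of pinyin syllables."""
    root = {}
    for syllable in syllables:
        node = root
        for ch in syllable:
            node = node.setdefault(ch, {})
    return root


def segment_keys(keys, trie):
    """Greedily split a keystroke string into coding units.

    Each unit is the longest run of keys that follows a path in the trie,
    so it is either a full syllable or a prefix of one.
    """
    units, i = [], 0
    while i < len(keys):
        node, j = trie, i
        while j < len(keys) and keys[j] in node:
            node = node[keys[j]]
            j += 1
        if j == i:
            raise ValueError(f"no syllable starts with {keys[i]!r}")
        units.append(keys[i:j])
        i = j
    return units


# Illustrative syllable table only.
TRIE = build_trie(["zhong", "guo", "kao", "lv"])
print("'".join(segment_keys("zhongguokaolv", TRIE)))
print("'".join(segment_keys("zgkl", TRIE)))
```

Because a unit ends as soon as the next key cannot extend the current trie path, the same routine handles both full pinyin and abbreviated input.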
Step S32:Input method phrase candidate list is initialized as sky, to every in the input keystroke sequence after described point of word One coding unit is calculated as below successively:
Step S321:According to character input method coding rule, the coding unit is calculated and obtains target word candidate collection. As pinyin string " in z ' g'k'l ", " z " correspondence Chinese character be target word candidate collection ", this, again, in, most, do, word, morning, Make, person ... ", " g " correspondence target word candidate collection " crosses, is somebody's turn to do, gives, it is individual, more, height, with, firm, each, dry, state ... ", " k " is right Answer target word candidate collection " can, see, soon, open, block, examining, empty, fast, visitor ... ", " l " correspondence target word candidate collection " come, Lee, it is inner, old, consider, road, class, woods ... ".
Step S322: A decoding algorithm is applied to the target-word candidate set, the input method phrase candidate list, and the machine translation candidate list to obtain a new input method phrase candidate list.
In this embodiment, decoding is the process of converting the segmented input keystroke sequence (for example, "zhong'guo'kao'lv", which corresponds to "China considers") into the corresponding input method phrase candidates. The input keystroke sequence may be full pinyin, abbreviated pinyin, or double pinyin (shuangpin). One aim of the invention is to shorten a long keystroke sequence such as "zhong'guo'kao'lv" as far as possible, ideally to "z'g'k'l", which character input methods could not achieve at the time this patent was filed.
Because the target-word candidate search space of each coding unit is large, and the number of input method phrase candidates grows exponentially as the coding units are combined, a decoding algorithm such as beam search is needed (for a detailed description see Och, Franz Josef, Nicola Ueffing, and Hermann Ney, "An Efficient A* Search Algorithm for Statistical Machine Translation", Proceedings of the Workshop on Data-driven Methods in Machine Translation, Volume 14, Association for Computational Linguistics, 2001; and "Sorting and Searching", vol. 1, pp. 295-304) to quickly search the target-word candidate sets of the coding units and to merge and extend the input method phrase candidates.
Step S323: A log-linear model is used to score each input method phrase candidate in the new input method phrase candidate list, and the candidates are sorted in descending order of score. If the length of the new list exceeds the preset threshold M, only the M highest-scoring input method phrase candidates are kept. The number of target-word candidates in each input method phrase candidate equals the number of coding units decoded so far, and the order of the target-word candidates in each input method phrase candidate matches the order of the decoded coding units.
While the decoding algorithm searches the target-word candidate sets of the coding units and merges and extends the input method phrase candidates, the length of the input method phrase candidate list grows exponentially, so the list must be pruned to keep its length within a fixed limit. During pruning, a log-linear model (for a detailed description see Knoke, David, and Peter J. Burke, eds., "Log-linear Models", vol. 20, Sage, 1980) scores each input method phrase candidate in the new list, and the candidates are sorted in descending order. The new input method phrase candidate list then replaces the input method phrase candidate list.
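The decoding loop of steps S322 and S323 can be sketched in Python as follows. This is a minimal illustration, not the patented implementation: the table `PINYIN_TABLE`, the functions `split_units`, `unit_candidates`, `beam_decode`, and the scoring function `toy_score` are all hypothetical names with toy data, and `toy_score` merely stands in for the log-linear model.

```python
from typing import Callable, Dict, List, Tuple

# Toy pinyin table (hypothetical data, for illustration only).
PINYIN_TABLE: Dict[str, List[str]] = {
    "zhong": ["中", "种"],
    "guo": ["国", "过"],
    "kao": ["考", "靠"],
    "lv": ["虑", "绿"],
}


def split_units(keystrokes: str) -> List[str]:
    """Split a segmented keystroke sequence on the apostrophe separator."""
    return [unit.strip() for unit in keystrokes.split("'") if unit.strip()]


def unit_candidates(unit: str) -> List[str]:
    """Target words whose code equals the coding unit or starts with it."""
    return [word
            for code, words in PINYIN_TABLE.items() if code.startswith(unit)
            for word in words]


def beam_decode(keystrokes: str,
                score: Callable[[Tuple[str, ...]], float],
                beam_size: int = 5) -> List[str]:
    """Extend phrase candidates one coding unit at a time, keeping the top M."""
    beam: List[Tuple[str, ...]] = [()]  # one empty hypothesis stands in for the empty list
    for unit in split_units(keystrokes):
        # Combine every surviving phrase candidate with every target-word candidate.
        extended = [phrase + (word,) for phrase in beam for word in unit_candidates(unit)]
        extended.sort(key=score, reverse=True)  # descending order of score
        beam = extended[:beam_size]             # prune to at most M candidates
    return ["".join(phrase) for phrase in beam]


def toy_score(phrase: Tuple[str, ...]) -> float:
    """Stand-in for the log-linear model: counts words found in one MT candidate."""
    mt_candidate = "中国考虑"  # hypothetical machine translation candidate
    return float(sum(word in mt_candidate for word in phrase))
```

With this toy data, `beam_decode("z'g'k'l", toy_score)` and `beam_decode("zhong'guo'kao'lv", toy_score)` both rank the same phrase first, which illustrates why machine translation knowledge lets the abbreviated keystroke sequence suffice.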
Suppose the segmented input keystroke sequence is $k$, the corresponding set of input method phrase candidates is $H$, and the most probable input method phrase candidate is $\hat{h}$. The corresponding log-linear model of the invention is:
$$\hat{h} = \arg\max_{h \in H} \sum_{m} \lambda_m f_m(h, k)$$
where $\lambda_m$ is the weight of feature function $f_m$, set manually from experience and the actual scenario, and $f_m(h, k)$ ranges over the following feature functions:
(1) typing model probability;
(2) language model probability;
(3) occurrence probability of the words in the input method phrase candidate;
(4) occurrence probability of the input method phrase candidate;
(5) a binary feature indicating whether the words in the input method phrase candidate appear in the machine translation candidates;
(6) a binary feature indicating whether the input method phrase candidate appears in the machine translation candidates;
(7) a binary feature indicating whether the input method phrase candidate appears in the user's term base.
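A log-linear score is a weighted sum of feature values. The following minimal Python sketch shows that sum and one possible way to compute the three binary features (5) to (7). The function names and feature keys are hypothetical, and treating feature (5) as "at least one word matches" is an assumption, since the text does not say whether any or all words must match.

```python
from typing import Dict, Iterable, List, Set


def loglinear_score(features: Dict[str, float], weights: Dict[str, float]) -> float:
    """Weighted sum of feature values; features without a weight count as 0."""
    return sum(weights.get(name, 0.0) * value for name, value in features.items())


def binary_features(phrase: List[str],
                    mt_candidates: Iterable[str],
                    term_base: Set[str]) -> Dict[str, float]:
    """Features (5)-(7) as 0/1 values (feature (5): any word matches, an assumption)."""
    text = "".join(phrase)
    mt = list(mt_candidates)
    return {
        "word_in_mt": float(any(word in cand for word in phrase for cand in mt)),
        "phrase_in_mt": float(any(text in cand for cand in mt)),
        "phrase_in_terms": float(text in term_base),
    }
```

In a full system the probability features (1) to (4) would be added to the same dictionary, and the weights would be the manually set values described above.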
Features (1)-(4) can be initialized from the following seed lexicon:
http://www.datatang.com/data/45925
A Chinese pinyin table can be downloaded from the following address:
http://www.datatang.com/data/11858
Step S33: After all coding units in the segmented input keystroke sequence have been processed, the input method phrase candidate list has length M and is sorted in descending order of score, where M is a preset positive integer. In this example M is 5 and can be customized.
The phrase candidate list is displayed on the second row of the input device interface, five candidates per page, numbered 0 to 4. The space bar selects candidate 0, the Ctrl key selects candidate 1, and the number keys 0 to 4 select the correspondingly numbered candidates. The result for "z'g'k'l" is shown in Figure 5.
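The key bindings described above can be sketched as a small lookup. The key names `"space"` and `"ctrl"` and the function `select_candidate` are hypothetical; a real input method would receive platform-specific key codes instead.

```python
from typing import List, Optional


def select_candidate(key: str, page: List[str]) -> Optional[str]:
    """Map a key press to a candidate on the current page (numbered 0 to 4)."""
    if key == "space":
        index = 0          # space bar selects candidate 0
    elif key == "ctrl":
        index = 1          # Ctrl selects candidate 1
    elif key in {"0", "1", "2", "3", "4"}:
        index = int(key)   # number keys select the candidate with that number
    else:
        return None        # any other key is not a selection key
    return page[index] if index < len(page) else None
```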
4. After responding to the user's key selection of an N-gram hint phrase or an input method phrase candidate, generate N N-gram hint phrases from the entered portion of the translation and the machine translation candidate list, output them to the input device interface, and wait for the user's key selection. Repeat step 3 above until the user has finished entering the translation of the source language sentence.
Step S41: After responding to the user's key selection of an N-gram hint phrase or an input method phrase candidate, segment the entered portion of the translation into words using step 1 above to obtain the segmented entered portion.
Step S42: If the best machine translation contains the last word of the segmented entered portion, apply the maximum-prefix matching algorithm to the best machine translation candidate and the segmented entered portion to generate N N-gram hint phrases.
In this example, after the user enters "welfare", prefix matching on "welfare" succeeds and a new round of numbered N-gram hints is generated, for example "5. system".
Step S43: If the best machine translation does not contain the last word of the segmented entered portion, select from the machine translation candidate list all candidates that contain that word to form a suboptimal machine translation candidate list, and take its highest-scoring candidate as the suboptimal machine translation. Apply the prefix matching algorithm to the suboptimal machine translation candidate and the segmented entered portion to generate N N-gram hint phrases.
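Steps S42 and S43 can be sketched as follows. The text does not spell out the maximum-prefix matching algorithm, so this Python sketch makes an assumption: among all positions where the last entered word occurs in a candidate, it prefers the one whose preceding words agree with the longest run of entered words. The function names `match_end` and `ngram_hints` are hypothetical, and the n-best list is assumed to be sorted with the best candidate first.

```python
from typing import List, Optional, Sequence


def match_end(entered: Sequence[str], candidate: Sequence[str]) -> Optional[int]:
    """Index just past the best match of the last entered word, or None."""
    if not entered:
        return 0  # nothing entered yet: hints start at the first word
    best_pos, best_len = None, 0
    for i, word in enumerate(candidate):
        if word != entered[-1]:
            continue
        k = 1  # number of trailing entered words matched so far
        while k < len(entered) and k <= i and candidate[i - k] == entered[-1 - k]:
            k += 1
        if k > best_len:
            best_pos, best_len = i + 1, k
    return best_pos


def ngram_hints(entered: Sequence[str],
                nbest: Sequence[Sequence[str]],
                n: int = 4) -> List[List[str]]:
    """Hints of length 1..n taken from the first candidate that contains the last word."""
    for candidate in nbest:  # best candidate first, then lower-scoring ones (S43)
        pos = match_end(entered, candidate)
        if pos is not None:
            return [list(candidate[pos:pos + size])
                    for size in range(1, n + 1) if pos + size <= len(candidate)]
    return []
```

Each hint is a prefix of the next one, so the user can accept one, two, or up to N following words with a single key press.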
N-gram hint phrases can be disabled or enabled as needed; Fig. 4 contrasts the two cases. In Fig. 4, the left panel shows N-gram hint phrases disabled and the right panel shows them enabled.
The input method for computer-aided translation described above is implemented in computer software. Accordingly, the invention also provides an input device for computer-aided translation. Fig. 2 shows the system framework of this input device, which comprises a word segmentation module, a translation module, a first generation module, a second generation module, and an input device interface, wherein:
The word segmentation module generates and outputs the segmented source language sentence and the segmented entered portion of the translation from the source language sentence and the entered portion of the translation; segmentation can be performed by calling any word segmentation tool, including Urheen, in the manner shown in step 1 of the input method described above;
The translation module is connected to the word segmentation module; it uses a machine translation engine to obtain the machine translation candidate list for the segmented source language sentence, and outputs the highest-scoring candidate to the input device interface as the best machine translation;
The first generation module is connected to the translation module and the input device interface; it processes the machine translation candidate list and the input keystroke sequence by the method shown in step 2 above, using the log-linear model to generate M input method phrase candidates and output them to the input device interface;
The second generation module is connected to the translation module and the input device interface; it processes the entered portion of the translation and the machine translation candidate list by the method shown in step 3 above to generate N N-gram hint phrases and output them to the input device interface;
The input device interface displays the best machine translation, the input method phrase candidates, and the N-gram hint phrases, and receives the user's key selection commands and input keystroke sequences for entering the translation of the source language sentence.
As a preferred embodiment, the invention also provides an input device for computer-aided translation, comprising:
a means for segmenting the source language sentence into words, which can call any word segmentation tool, including Urheen, in the manner shown in step 1 of the input method described above;
a means for obtaining, with a machine translation engine, the target-language machine translation candidate list for the segmented source language sentence, generating a phrase candidate list from the highest-scoring machine translation candidate, and outputting it to the input device interface; this means can obtain the machine translation candidate list, i.e. the n-best list, by the method shown in step 2 above;
a means for adjusting the phrase candidate list dynamically in real time, after receiving the keystroke sequence entered by the user, using the log-linear model together with the machine translation candidate list, and outputting the list to the input device interface;
a means for responding to the user's key selections until the user has finished translating the source language sentence.
Preferably, the input device for computer-aided translation further comprises: a means for obtaining N-gram hints from the machine translation candidate list after the user has entered a phrase; and a means for displaying the N-gram hints in the input method interface for the user to select.
As a preferred embodiment, the invention also provides an input device for computer-aided translation, whose graphical interface is shown in Fig. 1, comprising:
a means for segmenting the source language sentence into words;
a means for obtaining, with a machine translation engine, the machine translation candidate list for the segmented source language sentence, outputting the highest-scoring candidate to the input device interface as the best machine translation, generating N N-gram hint phrases from the first N words of the best machine translation, outputting them to the input device interface, and waiting for the user's key selection;
a means for responding to the N-gram hint phrase selected by the user's key press, or receiving the user's input keystroke sequence, processing the machine translation candidate list and the input keystroke sequence with the log-linear model, generating M input method phrase candidates, outputting them to the input device interface, and waiting for the user's key selection;
a means for responding to the input method phrase candidate selected by the user's key press, or receiving the user's input keystroke sequence, and determining whether the user has finished entering the translation of the source language sentence; if so, processing ends; otherwise N N-gram hint phrases are generated from the entered portion of the translation and the machine translation candidate list and output to the input device interface, the means waits for the user's key selection, and the above response step is repeated;
where N and M are positive integers.
5. Experimental setup
To verify whether the invention can substantially improve translation efficiency, 4,040 pairs of translation logs were randomly selected from the computer-aided translation system of a private computer-aided translation platform (http://cotrans.me) and randomly divided into two groups of 2,020 pairs each. Each group was then randomly split into a development set (1,000 pairs) and a test set (1,020 pairs). The machine translation system in that computer-aided translation system is implemented as a phrase-based translation model. Parameters were tuned with the free, open-source ZMERT, which can be downloaded from the following address:
http://joshua-decoder.org/4.0/zmert.html
During tuning on the development set, the evaluation metric parameter was set to "-m BLEU4 shortest" (see, for example, Papineni, Kishore, Salim Roukos, Todd Ward, and Wei-Jing Zhu, "BLEU: a Method for Automatic Evaluation of Machine Translation", in Proc. of ACL, 2002). The baseline system is the Google cloud input method, which can be accessed through the following link:
http://www.google.com/inputtools/try/
The evaluation metric is the keystroke saving rate (KSR). Because different translation systems may output different numbers of translation candidates, this experiment avoids that discrepancy by using only the highest-scoring translation candidate of each source language sentence as the reference. The calculation formulas are as follows:
Google cloud input method:
The present invention:
where $T$ is the set of Chinese human translation sentences, $C$ is the set of best machine translations for all the English sentences, $t = t_1 t_2 \dots t_m$ is a Chinese human translation, $m$ is the number of words in that human translation, and $c$ is the best machine translation. $mk_{norm}(t)$ is the minimum number of keystrokes needed to enter the Chinese sentence $t$ word by word with a full-pinyin input method; $mk(t)$ is the minimum number of keystrokes needed to enter the human translation $t$ with the invention when the machine translation is identical to the human translation; $k_{Google}(t)$ is the actual number of keystrokes needed to enter the human translation $t$ with the Google cloud input method; and $k(c, t)$ is the actual number of keystrokes needed to enter the Chinese sentence $t$ with the invention, using the machine translation $c$ as reference. For Chinese, following Wei Cui, "Evaluation of Chinese Character Keyboards", Computer, 18(1), pp. 54-59, 1985, the following formula holds:
where $len(t_i)$ is the number of Chinese characters in word $t_i$. The value of $mk(t)$ can be calculated by the following formula:
where $N$ is the number of N-gram hints, 4 by default; $sl$ is the number of separators between words, for example $sl = 0$ for Chinese and $sl = 1$ for English; and $sp$ is the number of keystrokes needed to select a word from the input method results, normally $sp = 1$.
The keystroke saving rate is a decimal between 0 and 1: 0 means no keystrokes are saved at all, and 1 means the ideal state has been reached and the number of keystrokes cannot be reduced further.
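A minimal sketch of a keystroke saving rate computation follows. It assumes one normalization that is consistent with the endpoints stated above (0 when no keystrokes are saved relative to word-by-word full-pinyin entry, 1 when the minimum number of keystrokes is reached), aggregated over all sentences. The function name and the aggregation are assumptions for illustration and should not be read as the patent's exact formula.

```python
from typing import Sequence


def keystroke_saving_rate(mk_norm: Sequence[int],
                          actual: Sequence[int],
                          mk_min: Sequence[int]) -> float:
    """Saved keystrokes divided by the maximum keystrokes that could be saved.

    mk_norm: per-sentence keystrokes for word-by-word full-pinyin entry
    actual:  per-sentence keystrokes actually used by the system under test
    mk_min:  per-sentence minimum keystrokes in the ideal case
    """
    saved = sum(mk_norm) - sum(actual)
    possible = sum(mk_norm) - sum(mk_min)
    if possible <= 0:
        raise ValueError("mk_norm must exceed mk_min for the rate to be defined")
    return saved / possible
```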
6. Experimental results
Table 1 shows the performance of the invention and the Google cloud input method on the two groups of test data. Relative to the Google cloud input method, the keystroke saving rate of the invention is higher by 11.04 and 11.26 percentage points on the two groups, respectively. This demonstrates the effectiveness and advantage of the input method for computer-aided translation.
In summary, the experimental results show that the input method and device for computer-aided translation can make full and effective use of machine translation knowledge and can substantially improve the input speed and translation efficiency of professional translators.
Table 1. Keystroke saving rate (%) of the present invention and the Google cloud input method
Experimental group | Google cloud input method | The present invention
1 | 37.40 | 48.44
2 | 36.44 | 47.70
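The improvements quoted for Table 1 are absolute differences between the two columns. The short Python check below recomputes them from the table values; the names `TABLE_1` and `improvement_points` are illustrative only.

```python
# Keystroke saving rates (%) from Table 1: (Google cloud input method, present invention).
TABLE_1 = {1: (37.40, 48.44), 2: (36.44, 47.70)}


def improvement_points(baseline: float, system: float) -> float:
    """Absolute difference in percentage points, rounded to two decimals."""
    return round(system - baseline, 2)


gains = {group: improvement_points(*rates) for group, rates in TABLE_1.items()}
print(gains)  # {1: 11.04, 2: 11.26}
```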
Because the method of the invention is not tied to two specific languages, the method and device are generally applicable. Although the invention was tested only in the English-to-Chinese translation direction with a pinyin input method, it also applies to other language pairs and other character input methods, such as the Chinese-to-English and English-to-French translation directions and the Wubi (five-stroke) input method.
The specific embodiments described above further explain the purpose, technical solution, and beneficial effects of the invention in detail. It should be understood that the foregoing are only specific embodiments of the invention and do not limit it; any modification, equivalent substitution, or improvement made within the spirit and principles of the invention falls within its scope of protection.

Claims (6)

1. An input method for computer-aided translation, comprising the following steps:
Step S1: segmenting the source language sentence into words;
Step S2: using a machine translation engine to obtain the machine translation candidate list for the segmented source language sentence, and outputting the highest-scoring machine translation candidate to the input device interface as the best machine translation; generating the initial N N-gram hint phrases from the first N words of the best machine translation, outputting them to the input device interface, and waiting for the user's key selection; wherein the N N-gram hint phrases are hint phrases composed of consecutive words: the first hint phrase is a unigram containing only one word; the second hint phrase is a bigram containing two words, namely the word of the first hint phrase and a second hint word, the word of the first hint phrase being the prefix of the second hint phrase; and so on, so that all the words of the (N-1)-th hint phrase form the prefix of the N-th hint phrase, and the N-th hint phrase is an N-gram containing N words;
Step S3: responding to the N-gram hint phrase selected by the user's key press, or receiving the user's input keystroke sequence; using a log-linear model to process the machine translation candidate list and the input keystroke sequence, generating M input method phrase candidates, outputting them to the input device interface, and waiting for the user's key selection;
Step S4: responding to the input method phrase candidate selected by the user's key press, or receiving the user's input keystroke sequence, and determining whether the user has finished entering the translation of the source language sentence; if so, ending; otherwise, performing maximum-prefix matching on the entered portion of the translation and the machine translation candidate list to generate the updated N N-gram hint phrases, outputting them to the input device interface, waiting for the user's key selection, and jumping to step S3;
wherein N and M are preset positive integers.
2. The input method for computer-aided translation according to claim 1, characterized in that using the log-linear model to process the machine translation candidate list and the input keystroke sequence and generate M input method phrase candidates comprises the following steps:
Step S31: segmenting the input keystroke sequence to obtain the segmented input keystroke sequence; the segmented input keystroke sequence consists of coding units separated by a separator character, each coding unit being the whole character-input-method code of the corresponding word or a prefix of that code;
Step S32: initializing the input method phrase candidate list to empty, and performing the following computation in turn for each coding unit in the segmented input keystroke sequence:
computing the target-word candidate set for the coding unit according to the coding rules of the character input method;
applying a decoding algorithm to the target-word candidate set, the input method phrase candidate list, and the machine translation candidate list to obtain a new input method phrase candidate list;
scoring each input method phrase candidate in the new input method phrase candidate list with the log-linear model and sorting the candidates in descending order; if the length of the new input method phrase candidate list exceeds the preset threshold M, keeping only the M highest-scoring input method phrase candidates; the number of target-word candidates in each input method phrase candidate equals the number of coding units decoded so far, and the order of the target-word candidates in each input method phrase candidate matches the order of the decoded coding units;
replacing the input method phrase candidate list with the new input method phrase candidate list;
Step S33: after all coding units in the segmented input keystroke sequence have been processed, the input method phrase candidate list has length M and is sorted in descending order of score, where M is a preset positive integer.
3. The input method for computer-aided translation according to claim 2, characterized in that the features used by the log-linear model include:
(1) typing model probability;
(2) language model probability;
(3) occurrence probability of the words in the input method phrase candidate;
(4) occurrence probability of the input method phrase candidate;
(5) a binary feature indicating whether the words in the input method phrase candidate appear in the machine translation candidates;
(6) a binary feature indicating whether the input method phrase candidate appears in the machine translation candidates;
(7) a binary feature indicating whether the input method phrase candidate appears in the user's term base.
4. The input method for computer-aided translation according to claim 1, characterized in that the step of performing maximum-prefix matching on the entered portion of the translation and the machine translation candidate list to generate the updated N N-gram hint phrases specifically comprises the following steps:
Step S41: after responding to the user's key selection of an N-gram hint phrase or an input method phrase candidate, segmenting the entered portion of the translation into words to obtain the segmented entered portion;
Step S42: if the best machine translation contains the last word of the segmented entered portion, applying the maximum-prefix matching algorithm to the best machine translation candidate and the segmented entered portion to generate the updated N N-gram hint phrases;
Step S43: if the best machine translation does not contain the last word of the segmented entered portion, selecting from the machine translation candidate list all machine translation candidates that contain that word to obtain a suboptimal machine translation candidate list, and taking its highest-scoring machine translation candidate as the suboptimal machine translation; applying the prefix matching algorithm to the suboptimal machine translation candidate and the segmented entered portion to generate the updated N N-gram hint phrases.
5. An input device for computer-aided translation using the input method for computer-aided translation according to claim 1, characterized in that the device comprises a word segmentation module, a translation module, a first generation module, a second generation module, and an input device interface, wherein:
the word segmentation module generates and outputs the segmented source language sentence and the segmented entered portion of the translation from the source language sentence and the entered portion of the translation;
the translation module is connected to the word segmentation module, uses a machine translation engine to obtain the machine translation candidate list for the segmented source language sentence, and outputs the highest-scoring machine translation candidate to the input device interface as the best machine translation;
the first generation module is connected to the translation module and the input device interface, and processes the machine translation candidate list and the input keystroke sequence with the log-linear model to generate M input method phrase candidates and output them to the input device interface;
the second generation module is connected to the translation module and the input device interface, and processes the entered portion of the translation and the machine translation candidate list by maximum-prefix matching to generate the updated N N-gram hint phrases and output them to the input device interface;
the input device interface displays the best machine translation, the input method phrase candidates, and the N-gram hint phrases, and receives the user's key selection commands and input keystroke sequences for entering the translation of the source language sentence.
6. An input device for computer-aided translation, comprising:
a means for segmenting the source language sentence into words;
a means for obtaining, with a machine translation engine, the target-language machine translation candidate list for the segmented source language sentence, outputting the highest-scoring machine translation candidate to the input device interface as the best machine translation, generating the initial N N-gram hint phrases from the first N words of the best machine translation, outputting them to the input device interface, and waiting for the user's key selection; wherein the N N-gram hint phrases are hint phrases composed of consecutive words: the first hint phrase is a unigram containing only one word; the second hint phrase is a bigram containing two words, namely the word of the first hint phrase and a second hint word, the word of the first hint phrase being the prefix of the second hint phrase; and so on, so that all the words of the (N-1)-th hint phrase form the prefix of the N-th hint phrase, and the N-th hint phrase is an N-gram containing N words;
a means for responding to the N-gram hint phrase selected by the user's key press, or receiving the user's input keystroke sequence, processing the machine translation candidate list and the input keystroke sequence with the log-linear model, generating M input method phrase candidates, outputting them to the input device interface, and waiting for the user's key selection;
a means for responding to the input method phrase candidate selected by the user's key press, or receiving the user's input keystroke sequence, and determining whether the user has finished entering the translation of the source language sentence; if so, processing ends; otherwise the updated N N-gram hint phrases are generated from the entered portion of the translation and the machine translation candidate list and output to the input device interface, the means waits for the user's key selection, and the above response step is repeated;
wherein N and M are preset positive integers.
CN201410678005.XA 2014-11-21 2014-11-21 The input method and device of computer-oriented supplementary translation Active CN104462072B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201410678005.XA CN104462072B (en) 2014-11-21 2014-11-21 The input method and device of computer-oriented supplementary translation

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201410678005.XA CN104462072B (en) 2014-11-21 2014-11-21 The input method and device of computer-oriented supplementary translation

Publications (2)

Publication Number Publication Date
CN104462072A CN104462072A (en) 2015-03-25
CN104462072B true CN104462072B (en) 2017-09-26

Family

ID=52908138

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201410678005.XA Active CN104462072B (en) 2014-11-21 2014-11-21 The input method and device of computer-oriented supplementary translation

Country Status (1)

Country Link
CN (1) CN104462072B (en)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN108920472A (en) * 2018-07-04 2018-11-30 哈尔滨工业大学 A kind of emerging system and method for the machine translation system based on deep learning

Families Citing this family (11)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN105069000A (en) * 2015-08-24 2015-11-18 中译语通科技(北京)有限公司 Interactive prediction input method
CN107870900B (en) * 2016-09-27 2023-04-18 松下知识产权经营株式会社 Method, apparatus and recording medium for providing translated text
CN106649293A (en) * 2016-12-28 2017-05-10 语联网(武汉)信息技术有限公司 Translation method and translation system
CN107123318B (en) * 2017-03-30 2020-05-08 河南工学院 Foreign language writing learning system based on input method device
CN107885729B (en) * 2017-09-25 2021-05-11 沈阳航空航天大学 Interactive machine translation method based on bilingual fragments
CN108829686B (en) * 2018-05-30 2022-04-15 北京小米移动软件有限公司 Translation information display method, device, equipment and storage medium
US11328132B2 (en) * 2019-09-09 2022-05-10 International Business Machines Corporation Translation engine suggestion via targeted probes
CN111090460B (en) * 2019-10-12 2021-05-04 浙江大学 Code change log automatic generation method based on nearest neighbor algorithm
CN111079449B (en) * 2019-12-19 2023-04-11 北京百度网讯科技有限公司 Method and device for acquiring parallel corpus data, electronic equipment and storage medium
CN111339788B (en) 2020-02-18 2023-09-15 北京字节跳动网络技术有限公司 Interactive machine translation method, device, equipment and medium
CN111597826B (en) * 2020-05-15 2021-10-01 苏州七星天专利运营管理有限责任公司 Method for processing terms in auxiliary translation

Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN102253930A (en) * 2010-05-18 2011-11-23 腾讯科技(深圳)有限公司 Method and device for translating text
CN102662933A (en) * 2012-03-28 2012-09-12 成都优译信息技术有限公司 Distributive intelligent translation method
CN103955457A (en) * 2014-05-20 2014-07-30 陈北宗 Machine-aided literature translation program

Family Cites Families (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US8843359B2 (en) * 2009-02-27 2014-09-23 Andrew Nelthropp Lauder Language translation employing a combination of machine and human translations

Also Published As

Publication number Publication date
CN104462072A (en) 2015-03-25

Similar Documents

Publication Publication Date Title
CN104462072B (en) The input method and device of computer-oriented supplementary translation
Vogel et al. The CMU statistical machine translation system
US20080040095A1 (en) System for Multiligual Machine Translation from English to Hindi and Other Indian Languages Using Pseudo-Interlingua and Hybridized Approach
Mauser et al. Extending statistical machine translation with discriminative and trigger-based lexicon models
US20050216253A1 (en) System and method for reverse transliteration using statistical alignment
Wu et al. Inversion transduction grammar constraints for mining parallel sentences from quasi-comparable corpora
CN105068997B (en) The construction method and device of parallel corpora
JP2000353161A (en) Method and device for controlling style in generation of natural language
CN105573994B (en) Statictic machine translation system based on syntax skeleton
KR102043353B1 (en) Apparatus and method for recognizing Korean named entity using deep-learning
KR100911372B1 (en) Apparatus and method for unsupervised learning translation relationships among words and phrases in the statistical machine translation system
CN106156013A (en) The two-part machine translation method that a kind of regular collocation type phrase is preferential
Tomás et al. Statistical phrase-based models for interactive computer-assisted translation
Weerasinghe A statistical machine translation approach to sinhala-tamil language translation
Slayden et al. Thai sentence-breaking for large-scale SMT
Ney et al. Improving word alignment quality using morpho-syntactic information
Sin et al. Attention-based syllable level neural machine translation system for myanmar to english language pair
Alabau et al. Multimodal interactive machine translation
JP2005506635A (en) Computer controlled coder / decoder not limited by language or method
Devi et al. Steps of pre-processing for english to mizo smt system
JP2013186673A (en) Machine translation device and machine translation program
Dasgupta et al. Resource creation and development of an English-Bangla back transliteration system
Braune et al. Rule selection with soft syntactic features for string-to-tree statistical machine translation
WO2024004183A1 (en) Extraction device, generation device, extraction method, generation method, and program
WO2024004184A1 (en) Generation device, generation method, and program

Legal Events

Date Code Title Description
C06 Publication
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant
GR01 Patent grant