CN106156013A - Two-stage machine translation method giving priority to fixed-collocation phrases - Google Patents

Two-stage machine translation method giving priority to fixed-collocation phrases

Info

Publication number
CN106156013A
Authority
CN
China
Prior art keywords
phrase
translation
sentence
word
fixed collocation
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN201610522056.2A
Other languages
Chinese (zh)
Other versions
CN106156013B (en)
Inventor
秦科
刘贵松
罗光春
段贵多
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
University of Electronic Science and Technology of China
Original Assignee
University of Electronic Science and Technology of China
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by University of Electronic Science and Technology of China
Priority to CN201610522056.2A
Publication of CN106156013A
Application granted
Publication of CN106156013B
Legal status: Active
Anticipated expiration

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00Handling natural language data
    • G06F40/40Processing or translation of natural language
    • G06F40/58Use of machine translation, e.g. for multi-lingual retrieval, for server-side translation for client devices or for real-time translation

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Health & Medical Sciences (AREA)
  • Artificial Intelligence (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • Computational Linguistics (AREA)
  • General Health & Medical Sciences (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Machine Translation (AREA)

Abstract

The present invention relates to a machine translation method that translates a whole sentence by first translating fixed-collocation phrases, each of which may consist of one or more nested phrases. An embodiment comprises the following steps. Marking fixed collocations: the fixed collocations in the source-language sentence are marked. Translating fixed collocations: each fixed collocation is split into two parts that are translated separately, and the translations are then recombined. Phrase division: the remainder of the original sentence is divided into all possible phrases, while the fixed-collocation part is treated as already translated. Building the candidate phrase table: only the phrases that appear in the phrase translation probability table are selected and added to the candidate phrase table. Sentence translation: for the partially translated source sentence, made up of the fixed-collocation translations and the remaining untranslated parts, an existing heuristic decoder and the candidate phrase table are used to generate the best translation. The first stage of the invention translates the fixed-collocation phrases, and the second stage translates the remainder of the sentence.

Description

Two-stage machine translation method giving priority to fixed-collocation phrases
Technical field
The present invention relates to the field of machine translation, and specifically to a two-stage statistical machine translation method that translates fixed collocations first.
Background
Statistical machine translation is a data-driven translation approach. It treats the translation of natural language as a machine learning problem: translation is described by a mathematical model, a parallel bilingual corpus of sufficient size is used to train the model and its parameters, and the trained model is then used to generate the translation with the highest probability. Compared with rule-based translation, statistical machine translation does not need human experts to write translation rules, because the rules are obtained automatically from the parallel corpus during training. Statistical machine translation is also language independent: given a parallel corpus for a language pair, the corresponding translation model can be trained without any essential change to the method. The three mainstream statistical machine translation approaches each have several mature open-source implementations and toolkits, and their translation quality has reached a usable level. These features make statistical machine translation systems comparatively flexible, cheap to develop and well performing, and the approach is widely used at present.
Phrase-based statistical machine translation extracts phrases from the parallel bilingual corpus to obtain a phrase translation probability table, where a phrase means a contiguous string of words. The phrases in the table are word chunks, that is, sequences of adjacent words. They include idioms and fixed collocations that carry grammatical meaning, and also word sequences that occur in sentences but carry no grammatical meaning. The approach is highly language independent, needs no complex syntactic analysis and still achieves reasonable results, which makes it comparatively well suited to some low-resource languages.
However, phrase-based statistical machine translation also has inherent defects. First, although a phrase carries a large amount of context and so naturally fixes the order of the words inside it, the method is poor at reordering the phrases within one phrase division of a sentence, especially when the two languages differ greatly in constituent order. This is the so-called long-distance reordering problem. Second, during translation the translation of a concrete phrase is looked up in the phrase translation probability table by exact match: if an identical phrase is found in the table, its translation is obtained; otherwise the phrase cannot be translated. Taking Chinese-to-English translation as an example, for a concrete phrase such as the one rendered "with the most consistent", if no entry for exactly that phrase is found in the table, the phrase cannot be translated even if the table contains a phrase that differs from it by only one word. This is the so-called data sparsity problem. Third, the phrases in the phrase translation probability table are contiguous, whereas real language also contains fixed collocations whose words are not contiguous but which carry grammatical meaning, for example fixed collocations containing a preposition such as "with ... consistent". Because phrase length is limited, a phrase usually cannot contain such a discontinuous fixed collocation in full, which lowers translation quality. Taking Chinese-to-English translation as an example, assume the phrase length limit in the table is 4. The Chinese phrase meaning "consistent with the viewpoint mentioned above" contains 6 words, so it cannot appear in the table; instead an incomplete phrase covering only "with mentioned above" appears, and the final translation differs from the correct translation "consistent with opinion mentioned above". In other words, phrase-based statistical machine translation cannot translate discontinuous phrases such as "with ... consistent".
The last two defects are more pronounced when the corpus is small, so it is important to explore how to mine the existing corpus more deeply and make full use of limited data.
Summary of the invention
In view of the above prior art, the object of the present invention is to provide a statistics-based machine translation method that addresses the data sparsity problem in phrase-based statistical machine translation, which is caused by limited corpus size and by the limit on the length of extracted phrases.
To achieve this object, the present invention adopts the following technical solution:
A two-stage machine translation method giving priority to fixed-collocation phrases, comprising the following steps:
Step 1: using the phrase templates in a phrase template library, mark the fixed collocations present in the source-language sentence to be translated.
Step 2: within a fixed collocation, obtain the translation of the words matched by the phrase template; take the part that remains after removing the words matched by the phrase template as a new sentence to be translated and send it back to Step 1, obtaining its translation iteratively; then merge this translation with the translation of the words matched by the phrase template to form the translation of the fixed collocation.
Steps 1 and 2 form the first stage of translation.
Step 3: obtain a bilingual phrase translation probability table, divide the unmarked remainder of the source-language sentence into phrases, and look up the resulting phrases in the bilingual phrase translation probability table.
Step 4: if a divided phrase exactly matches a phrase in the bilingual phrase translation probability table, take the matching phrase pair from the table as a candidate phrase for decoding.
Step 5: replace each fixed collocation in the source-language sentence with its translation to obtain a partially translated sentence, and use a heuristic decoder to translate the partially translated sentence according to the candidate phrases, producing the final translation.
Steps 3, 4 and 5 form the second stage of translation.
In the above method, the phrase template of Step 1 consists of terminal symbols and nonterminal symbols: the terminal symbols are the backbone words of the fixed collocation, and the nonterminal symbols are its replaceable part.
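A phrase template of this kind can be represented as a small record. The sketch below is illustrative only: the class name PhraseTemplate, its fields, the marker "X" and the English glosses standing in for Chinese words are assumptions, not part of the patent text.

```python
from dataclasses import dataclass
from typing import Tuple


@dataclass(frozen=True)
class PhraseTemplate:
    """Hypothetical record for one fixed-collocation template."""
    source: Tuple[str, ...]  # source side; "X" marks the nonterminal
    target: Tuple[str, ...]  # target side, reusing the same nonterminal

    def terminals(self) -> Tuple[str, ...]:
        # Backbone words: every source-side token except the nonterminal.
        return tuple(tok for tok in self.source if tok != "X")


# English glosses in source word order stand in for the Chinese words.
template = PhraseTemplate(source=("with", "X", "consistent"),
                          target=("is", "consistent", "with", "X"))
print(template.terminals())
```

Keeping the nonterminal on both sides makes the later merge step a simple substitution.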
In the above method, marking the fixed collocations in Step 1 comprises:
Step 1.1: traverse the words of the source-language sentence and check whether the phrase template library contains a phrase template that starts with the current word; if so, take this word as the start of the marked phrase template and go to the next step.
Step 1.2: starting from this word, traverse the rest of the source-language sentence and check whether another word matches the remaining part of the current phrase template; if so, take that word as the end of the marked phrase template and mark the corresponding fixed collocation in the source-language sentence.
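Steps 1.1 and 1.2 can be sketched as a two-level scan. This is a minimal sketch under stated assumptions: the template library TEMPLATES, the function name mark_collocation and the glossed toy sentence are hypothetical, and the choice of the nearest closing word is an assumption, since the text does not say which match to prefer.

```python
from typing import List, Optional, Tuple

# Hypothetical template library: (first backbone word, last backbone word).
TEMPLATES: List[Tuple[str, str]] = [("with", "consistent")]


def mark_collocation(tokens: List[str]) -> Optional[Tuple[int, int]]:
    """Return (start, end) indices of the first fixed collocation, or None."""
    for start, word in enumerate(tokens):            # Step 1.1: find a template start
        for first, last in TEMPLATES:
            if word != first:
                continue
            # Step 1.2: scan the rest of the sentence for the template ending,
            # leaving room for at least one word in the replaceable part.
            for end in range(start + 2, len(tokens)):
                if tokens[end] == last:
                    return start, end
    return None


# Toy sentence written with English glosses in source word order.
tokens = ["this", "viewpoint", "with", "above", "mentioned", "viewpoint", "consistent"]
print(mark_collocation(tokens))
```

The returned span covers the backbone words and the replaceable part between them.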
In the above method, in Step 2:
The translation of the fixed collocation is obtained by taking its replaceable part as a new sentence to be translated and translating it iteratively.
The translation of the replaceable part is then merged with the translation of the words matched by the phrase template, according to the positional correspondence of the words in the fixed collocation, to obtain the translation of the fixed collocation.
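The iteration and merge of Step 2 amount to a recursive substitution. The sketch below assumes a single hypothetical template ("with X consistent" to "is consistent with X"), glossed toy tokens, and a dictionary lookup standing in for the phrase-based translator; in the method itself the surrounding words are handled by the second stage rather than by direct lookup.

```python
from typing import Callable, List, Optional, Tuple

TARGET_TEMPLATE = ["is", "consistent", "with", "X"]  # hypothetical target side


def find_collocation(tokens: List[str]) -> Optional[Tuple[int, int]]:
    """Locate the span of the hypothetical template 'with X consistent'."""
    for start, word in enumerate(tokens):
        if word == "with":
            for end in range(start + 2, len(tokens)):
                if tokens[end] == "consistent":
                    return start, end
    return None


def translate(tokens: List[str],
              base_translate: Callable[[List[str]], List[str]]) -> List[str]:
    span = find_collocation(tokens)
    if span is None:
        return base_translate(tokens) if tokens else []
    start, end = span
    # The replaceable part becomes a new sentence and goes back to Step 1.
    inner = translate(tokens[start + 1:end], base_translate)
    merged: List[str] = []
    for slot in TARGET_TEMPLATE:
        merged.extend(inner if slot == "X" else [slot])
    # Simplification: the context is translated directly here.
    left = translate(tokens[:start], base_translate)
    right = translate(tokens[end + 1:], base_translate)
    return left + merged + right


# Toy stand-in for the phrase-based translator (hypothetical entries).
TABLE = {"above mentioned viewpoint": "the opinion mentioned above",
         "this viewpoint": "the opinion"}
base = lambda toks: TABLE[" ".join(toks)].split()
sentence = "this viewpoint with above mentioned viewpoint consistent".split()
print(" ".join(translate(sentence, base)))
```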
In the above method, obtaining the bilingual phrase translation probability table in Step 3 comprises:
Step 3.1: perform word alignment training on the parallel bilingual corpus to obtain a corpus that carries word alignment information.
Step 3.2: extract phrase pairs from the resulting corpus to obtain the bilingual phrase translation probability table.
In the above method, Step 3.2 can be divided into the following steps:
Step 3.2.1: extract phrase pairs from the word-aligned sentences of the resulting corpus.
Step 3.2.2: compute the translation probabilities of the extracted phrase pairs to obtain the phrase translation probability table.
In the above method, the translation probabilities of Step 3.2.2 comprise the forward and reverse phrase translation probabilities and the forward and reverse lexical translation probabilities.
In the above method, Step 5 comprises:
Step 5.1: replace each fixed collocation in the source-language sentence with its translation to obtain the partially translated sentence.
Step 5.2: pass the selected candidate phrases and the partially translated sentence to the heuristic decoder, which generates the translation.
Compared with the prior art, the present invention has the following beneficial effects:
By extracting in advance the fixed collocations in the source-language sentence that span many words and translating them first, the present invention compensates for the weakness of the phrase translation model in reordering complex phrases, and overcomes the defect that the phrase length limit prevents a phrase from fully covering a longer fixed collocation, thereby improving translation quality. The present invention obtains phrase templates by mining the corpus in depth and also uses external templates, so that the limited corpus is fully used, and the use of templates alleviates the data sparsity problem to some extent. The present invention can use an existing heuristic decoder to generate the translation.
Brief description of the drawings
Fig. 1 is a schematic diagram of the translation process of the present invention;
Fig. 2 is a schematic diagram of the three main training processes of the present invention;
Fig. 3 shows how the phrase translation probability table of the present invention is obtained;
Fig. 4 shows the bilingual corpus preprocessing process of the present invention;
Fig. 5 shows the phrase extraction process of the present invention;
Fig. 6 shows the sentence translation process of the present invention.
Detailed description
All features disclosed in this specification, and all steps of any method or process disclosed, may be combined in any way, except for mutually exclusive features and/or steps.
The present invention is further described below with reference to the accompanying drawings:
Embodiment 1
Phrase-based statistical machine translation comprises two parts, training and translation. The training part mainly produces the models required by the decoder; the phrase translation probability table of step S3 is obtained by the training part. Once the phrase translation probability table and the other training results are available, the decoder uses them to translate the sentence to be translated.
1. The training part is implemented as follows:
Training mainly comprises three parts, namely translation model training, language model training and tuning; see Fig. 2. Those skilled in the art will appreciate that translation model training mainly produces the phrase translation probability table and that several existing training procedures exist. One of them, shown in Fig. 3, is divided into the following three steps:
Step 301, bilingual corpus preprocessing. See Fig. 4. The first operation is word segmentation: languages without natural word boundaries are segmented with a word segmentation tool. The second is sentence filtering: every sentence in the segmented corpus is filtered by length, and sentences with more than 30 words are discarded, since shorter sentences give better results. The third is the conversion of full-width characters to half-width characters: this encoding conversion of the sentences in the filtered corpus makes the corpus uniform and standardized.
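The length filter and the full-width to half-width conversion can be sketched as follows. The function names are hypothetical, word segmentation is assumed to have been done already, and only one side of the corpus is shown for brevity.

```python
from typing import Iterable, List


def to_half_width(text: str) -> str:
    """Map full-width ASCII variants (U+FF01..U+FF5E) and the ideographic
    space (U+3000) to their half-width counterparts."""
    out = []
    for ch in text:
        code = ord(ch)
        if code == 0x3000:
            out.append(" ")
        elif 0xFF01 <= code <= 0xFF5E:
            out.append(chr(code - 0xFEE0))
        else:
            out.append(ch)
    return "".join(out)


def preprocess(sentences: Iterable[List[str]], max_words: int = 30) -> List[List[str]]:
    """Drop sentences longer than max_words, then normalize character width."""
    kept = []
    for words in sentences:
        if len(words) > max_words:
            continue
        kept.append([to_half_width(w) for w in words])
    return kept


print(to_half_width("ＡＢＣ１２３，"))
```

For a bilingual corpus, a sentence pair would normally be dropped when either side exceeds the limit, so that the two sides stay aligned line by line.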
Step 302, word alignment training. Word alignment is a comparatively mature technique. This embodiment uses the expectation-maximization algorithm from the paper by Peter Brown to obtain, iteratively, the word correspondences from language A to language B from the parallel corpus. The input to this step is the corpus produced by bilingual corpus preprocessing, and the tool is GIZA++, freely available word alignment software that implements the IBM models. To obtain a more symmetric word alignment, GIZA++ is first used to align language A to language B and then language B to language A; the heuristic grow-diag-final is then applied to the two directional alignments to obtain a symmetric many-to-many word correspondence. From this word alignment the lexical translation probabilities w(e | f) and w(f | e) can be counted, which are the probabilities that words of the two languages translate into each other. The word alignment information is used by the subsequent phrase extraction.
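Counting w(e | f) from an aligned corpus can be sketched as a relative frequency over alignment links. This is an illustrative assumption about how the counting is done; the function name, the 0-based link format and the toy words f1, e1, e2 are hypothetical, and the alignment itself is taken as given rather than produced by GIZA++.

```python
from collections import Counter
from typing import Dict, List, Set, Tuple

Example = Tuple[List[str], List[str], Set[Tuple[int, int]]]


def lexical_probs(corpus: List[Example]) -> Dict[Tuple[str, str], float]:
    """Estimate w(e | f) from links; each link is (f_index, e_index), 0-based."""
    pair_count: Counter = Counter()
    f_count: Counter = Counter()
    for f_words, e_words, links in corpus:
        for fi, ei in links:
            pair_count[(e_words[ei], f_words[fi])] += 1
            f_count[f_words[fi]] += 1
    # Relative frequency: links (e, f) divided by all links leaving f.
    return {(e, f): c / f_count[f] for (e, f), c in pair_count.items()}


toy_corpus: List[Example] = [
    (["f1"], ["e1"], {(0, 0)}),
    (["f1"], ["e2"], {(0, 0)}),
    (["f1"], ["e1"], {(0, 0)}),
]
w = lexical_probs(toy_corpus)
print(w[("e1", "f1")])
```

Swapping the roles of the two sides gives w(f | e) in the same way.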
Step 303, phrase extraction. Phrase extraction is the core step of extracting translation rules. It uses the word alignment information to extract phrase pairs and computes their probabilities to obtain the phrase translation probability table. See Fig. 5. This step comprises the following:
First, extract bilingual phrases. Starting from the first word of the sentence, traverse the possible phrase combinations and use the word alignment information to judge whether the phrase pair formed by the current combination is consistent with the word alignment, that is, the words in the A phrase and the words in the B phrase are aligned with each other at least once and are not aligned to words outside the phrase pair. For example, take the bilingual sentence pair whose Chinese side means "the Great Wall has been built since the Qin Dynasty" and whose English side is "the great wall was built since qin dynasty", with word alignment "1:1 1:2 1:3 2:6 3:7 3:8 4:6 5:4 5:5". From this alignment, phrase pairs such as "Great Wall ||| the great wall ||| 1:1 1:2 1:3", "Qin Dynasty ||| qin dynasty ||| 3:7 3:8" and "start to build from the Qin Dynasty ||| was built since qin dynasty" can be extracted.
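The consistency check can be sketched on the alignment given in the example. This is a simplified sketch, not the extraction code of any particular toolkit: the function name and max_len are assumptions, the source tokens are English glosses standing in for the five Chinese words, and the extension over unaligned target words found in full extractors is omitted.

```python
from typing import List, Set, Tuple


def extract_phrases(src: List[str], tgt: List[str],
                    alignment: Set[Tuple[int, int]],
                    max_len: int = 7) -> Set[Tuple[str, str]]:
    """Extract phrase pairs consistent with a 1-based (source, target) alignment."""
    pairs = set()
    n = len(src)
    for s_start in range(1, n + 1):
        for s_end in range(s_start, min(n, s_start + max_len - 1) + 1):
            # Target positions linked to the current source span.
            tgt_pos = [j for (i, j) in alignment if s_start <= i <= s_end]
            if not tgt_pos:
                continue
            t_start, t_end = min(tgt_pos), max(tgt_pos)
            if t_end - t_start + 1 > max_len:
                continue
            # Consistency: no target word in the span links outside the source span.
            if any(t_start <= j <= t_end and not (s_start <= i <= s_end)
                   for (i, j) in alignment):
                continue
            pairs.add((" ".join(src[s_start - 1:s_end]),
                       " ".join(tgt[t_start - 1:t_end])))
    return pairs


src = ["GreatWall", "from", "QinDynasty", "start", "build"]  # glosses, hypothetical
tgt = "the great wall was built since qin dynasty".split()
alignment = {(1, 1), (1, 2), (1, 3), (2, 6), (3, 7), (3, 8), (4, 6), (5, 4), (5, 5)}
pairs = extract_phrases(src, tgt, alignment)
print(sorted(pairs))
```

The single word glossed "from" is not extracted on its own, because its target word "since" is also linked to the word glossed "start".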
Then, compute the translation probabilities of the phrase pairs. These include the forward phrase translation probability φ(e | f), the reverse phrase translation probability φ(f | e), the forward lexical translation probability lex(e | f) and the reverse lexical translation probability lex(f | e).
The phrase translation probability φ(f | e) is the probability that phrase e is translated into phrase f. It is computed as follows:
$$\phi(f \mid e) = \frac{\mathrm{count}((e, f))}{\sum_{k=1}^{K} \mathrm{count}((e, f_k))}$$
where count((e, f_k)) is the number of times the phrase pair (e, f_k) occurs in the whole corpus, and K is the number of source-language phrases aligned with the target-language phrase e. φ(e | f) is computed in the same way.
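The relative-frequency estimate above can be sketched in a few lines. The function name and the toy phrase pairs (f1, f2 are placeholder source phrases) are hypothetical.

```python
from collections import Counter
from typing import Dict, Iterable, Tuple


def phrase_probs(pairs: Iterable[Tuple[str, str]]) -> Dict[Tuple[str, str], float]:
    """pairs: extracted (e, f) phrase pairs, one item per occurrence.
    Returns phi(f | e) keyed by (f, e)."""
    pair_count = Counter(pairs)
    e_total: Counter = Counter()
    for (e, _f), c in pair_count.items():
        e_total[e] += c          # denominator: sum over all f_k seen with e
    return {(f, e): c / e_total[e] for (e, f), c in pair_count.items()}


occurrences = [("the opinion", "f1")] * 3 + [("the opinion", "f2")]
phi = phrase_probs(occurrences)
print(phi[("f1", "the opinion")])
```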
The lexical translation probability lex(e | f) is the lexicalized probability that phrase f is translated into phrase e. It is computed as follows:
$$\mathrm{lex}(e \mid f, A) = \prod_{i=1}^{\mathrm{length}(e)} \frac{1}{|\{ j \mid (i, j) \in A \}|} \sum_{\forall (i, j) \in A} w(e_i \mid f_j)$$
where f_j is a word in the source-language phrase f, e_i is a word in the target-language phrase e, A is the set of word alignment links between the two phrases, and w(e_i | f_j) is the word translation probability. When e_i is aligned with k words of the source-language phrase f, the fraction in the formula is 1/k. lex(f | e) is computed in the same way.
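The lexical weighting formula can be worked through on a toy case. The function name, the 0-based link format and the probability values are hypothetical; the sketch also assumes every word of e has at least one link, since the formula as written does not cover unaligned words.

```python
from typing import Dict, List, Set, Tuple


def lex_weight(e: List[str], f: List[str],
               links: Set[Tuple[int, int]],
               w: Dict[Tuple[str, str], float]) -> float:
    """lex(e | f, A); links holds 0-based (i, j) meaning e[i] is aligned with f[j]."""
    score = 1.0
    for i, e_word in enumerate(e):
        aligned = [j for (ii, j) in links if ii == i]
        if not aligned:
            raise ValueError(f"word {e_word!r} has no alignment link")
        # Average of w(e_i | f_j) over the k words aligned with e_i.
        score *= sum(w.get((e_word, f[j]), 0.0) for j in aligned) / len(aligned)
    return score


e = ["the", "opinion"]
f = ["f1", "f2"]
links = {(0, 0), (1, 0), (1, 1)}
w = {("the", "f1"): 0.5, ("opinion", "f1"): 0.25, ("opinion", "f2"): 0.75}
print(lex_weight(e, f, links, w))
```

Here "the" has one link, contributing 0.5, and "opinion" has two links, contributing (0.25 + 0.75) / 2 = 0.5, so the product is 0.25.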
Those skilled in the art will appreciate that several language model training techniques exist. One training procedure is divided into two steps:
1) Monolingual corpus preprocessing. The preprocessing of the monolingual corpus is similar to that used in translation model training, except that it is applied to only one side of the parallel bilingual corpus. The first operation is word segmentation: languages without natural word boundaries are segmented with a word segmentation tool. The second is sentence filtering: every sentence in the segmented corpus is filtered by length, and sentences with more than 30 words are discarded, since shorter sentences give better results. The third is the conversion of full-width characters to half-width characters: this encoding conversion of the sentences in the filtered corpus makes the corpus more uniform and standardized.
2) Language model training. Language model training builds an n-gram model of the target language of the translation. This step uses the KenLM tool to generate a language model file in the ARPA format, and trains a 3-gram language model of the target language.
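What a 3-gram model estimates can be illustrated with plain counts. The sketch below is an unsmoothed maximum-likelihood estimate written for illustration only; it does not reproduce KenLM, which applies smoothing and writes ARPA files, and the function name and toy sentences are assumptions.

```python
from collections import Counter
from typing import Callable, Iterable, List


def train_trigram(sentences: Iterable[List[str]]) -> Callable[[str, str, str], float]:
    """Return prob(a, b, c) = count(a b c) / count of trigrams starting with (a b)."""
    tri: Counter = Counter()
    ctx: Counter = Counter()
    for words in sentences:
        padded = ["<s>", "<s>"] + list(words) + ["</s>"]
        for a, b, c in zip(padded, padded[1:], padded[2:]):
            tri[(a, b, c)] += 1
            ctx[(a, b)] += 1

    def prob(a: str, b: str, c: str) -> float:
        # Unseen contexts get probability 0 here; a real model would smooth.
        return tri[(a, b, c)] / ctx[(a, b)] if ctx[(a, b)] else 0.0

    return prob


prob = train_trigram([["the", "opinion"], ["the", "view"]])
print(prob("<s>", "the", "opinion"))
```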
Those skilled in the art will appreciate that several tuning procedures exist. One of them is divided into two steps:
1) Bilingual corpus preprocessing. Tuning uses a small bilingual corpus as a development set to train the optimal parameters of each sub-model. Its preprocessing is the same as for translation model training.
2) Minimum error rate training. To obtain the best translation quality, this step uses Och's BLEU-maximizing training algorithm, MERT, to train the model weights and obtain the optimal weights of the model.
2. The translation part is implemented as follows:
Step S1, marking fixed collocations (see Fig. 1). Traverse each word of the sentence to be translated. If a word matches a phrase template in the phrase template library, the fixed collocation present in the source-language sentence can be marked, starting from the word that matches the start of the template.
Step S2, translating fixed collocations. For a marked fixed collocation, the part that remains after removing the words occupied by the phrase template is the replaceable part. The replaceable part is taken as a new sentence to be translated and passed back to step S1, so that its translation is eventually obtained iteratively. According to the correspondence between the position of the replaceable part in the source side of the phrase template and its position in the target side, this translation is merged with the translation of the phrase template to obtain the translation of the fixed collocation.
Step S3, phrase division. The part of the source-language sentence that remains after removing the fixed collocations is divided into phrases; the next step looks up these phrases in the phrase translation probability table. The fixed-collocation part is treated as already translated and is not processed.
Step S4, building the candidate phrase table. Among the phrases obtained in step S3, if a phrase exactly matches a phrase in the phrase translation probability table, the corresponding phrase pair in the table is taken as a candidate phrase pair for decoding, which yields the candidate phrase table.
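Steps S3 and S4 can be sketched as enumerating every split of the remainder and keeping only the phrases found in the table. The function names and the in-memory table layout are assumptions; the entry and its two scores are copied from the example entry in this description, and "unrelated phrase" is a hypothetical distractor.

```python
from typing import Dict, List, Tuple

Entry = Tuple[str, Tuple[float, float]]  # (translation, scores)


def segmentations(words: List[str]) -> List[List[str]]:
    """All ways to divide words into contiguous phrases (step S3)."""
    if not words:
        return [[]]
    result = []
    for cut in range(1, len(words) + 1):
        head = " ".join(words[:cut])
        for rest in segmentations(words[cut:]):
            result.append([head] + rest)
    return result


def candidate_table(words: List[str],
                    phrase_table: Dict[str, List[Entry]]) -> Dict[str, List[Entry]]:
    """Keep only phrases with an exact match in the phrase table (step S4)."""
    candidates: Dict[str, List[Entry]] = {}
    for seg in segmentations(words):
        for phrase in seg:
            if phrase in phrase_table:
                candidates[phrase] = phrase_table[phrase]
    return candidates


phrase_table = {"this viewpoint": [("the opinion", (0.41, 0.63))],
                "unrelated phrase": [("something else", (0.1, 0.1))]}
remainder = ["this", "viewpoint"]
print(segmentations(remainder))
print(candidate_table(remainder, phrase_table))
```

Enumerating all segmentations grows exponentially with sentence length, so a practical implementation would look up each contiguous span once instead.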
Step S5, sentence translation (see Fig. 6). Replace the fixed-collocation part with its translation to obtain the partially translated sentence, and use the heuristic decoder to translate it according to the above candidate phrases. The decoder, which is based on a heuristic algorithm, combines the phrase translation table, the language model file and the parameter configuration obtained during training into a component that generates translations. This translation unit decodes one sentence to be translated at a time: it searches the phrase translation table for possible translations with which to extend the partial translation, keeps the more probable translations and discards the less probable ones during this process, and finally obtains the best translation.
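The idea of decoding a partially translated sentence can be illustrated with a heavily simplified decoder. This sketch is not the heuristic decoder referred to above: it is monotone, uses no language model or reordering, keeps a single best hypothesis per position and scores only by the log of one phrase probability. The item format and all names are assumptions.

```python
import math
from typing import Dict, List, Optional, Tuple

Item = Tuple[str, str]  # ("src", word) or ("tgt", already translated text)
Hyp = Tuple[float, List[str]]


def _update(best: List[Optional[Hyp]], pos: int, score: float, out: List[str]) -> None:
    # Keep the more probable hypothesis and discard the less probable one.
    if best[pos] is None or score > best[pos][0]:
        best[pos] = (score, out)


def decode(items: List[Item], candidates: Dict[str, List[Tuple[str, float]]]) -> str:
    n = len(items)
    best: List[Optional[Hyp]] = [None] * (n + 1)
    best[0] = (0.0, [])
    for i in range(n):
        if best[i] is None:
            continue
        score, out = best[i]
        kind, text = items[i]
        if kind == "tgt":                      # pre-translated fixed collocation
            _update(best, i + 1, score, out + [text])
            continue
        words: List[str] = []
        for j in range(i, n):                  # grow a source phrase
            if items[j][0] != "src":
                break
            words.append(items[j][1])
            for translation, prob in candidates.get(" ".join(words), []):
                _update(best, j + 1, score + math.log(prob), out + [translation])
    if best[n] is None:
        raise ValueError("no derivation covers the sentence")
    return " ".join(best[n][1])


items = [("src", "this"), ("src", "viewpoint"),
         ("tgt", "is consistent with the opinion mentioned above")]
candidates = {"this viewpoint": [("the opinion", 0.41)]}
print(decode(items, candidates))
```

Treating the translated fixed collocation as a single fixed item is what lets the second stage work only on the remaining source words.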
For example, to translate into English the Chinese sentence meaning "this viewpoint is consistent with the viewpoint mentioned above", step S1 is performed first. The 4th word, "with", and the last word, "consistent", match the phrase template "consistent with X" (in Chinese word order, "with X consistent"). The terminal part "with ... consistent" is the backbone of the template, and the nonterminal part "X" is the replaceable part of the template. With this template, the fixed collocation "consistent with the viewpoint mentioned above" can be marked in the sentence. The replaceable part of the fixed collocation is "the viewpoint mentioned above", which becomes the new sentence to be translated.
Step S2 is then performed. The new sentence to be translated obtained in step S1 is fed back as input to the present method and its translation is obtained iteratively. Because it contains no fixed collocation, its translation "the opinion mentioned above" can be obtained directly by phrase-based statistical machine translation. According to the positional correspondence of the nonterminal symbol in the phrase template, that is, X on the source side "consistent with X" corresponds to X on the target side "is consistent with X", the merged translation of the fixed collocation is "is consistent with the opinion mentioned above".
Step S3 is then performed to divide the remainder of the sentence into phrases. After removing "consistent with the viewpoint mentioned above", the remainder is "this viewpoint", which has two phrase divisions: "[this viewpoint]" and "[this] [viewpoint]".
Step S4 is then performed. The phrases obtained in step S3 are looked up in the phrase translation probability table, and for each exact match the corresponding bilingual phrase pair is added to the candidate phrase table. For example, if the phrase "this viewpoint" exists in the phrase translation probability table, the bilingual phrase pair "this viewpoint ||| the opinion ||| 1:1 2:2 ||| 0.41 0.63" is added to the candidate phrase table.
Finally, step S5 is performed to generate the translation. Its inputs are the partially translated sentence "this viewpoint is consistent with the opinion mentioned above", in which "this viewpoint" is still in the source language, and the candidate phrase table containing the bilingual phrase pair "this viewpoint ||| the opinion ||| 1:1 2:2 ||| 0.41 0.63". The heuristic decoder selects candidate phrases from the candidate phrase table to generate translations and finally chooses the translation with the highest score.
The above is only a specific embodiment of the present invention, but the scope of protection of the present invention is not limited to it. Any change or replacement that a person skilled in the art could readily conceive within the technical scope disclosed by the present invention shall fall within the scope of protection of the present invention.

Claims (8)

1. A two-stage machine translation method giving priority to fixed-collocation phrases, characterized by comprising the following steps:
Step 1: using the phrase templates in a phrase template library, mark the fixed collocations present in the source-language sentence to be translated;
Step 2: within a fixed collocation, obtain the translation of the words matched by the phrase template; take the part that remains after removing the words matched by the phrase template as a new sentence to be translated and send it back to Step 1, obtaining its translation iteratively; then merge this translation with the translation of the words matched by the phrase template to form the translation of the fixed collocation;
Step 3: obtain a bilingual phrase translation probability table, divide the unmarked remainder of the source-language sentence into phrases, and look up the resulting phrases in the bilingual phrase translation probability table;
Step 4: if a divided phrase exactly matches a phrase in the bilingual phrase translation probability table, take the matching phrase pair from the table as a candidate phrase for decoding;
Step 5: replace each fixed collocation in the source-language sentence with its translation to obtain a partially translated sentence, and use a heuristic decoder to translate the partially translated sentence according to the candidate phrases, producing the final translation.
The two-part machine translation method that a kind of regular collocation type phrase the most according to claim 1 is preferential, its feature exists In, described step 1, its phrase template includes terminal symbol and nonterminal symbol, and terminal symbol is the trunk word of regular collocation, non-end Knot symbol is the replaceable part of regular collocation.
The two-part machine translation method that a kind of regular collocation type phrase the most according to claim 1 is preferential, its feature exists In, described step 1, the step of labelling regular collocation includes:
Whether step 1.1, the word traveled through in source language sentence to be translated, exist in retrieval phrase template base and start with a word Phrase template, if its exist, then this word as the beginning of the phrase template of labelling and is performed next step;
Step 1.2, from the beginning of this word, travel through source language sentence remaining part to be translated, whether retrieval exists another word energy Enough mate the word of the phrase template remainder of current markers, if it is present the word obtained is as the phrase mould of labelling The ending of plate, and correspondence markings goes out the regular collocation in source language sentence to be translated.
The two-part machine translation method that a kind of regular collocation type phrase the most according to claim 2 is preferential, its feature exists In, described step 2, wherein,
Obtaining regular collocation translation is as a new sentence to be translated using the replaceable part of regular collocation, by iteration Mode obtains translation;
Further according to the word position corresponding relation in regular collocation, the word that replaceable part translation is matched with phrase template Language translation merges the translation obtaining regular collocation.
5. The two-part machine translation method giving priority to regular-collocation phrases according to claim 1, characterized in that obtaining the bilingual phrase translation probability table in step 3 comprises:
Step 3.1: performing word-alignment training on a bilingual parallel corpus to obtain a corpus containing word-alignment information;
Step 3.2: extracting phrase pairs from the obtained corpus to obtain the bilingual phrase translation probability table.
6. The two-part machine translation method giving priority to regular-collocation phrases according to claim 5, characterized in that step 3.2 comprises the following steps:
Step 3.2.1: extracting phrase pairs from the word-aligned sentences of the obtained corpus;
Step 3.2.2: calculating the translation probabilities of the extracted phrase pairs to obtain the phrase translation probability table.
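Steps 3.2.1 and 3.2.2 can be illustrated with the alignment-consistency criterion commonly used in phrase-based statistical machine translation, followed by relative-frequency estimation. The claims do not specify the extraction rule, so this is a simplified sketch under assumptions: unaligned boundary words are not extended over, only the probability of the target phrase given the source phrase is computed, and the names `extract_phrase_pairs` and `phrase_table` are invented for this example.

```python
from collections import Counter
from typing import Dict, List, Set, Tuple

Alignment = Set[Tuple[int, int]]  # (source index, target index)


def extract_phrase_pairs(
    src: List[str], tgt: List[str], alignment: Alignment, max_len: int = 3
) -> List[Tuple[str, str]]:
    """Step 3.2.1: collect phrase pairs consistent with the word alignment."""
    pairs: List[Tuple[str, str]] = []
    for i1 in range(len(src)):
        for i2 in range(i1, min(i1 + max_len, len(src))):
            tgt_pos = [j for (i, j) in alignment if i1 <= i <= i2]
            if not tgt_pos:
                continue  # source span has no aligned target word
            j1, j2 = min(tgt_pos), max(tgt_pos)
            if j2 - j1 + 1 > max_len:
                continue
            # Consistency: no target word in [j1, j2] aligns outside [i1, i2].
            if all(i1 <= i <= i2 for (i, j) in alignment if j1 <= j <= j2):
                pairs.append((" ".join(src[i1:i2 + 1]), " ".join(tgt[j1:j2 + 1])))
    return pairs


def phrase_table(pairs: List[Tuple[str, str]]) -> Dict[Tuple[str, str], float]:
    """Step 3.2.2: relative-frequency estimate of p(target phrase | source phrase)."""
    pair_counts = Counter(pairs)
    src_counts = Counter(s for s, _ in pairs)
    return {(s, t): c / src_counts[s] for (s, t), c in pair_counts.items()}


pairs = extract_phrase_pairs(
    ["我", "喜欢", "书"], ["I", "like", "books"], {(0, 0), (1, 1), (2, 2)}
)
table = phrase_table(pairs)
print(table[("喜欢 书", "like books")])  # 1.0
```

In practice the counts would be accumulated over every sentence pair in the word-aligned corpus before the probabilities are computed.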
7. The two-part machine translation method giving priority to regular-collocation phrases according to claim 6, characterized in that, in step 3.2.2, the translation probabilities comprise forward and reverse phrase translation probabilities and forward and reverse lexical probabilities.
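The claim names four scores but gives no formulas. For orientation only, the relative-frequency and lexical-weighting estimates commonly used in phrase-based statistical machine translation are written out below; they may differ from what the description intends. All symbols are assumptions introduced here: \bar{f} is a source phrase, \bar{e} a target phrase, a the word alignment inside the phrase pair given as (target position i, source position j), and w a word-level translation probability. Which direction counts as "forward" is likewise a matter of convention.

```latex
% Phrase translation probabilities in both directions (relative frequency)
\phi(\bar{e}\mid\bar{f}) =
  \frac{\operatorname{count}(\bar{f},\bar{e})}{\sum_{\bar{e}'}\operatorname{count}(\bar{f},\bar{e}')},
\qquad
\phi(\bar{f}\mid\bar{e}) =
  \frac{\operatorname{count}(\bar{f},\bar{e})}{\sum_{\bar{f}'}\operatorname{count}(\bar{f}',\bar{e})}

% Lexical probability of the target phrase given the source phrase;
% the opposite direction is obtained by swapping the roles of e and f.
\operatorname{lex}(\bar{e}\mid\bar{f},a) =
  \prod_{i=1}^{|\bar{e}|}
  \frac{1}{\lvert\{\, j : (i,j)\in a \,\}\rvert}
  \sum_{(i,j)\in a} w(e_i \mid f_j)
```

Under this convention a target word with no alignment link is usually treated as aligned to a special NULL source word, so that every factor in the product is defined.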
8. The two-part machine translation method giving priority to regular-collocation phrases according to claim 1, characterized in that step 5 comprises:
Step 5.1: replacing the regular collocation in the source-language sentence to be translated with its translation to obtain a partially translated sentence to be translated;
Step 5.2: passing the candidate phrases obtained by screening and the partially translated sentence to a heuristic decoder, which generates the translation.
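Step 5.1 amounts to splicing the collocation's translation into the source sentence while remembering which part is already translated. The sketch below assumes a hypothetical `Segment` record with a `translated` flag so that a downstream decoder could leave that span untouched; the decoder of step 5.2 is not modelled, and the names `Segment` and `partially_translate` are invented for this example.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass(frozen=True)
class Segment:
    text: str
    translated: bool  # True for the already translated collocation


def partially_translate(
    words: List[str], span: Tuple[int, int], collocation_translation: List[str]
) -> List[Segment]:
    """Replace the marked collocation (inclusive span) with its translation."""
    start, end = span
    segments = [Segment(w, False) for w in words[:start]]
    segments.append(Segment(" ".join(collocation_translation), True))
    segments.extend(Segment(w, False) for w in words[end + 1:])
    return segments


result = partially_translate(
    ["他", "在", "桌子", "上", "放", "书"], (1, 3), ["on", "the", "table"]
)
print([s.text for s in result])  # ['他', 'on the table', '放', '书']
```

Keeping the translated span as a single flagged segment is one possible design; it lets the remaining untranslated words be decoded around a fixed block.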
CN201610522056.2A 2016-06-30 2016-06-30 Two-part machine translation method giving priority to regular-collocation phrases Active CN106156013B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201610522056.2A CN106156013B (en) 2016-06-30 2016-06-30 A kind of two-part machine translation method that regular collocation type phrase is preferential

Publications (2)

Publication Number Publication Date
CN106156013A (en) 2016-11-23
CN106156013B (en) 2019-02-19

Family

ID=58061918

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201610522056.2A Active CN106156013B (en) 2016-06-30 2016-06-30 Two-part machine translation method giving priority to regular-collocation phrases

Country Status (1)

Country Link
CN (1) CN106156013B (en)

Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN101290616A (en) * 2008-06-11 2008-10-22 中国科学院计算技术研究所 Statistical machine translation method and system
CN101763344A (en) * 2008-12-25 2010-06-30 株式会社东芝 Method for training translation model based on phrase, mechanical translation method and device thereof
CN102214166A (en) * 2010-04-06 2011-10-12 三星电子(中国)研发中心 Machine translation system and machine translation method based on syntactic analysis and hierarchical model
US20130144598A1 (en) * 2011-12-05 2013-06-06 Sharp Kabushiki Kaisha Translation device, translation method and recording medium

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
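Sun Yueheng et al.: "Extraction of discontinuous phrase templates in statistical machine translation and its application", Computer Science (《计算机科学》) *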

Cited By (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN107741931A (en) * 2017-08-30 2018-02-27 捷开通讯(深圳)有限公司 Interpretation method, mobile terminal and the storage device of operating system framework
CN108363704A (en) * 2018-03-02 2018-08-03 北京理工大学 A kind of neural network machine translation corpus expansion method based on statistics phrase table
CN109299480A (en) * 2018-09-04 2019-02-01 上海传神翻译服务有限公司 Terminology Translation method and device based on context of co-text
CN109299480B (en) * 2018-09-04 2023-11-07 上海传神翻译服务有限公司 Context-based term translation method and device
US11461561B2 (en) 2019-10-25 2022-10-04 Beijing Xiaomi Intelligent Technology Co., Ltd. Method and device for information processing, and storage medium
CN112036191A (en) * 2020-08-31 2020-12-04 文思海辉智科科技有限公司 Data processing method and device and readable storage medium
CN112036191B (en) * 2020-08-31 2023-11-28 文思海辉智科科技有限公司 Data processing method and device and readable storage medium

Also Published As

Publication number Publication date
CN106156013B (en) 2019-02-19

Similar Documents

Publication Publication Date Title
Tiedemann Recycling translations: Extraction of lexical data from parallel corpora and their application in natural language processing
JPS62163173A (en) Mechanical translating device
CN106156013B (en) A kind of two-part machine translation method that regular collocation type phrase is preferential
US8874433B2 (en) Syntax-based augmentation of statistical machine translation phrase tables
CN103116578A (en) Translation method integrating syntactic tree and statistical machine translation technology and translation device
CN102799578A (en) Translation rule extraction method and translation method based on dependency grammar tree
Xu et al. Do we need Chinese word segmentation for statistical machine translation?
CN103020045B (en) Statistical machine translation method based on predicate argument structure (PAS)
US20200192982A1 (en) Methods, computer readable media, and systems for machine translation between arabic and arabic sign language
Alqudsi et al. A hybrid rules and statistical method for Arabic to English machine translation
CN106649289A (en) Realization method and realization system for simultaneously identifying bilingual terms and word alignment
CN108491399A (en) Chinese to English machine translation method based on context iterative analysis
Lavie et al. Experiments with a Hindi-to-English transfer-based MT system under a miserly data scenario
Pust et al. Using syntax-based machine translation to parse english into abstract meaning representation
Chen et al. Towards automatic generation of natural language generation systems
Dandapat et al. Using example-based MT to support statistical MT when translating homogeneous data in a resource-poor setting
Ahmadnia et al. Round-trip training approach for bilingually low-resource statistical machine translation systems
Gamal et al. Survey of arabic machine translation, methodologies, progress, and challenges
Ahmadnia et al. Statistical machine translation for bilingually low-resource scenarios: A round-tripping approach
Zhou et al. Constrained phrase-based translation using weighted finite-state transducers
CN115310433A (en) Data enhancement method for Chinese text proofreading
Seresangtakul et al. Thai-Isarn dialect parallel corpus construction for machine translation
Rana et al. Example based machine translation using fuzzy logic from English to Hindi
Garside The large-scale production of syntactically analysed corpora
Sánchez-Cartagena et al. Enriching a statistical machine translation system trained on small parallel corpora with rule-based bilingual phrases

Legal Events

Date Code Title Description
C06 Publication
PB01 Publication
C10 Entry into substantive examination
SE01 Entry into force of request for substantive examination
GR01 Patent grant