CN105573994B - Statistical machine translation system based on syntax skeleton - Google Patents

Statistical machine translation system based on syntax skeleton

Info

Publication number
CN105573994B
CN105573994B CN201610053560.2A CN201610053560A CN105573994B CN 105573994 B CN105573994 B CN 105573994B CN 201610053560 A CN201610053560 A CN 201610053560A CN 105573994 B CN105573994 B CN 105573994B
Authority
CN
China
Prior art keywords
translation
rule
syntactic
syntax
skeleton
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN201610053560.2A
Other languages
Chinese (zh)
Other versions
CN105573994A (en
Inventor
肖桐
朱靖波
张春良
高瑜泽
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Shenyang Yayi Network Technology Co ltd
Original Assignee
SHENYANG YAYI NETWORK TECHNOLOGY Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by SHENYANG YAYI NETWORK TECHNOLOGY Co Ltd
Priority to CN201610053560.2A
Publication of CN105573994A
Application granted
Publication of CN105573994B
Legal status: Active

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00Handling natural language data
    • G06F40/40Processing or translation of natural language
    • G06F40/42Data-driven translation
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00Handling natural language data
    • G06F40/20Natural language analysis
    • G06F40/205Parsing
    • G06F40/211Syntactic parsing, e.g. based on context-free grammar [CFG] or unification grammars

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Health & Medical Sciences (AREA)
  • Artificial Intelligence (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • Computational Linguistics (AREA)
  • General Health & Medical Sciences (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Machine Translation (AREA)

Abstract

The present invention relates to a statistical machine translation system based on a syntax skeleton, comprising the following steps: 1) a probabilistic SCFG hierarchical rule extraction algorithm extracts non-syntactic translation rules, used to translate the non-skeleton parts of the sentence to be translated; 2) the GHKM rule extraction method extracts syntactic translation rules, used to translate the skeleton part of the sentence to be translated; 3) generation of non-fully syntactic translation rules: non-fully syntactic translation rules are generated from the syntactic translation rules and, together with the non-syntactic and syntactic translation rules, integrate the advantages of the non-syntactic and syntactic translation systems; 4) model generation. The system applies syntactic translation rules to translate the syntax skeleton and handle long-distance reordering, and uses the rules of the non-syntactic translation system to handle low-level lexical translation and reordering. The model is easy to implement and its effect is significant.

Description

Statistical machine translation system based on syntax skeleton
Technical field
The present invention relates to techniques for modeling source-language syntax in statistical machine translation, and specifically to a statistical machine translation system based on a syntax skeleton.
Background technique
In statistical machine translation (SMT) there are different kinds of translation systems, such as non-syntactic systems (phrase-based and hierarchical phrase-based) and syntactic systems (tree-to-string, string-to-tree, and so on). Each kind has its own strengths and weaknesses. Syntactic translation systems have a clear advantage in handling long-distance dependencies and complex reordering among constituents, but when fully syntactic translation rules are sparse or have low coverage, the system becomes less robust and translation quality may suffer. It has also been shown that a straightforward implementation of a syntactic system does not translate as well as phrase-based or hierarchical phrase-based non-syntactic systems. Non-syntactic systems, in turn, are relatively accurate when translating short sentence fragments and handle the hierarchical structure of short fragments well, but they are weak at long-distance word reordering.
Currently, when producing target-language strings (for example, replacing target-side surface strings according to tree-to-string mappings obtained from syntactically parsed data), a popular approach is to use source-language syntax and sentence-structure information to guide or perform decoding. Unlike the word-based translation systems that originated in the 1990s, the source-language syntactic model in this approach is built on the syntactic parse tree of the input source sentence. The benefit is a stronger ability to handle long-distance movement and complex reordering among constituents.
Moreover, source-language syntax works well in machine translation because it can represent the skeleton structure (syntactic structure) of a sentence. If the behavior of a machine translation system is compared with that of a human translator, this skeleton-based mode of translation becomes even more apparent: given a source-language input sentence, a human translator first uses prior syntactic knowledge to form a high-level, rough picture of the sentence structure or type, then determines the translation and order of the key constituents according to that structure or type, and only afterwards completes lexical selection and local reordering. Since the skeleton structure of a source sentence can be represented by source-language syntax, the following questions naturally arise: can source-language syntactic structure information be applied where it is most effective in translation? For example, can translation follow the skeleton structure of the source language while still exploiting the strength of non-syntactic translation systems in phrase translation?
Unfortunately, although integrating sentence skeleton information into machine translation is very promising, no statistical machine translation system based on a syntax skeleton has been reported so far. In addition, syntactic and non-syntactic systems use different representations and are used in different ways. Some researchers have tried using manually annotated syntax skeleton data, but the results were poor and the implementation was complex.
Summary of the invention
Existing syntactic translation systems cannot translate and reorder short sentence fragments well and suffer from robustness problems caused by rule sparsity; models in non-syntactic translation systems cannot effectively reorder long-distance sentence constituents; and manually annotating skeleton information is time-consuming and laborious. To address these problems, the technical problem to be solved by the present invention is to provide a statistical machine translation system based on a syntax skeleton that models the high-level syntax skeleton of the source language, translates low-level phrases well, and introduces a novel representation of the syntax skeleton for use by machine translation systems.
To solve the above technical problem, the present invention adopts the following technical solution:
A statistical machine translation system based on a syntax skeleton according to the present invention comprises the following steps:
1) A probabilistic SCFG hierarchical rule extraction algorithm extracts non-syntactic translation rules, used to translate the non-skeleton parts of the sentence to be translated:
Using the heuristically constrained method for extracting hierarchical rules, probabilistic SCFG rules are extracted from parallel sentence pairs that have been word-aligned but not syntactically parsed. These hierarchical phrase rules, i.e. non-syntactic translation rules, handle the translation of the low-level structure of the sentence to be translated;
2) The GHKM rule extraction method extracts syntactic translation rules, used to translate the skeleton part of the sentence to be translated:
Using the GHKM rule extraction algorithm, GHKM rules are extracted from word-aligned parallel sentence pairs together with the syntactic parses of their source-language side, and the extracted GHKM rules are rewritten into syntactic translation rules. The syntactic translation rules handle the generation and translation of the high-level skeleton structure;
3) Generation of non-fully syntactic translation rules:
Non-fully syntactic translation rules are generated from the syntactic translation rules and, combined with the non-syntactic and syntactic translation rules, integrate the advantages of the non-syntactic and syntactic translation systems;
4) Model generation:
According to the above non-fully syntactic translation rules, the grammars (i.e. translation rule sets) of the syntactic and non-syntactic translation systems are integrated according to the different translation tasks, and non-fully syntactic translation derivations are generated. Non-syntactic translation rules handle the translation of the low-level words and phrases of the sentence to be translated; syntactic translation rules complete the translation of its high-level syntax skeleton structure; and non-fully syntactic translation rules guide skeleton generation and translation. The non-syntactic, syntactic, and non-fully syntactic translation rules are collected into one SCFG grammar with large coverage, in which the non-fully syntactic translation rules connect the different forms of grammar.
The extracted GHKM rules are rewritten into syntactic translation rules as follows. An extracted GHKM rule has the form:
source-language phrase syntactic label <source-language syntax tree fragment rooted at that label> → target-language string
where the "source-language phrase syntactic label" on the left-hand side is a phrase structure type label defined by linguistic syntactic knowledge, i.e. a syntactic nonterminal; the "source-language syntax tree fragment" on the left-hand side is a fragment of the sentence parse tree and is itself a tree whose leaf nodes may be terminal words or nonterminals, where each nonterminal must be one of the syntactic labels used in source-language parsing; and the "target-language string" on the right-hand side is a string of target-language terminal words and nonterminals, whose nonterminals correspond one-to-one with the nonterminals at the leaf nodes of the source-language syntax tree fragment.
By keeping the nonterminals on the frontier of the syntax tree fragment and discarding the internal tree structure, the above GHKM rule can be rewritten as a syntactic translation rule:
source-language phrase syntactic label → <source-language string, target-language string>
where the "source-language string" is the sequence of source-language terminal words and nonterminals (with their syntactic labels), namely the leaf-node sequence of the source-language syntax tree fragment in the GHKM rule that corresponds to the syntactic rule; and the "target-language string" is a string of target-language terminal words and nonterminals (with their syntactic labels), whose nonterminals correspond one-to-one with the nonterminals at the leaf nodes of the source-language syntax tree fragment.
Non-fully syntactic translation rules are generated from the non-syntactic and syntactic translation rules. A non-fully syntactic translation rule has the form:
source-language phrase syntactic label → <source-language string*, target-language string*>
where the "source-language phrase syntactic label" on the left-hand side is a nonterminal; "source-language string*" is a string of source-language terminal words, nonterminals, and the generalized label X; and "target-language string*" is a string of target-language terminal words, nonterminals, and the generalized label X, whose nonterminals correspond one-to-one with the nonterminals at the leaf nodes of the source-language syntax tree fragment;
A non-fully syntactic translation rule differs from a syntactic translation rule in that it does not require every nonterminal in the rule to be one of the phrase syntactic labels used in source-language parsing: some of its nonterminals are generalized to X, which indicates that the nonterminal does not belong to any syntactic category.
The advantages of the non-syntactic and syntactic translation systems are combined as follows:
The large-coverage SCFG grammar built from the source-language syntactic translation rules, the non-syntactic translation rules, and the non-fully syntactic translation rules creates the syntax skeleton during decoding;
While the syntax skeleton structure is generated, reordering among the syntactic constituents of the source language is captured, and the high-level translation tasks of the sentence to be translated are assigned to the syntactic translation system, while its low-level translation tasks are assigned to the non-syntactic translation system. In this way each translation system contributes its strengths to the tasks it handles best.
The grammars of the non-syntactic and syntactic translation systems are integrated according to the different translation tasks as follows: in the SCFG system, a weight is computed for each translation derivation so that the various translation rules are used to derive more accurately. The score of each translation derivation d is calculated with the following formula:
$$ s(d) = \prod_{r_i \in d_s} w(r_i) \times \prod_{r_j \in d_h} w(r_j) \times \mathrm{lm}(t)^{\lambda_{lm}} \times \exp(\lambda_{wb} \cdot |t|) $$
where s(d) is the score of translation derivation d, t is the target-language string, and the score of d is defined as the product of several factors:
Factor 1: the product of the weights of all rules contained in the syntax skeleton d_s of d, $\prod_{r_i \in d_s} w(r_i)$, where r_i is the i-th rule in d_s and w(r*) is the weight of rule r*;
Factor 2: the product of the weights of all rules contained in the non-skeleton part d_h of d, $\prod_{r_j \in d_h} w(r_j)$, where r_j is the j-th rule in d_h and w(r*) is the weight of rule r*;
Factor 3: the exponentially weighted score of the n-gram language model, $\mathrm{lm}(t)^{\lambda_{lm}}$, where λ_lm is the weight of the n-gram language model;
Factor 4: the word bonus $\exp(\lambda_{wb} \cdot |t|)$, where exp(|t|) is the exponential of the translation length |t|, so the longer the sentence, the larger this "bonus"; λ_wb is the weight of the word bonus.
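In practice a product of many small weights is usually handled in the log domain. As a worked step, assuming s(d) is exactly the product of the four factors listed above and that all rule weights and the language model score are positive, taking logarithms turns the score into a weighted sum:

```latex
\log s(d) \;=\; \sum_{r_i \in d_s} \log w(r_i)
          \;+\; \sum_{r_j \in d_h} \log w(r_j)
          \;+\; \lambda_{lm}\,\log \mathrm{lm}(t)
          \;+\; \lambda_{wb}\,|t|
```

Because the logarithm is monotonic, ranking derivations by log s(d) gives the same order as ranking them by s(d).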
The invention has the following beneficial effects and advantages:
1. The system translates using a specially defined kind of syntactic structure information (the syntax skeleton, or simply skeleton). It can model the high-level syntax skeleton of the source language for use by the machine translation system, and it combines two advantages in a single framework: 1) syntactic translation rules translate the syntax skeleton and handle long-distance reordering; 2) the rules of the non-syntactic translation system handle low-level lexical translation and reordering.
2. The model is very flexible. A single, simple grammar decoding paradigm covers derivations of non-syntactic, non-fully syntactic, and fully syntactic translation rules, and allows a gradual two-way transition between syntactic and non-syntactic translation rules, so that the translation system can choose selectively between syntactic and non-syntactic translation. Non-syntactic and syntactic translation systems can therefore be regarded as two special cases of this method. The model is easy to implement and its effect is significant.
3. The system is also applicable to general translation systems based on the synchronous context-free grammar (SCFG) framework; it can be implemented easily in any translation system with an SCFG decoder, and has been confirmed to speed up translation.
4. The invention defines a novel representation of the skeleton structure and is the first to obtain syntax skeleton information automatically. Under the guidance of syntactic, non-fully syntactic, and non-syntactic translation rules, skeleton information is acquired automatically, avoiding the large amount of manual labor otherwise spent on annotating it.
5. Unlike traditional syntactic translation systems, the decoder first translates the syntactic structural frame and controls reordering, and then performs non-syntactic translation of local fragments under a good syntax skeleton. This is the first use of such a method in current translation systems.
Brief description of the drawings
Fig. 1 is the model framework diagram of the system of the present invention;
Fig. 2 is an example of extracting non-syntactic and syntactic translation rules in the system;
Fig. 3 shows the process of generating a syntax skeleton from a sample grammar;
Fig. 4 shows the process of decoding with a syntactic translation rule in the tree-based system of the present invention;
Fig. 5 shows the influence of skeleton depth on translation quality;
Fig. 6 compares the translation results produced by different systems.
Specific embodiment
The present invention is further described below with reference to the accompanying drawings.
As shown in Fig. 1, a statistical machine translation system based on a syntax skeleton according to the present invention comprises the following steps:
1) A probabilistic SCFG hierarchical rule extraction algorithm extracts non-syntactic translation rules, used to translate the non-skeleton parts of the sentence to be translated:
Using the heuristically constrained method for extracting hierarchical rules, probabilistic SCFG rules are extracted from parallel sentence pairs that have been word-aligned but not syntactically parsed. These hierarchical phrase rules, i.e. non-syntactic translation rules, handle the translation of the low-level structure of the sentence to be translated;
2) The GHKM rule extraction method extracts syntactic translation rules, used to translate the skeleton part of the sentence to be translated:
Using the GHKM rule extraction algorithm, GHKM rules are extracted from word-aligned parallel sentence pairs together with the syntactic parses of their source-language side. The extracted GHKM rules are rewritten into syntactic translation rules, which handle the high-level structure of the sentence to be translated, that is, the translation of its syntactic structure;
3) Generation of non-fully syntactic translation rules:
Non-fully syntactic translation rules are generated from the syntactic translation rules and used together with the non-syntactic translation rules, combining the advantages of the non-syntactic and syntactic translation systems;
4) Model generation:
According to the above non-fully syntactic translation rules, the grammars (translation rule sets) of the non-syntactic and syntactic translation systems are integrated according to the different translation tasks, and non-fully syntactic translation derivations are generated. The non-fully syntactic rules distinguish the different translation tasks: non-syntactic translation rules handle the translation of the low level of the text (words and phrases), while syntactic and non-fully syntactic translation rules complete the translation of its high level (syntactic structure). The non-syntactic, syntactic, and non-fully syntactic translation rules are collected into an SCFG grammar with large coverage.
In step 1), probabilistic SCFG hierarchical rule extraction: from parallel sentence pairs that have been word-aligned but not syntactically parsed, the present invention extracts probabilistic SCFG rules using the heuristically constrained method for extracting hierarchical phrase rules. These hierarchical phrase rules, i.e. non-syntactic translation rules, handle the translation of the low-level structure of the sentence to be translated;
In step 2), GHKM rules are extracted from word-aligned parallel sentence pair data under the guidance of source-language syntax tree information and are rewritten into SCFG-style syntactic translation rules. That is, an extracted GHKM rule of the form:
source-language phrase syntactic label (source-language syntax tree fragment) → target-language string
is rewritten, by keeping the nonterminals on the frontier of the syntax tree fragment and discarding the internal tree structure, into the form of a syntactic translation rule:
source-language phrase syntactic label → <source-language string, target-language string>
where the "source-language string" is the sequence of source-language terminal words and nonterminals (with their syntactic labels), namely the leaf-node sequence of the source-language syntax tree fragment in the GHKM rule that corresponds to the syntactic rule; and the "target-language string" is a string of target-language terminal words and nonterminals (with their syntactic labels), whose nonterminals correspond one-to-one with the nonterminals at the leaf nodes of the source-language syntax tree fragment.
In step 3), syntax skeleton information is obtained from the source-language syntactic information, and non-fully syntactic translation rules are obtained by adjusting and recombining the syntactic and non-syntactic translation rules. A non-fully syntactic translation rule has the form:
source-language phrase syntactic label → <source-language string*, target-language string*>
where the "source-language phrase syntactic label" on the left-hand side is a nonterminal; "source-language string*" is a sequence of source-language words (terminals), nonterminals, and the generalized label X; and "target-language string*" is a string of target-language words (terminals), nonterminals, and the generalized label X, whose nonterminals correspond one-to-one with the nonterminals at the leaf nodes of the source-language syntax tree fragment;
A non-fully syntactic translation rule differs from a syntactic translation rule in that it does not require every nonterminal in the rule to be one of the phrase syntactic labels used in source-language parsing: some of its nonterminals are generalized to X, which indicates that the nonterminal does not belong to any syntactic category.
Every syntactic translation rule can be rewritten into non-fully syntactic translation rules. Concretely, one or two nonterminals on the right-hand side of the rule are generalized to X while the left-hand side is kept unchanged, which converts the rule into a non-fully syntactic translation rule.
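The generalization step just described can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the `Rule` class, the `generalize` function, and the token encoding (a nonterminal written as label plus link index, such as `NN1`) are hypothetical choices made here and are not taken from the patent.

```python
import re
from dataclasses import dataclass
from itertools import combinations

# Hypothetical encoding: a nonterminal is a token such as "NN1" (label + link index);
# every other token is treated as a terminal word.
NT = re.compile(r"^([A-Z]+)(\d+)$")

@dataclass(frozen=True)
class Rule:
    lhs: str    # left-hand side syntactic label, kept unchanged
    src: tuple  # source-language string
    tgt: tuple  # target-language string

def generalize(rule, max_generalized=2):
    """Return copies of `rule` in which one or two linked nonterminals are relabeled X."""
    indices = sorted({m.group(2) for tok in rule.src
                      if (m := NT.match(tok)) and m.group(1) != "X"})
    results = []
    for k in range(1, max_generalized + 1):
        for chosen in combinations(indices, k):
            def relabel(tok, chosen=chosen):
                m = NT.match(tok)
                return "X" + m.group(2) if m and m.group(2) in chosen else tok
            # The same link indices are relabeled on both sides to keep the correspondence.
            results.append(Rule(rule.lhs,
                                tuple(relabel(t) for t in rule.src),
                                tuple(relabel(t) for t in rule.tgt)))
    return results

for r in generalize(Rule("IP", ("NP1", "VP2"), ("NP1", "VP2"))):
    print(r.lhs, "->", r.src, r.tgt)
```

For a rule with two nonterminals this yields three non-fully syntactic variants (first, second, or both nonterminals generalized); the left-hand side label stays syntactic in every variant.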
After the syntactic, non-syntactic, and non-fully syntactic translation rules have all been collected, all of the rules together form a larger SCFG grammar. While the sentence to be translated is decoded, the non-fully syntactic translation rules guide the derivation and the corresponding syntactic structure is generated, so that the strengths of the different translation approaches are used at different levels of the sentence: the non-syntactic translation system for low-level translation (such as phrases), and the syntactic translation system for high-level translation tasks (such as syntactic structure).
The advantages of the non-syntactic and syntactic translation systems are combined as follows:
With the generated large-coverage SCFG grammar, non-fully syntactic translation rules provide a gradual transition from the syntactic translation system to the non-syntactic translation system, and the syntax skeleton is created during derivation;
The non-fully syntactic and syntactic translation rules capture the reordering among the different constituents of the sentence to be translated. Low-level translation tasks are assigned to the non-syntactic translation rules, and the translation of the high-level skeleton part is assigned to the syntactic and non-fully syntactic translation rules.
In step 4), according to the above non-fully syntactic translation rules, the grammars of the non-syntactic and syntactic translation systems are adjusted according to the different translation tasks, producing a large-coverage SCFG grammar composed of the three types of rules. This grammar not only reorders the skeleton constituents of the sentence well but also generates the syntax skeleton of the sentence. Within the SCFG system, a weight is computed for each translation derivation so that the various translation rules are used to derive more accurately. The score of each translation derivation d is calculated with the following formula:
$$ s(d) = \prod_{r_i \in d_s} w(r_i) \times \prod_{r_j \in d_h} w(r_j) \times \mathrm{lm}(t)^{\lambda_{lm}} \times \exp(\lambda_{wb} \cdot |t|) $$
where s(d) is the score of translation derivation d, t is the target-language string, and the score of d is defined as the product of several factors:
Factor 1: the product of the weights of all rules contained in the syntax skeleton d_s of d, $\prod_{r_i \in d_s} w(r_i)$, where r_i is the i-th rule in d_s and w(r*) is the weight of rule r*;
Factor 2: the product of the weights of all rules contained in the non-skeleton part d_h of d, $\prod_{r_j \in d_h} w(r_j)$, where r_j is the j-th rule in d_h and w(r*) is the weight of rule r*;
Factor 3: the exponentially weighted score of the n-gram language model, $\mathrm{lm}(t)^{\lambda_{lm}}$, where λ_lm is the weight of the n-gram language model;
Factor 4: the word bonus $\exp(\lambda_{wb} \cdot |t|)$, where exp(|t|) is the exponential of the translation length |t|, so the longer the sentence, the larger the "bonus"; λ_wb is the weight of the word bonus.
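A minimal Python sketch of this score is given below, assuming s(d) is the product of the four factors just listed. The function name `log_score` and its parameters are hypothetical; the rule weights and the language model probability are assumed to be positive numbers supplied by the caller, and the computation is done in the log domain to avoid numerical underflow.

```python
import math

def log_score(skeleton_weights, non_skeleton_weights, lm_prob, target_length,
              lambda_lm, lambda_wb):
    """Log of s(d): skeleton rules, non-skeleton rules, language model, word bonus."""
    log_s = sum(math.log(w) for w in skeleton_weights)        # factor 1: rules in d_s
    log_s += sum(math.log(w) for w in non_skeleton_weights)   # factor 2: rules in d_h
    log_s += lambda_lm * math.log(lm_prob)                    # factor 3: lm(t) ** lambda_lm
    log_s += lambda_wb * target_length                        # factor 4: exp(lambda_wb * |t|)
    return log_s

# Illustrative numbers only; they are not taken from the patent.
value = log_score([0.5, 0.4], [0.2], lm_prob=0.01, target_length=3,
                  lambda_lm=0.5, lambda_wb=0.1)
print(math.exp(value))
```

A decoder would typically compare candidate derivations by this log score and keep the highest one, which selects the same derivation as comparing s(d) directly.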
Decoding application:
When the model is applied in decoding, the generated large-coverage SCFG (synchronous context-free grammar) is used to parse the source-language sentence to be translated. During parsing, the non-fully syntactic and syntactic rules analyze the sentence according to the structure of the syntax skeleton and generate the syntax skeleton of the sentence, and the target-language sides of the rules in the large-coverage SCFG generate the target-language translation. For each fragment, if a derivation with a corresponding non-fully syntactic translation rule exists, the structural information of the local fragment is obtained; if no corresponding non-fully syntactic translation rule is found, the model searches the derivation space (containing syntactic, non-syntactic, and non-fully syntactic translation rules) for the best translation derivation.
In the present invention, the machine translation model framework based on the syntax skeleton can be roughly divided into three parts: rule acquisition, model generation, and model application. The model framework is shown in Fig. 1.
First, different types of translation rules are extracted in different ways from the bilingual aligned data and the source-language syntax tree information, using the methods described above. Then, according to the characteristics of source-language syntax, some of the syntactic translation rules are rewritten to generate suitable non-fully syntactic translation derivations that connect the various types of derivation rules. Finally, during decoding, the skeleton model finds a suitable derivation for the translation tasks at each level.
I. Translation rule acquisition:
In the present invention, different rules are extracted using different methods:
1) Non-syntactic translation rule extraction:
Since the present invention is implemented on top of an SCFG, an SCFG rule can be expressed in the following form:
LHS → <α, β, ~>
where LHS is a nonterminal, α and β are sequences of terminals and nonterminals on the source-language side and the target-language side respectively, and ~ denotes the one-to-one correspondence between the nonterminals in α and β.
For non-syntactic translation rules, the heuristically constrained method for extracting hierarchical rules is used to extract probabilistic SCFG rules from parallel sentence pairs that have been word-aligned but not syntactically parsed. Given the resulting probabilistic SCFG, a sentence to be translated can be decoded by finding the most probable derivation. Fig. 2 gives an example of extracting non-syntactic translation rules, in which every nonterminal is labeled only as X. If a sequence of such SCFG rules completely covers and derives a source sentence, it is regarded as an SCFG derivation of that sentence. For example, rules h5, h1, and h3 in the figure produce a derivation of a sentence pair.
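The extraction of hierarchical rules from word-aligned sentence pairs can be sketched as follows. This is a simplified illustration, not the patent's implementation: it assumes the usual alignment-consistency criterion for phrase pairs, uses a span-length limit as the only heuristic constraint, does not extend target spans over unaligned words, and creates rules with at most one gap. The function names, the placeholder tokens `s1..s3` and `t1..t3`, and the alignment are all hypothetical.

```python
def phrase_pairs(src_len, alignment, max_len=10):
    """Inclusive spans ((i1, i2), (j1, j2)) that are consistent with the word alignment."""
    pairs = []
    for i1 in range(src_len):
        for i2 in range(i1, min(src_len, i1 + max_len)):
            tgt_pos = [j for (i, j) in alignment if i1 <= i <= i2]
            if not tgt_pos:
                continue
            j1, j2 = min(tgt_pos), max(tgt_pos)
            # Consistent: no target word in [j1, j2] is aligned outside [i1, i2].
            if all(i1 <= i <= i2 for (i, j) in alignment if j1 <= j <= j2):
                pairs.append(((i1, i2), (j1, j2)))
    return pairs

def hiero_rules(src, tgt, alignment, max_len=10):
    """Rules X -> <source side, target side>, returned as pairs of token tuples."""
    pairs = phrase_pairs(len(src), alignment, max_len)
    rules = set()
    for (i1, i2), (j1, j2) in pairs:
        # Plain phrase rule without any nonterminal.
        rules.add((tuple(src[i1:i2 + 1]), tuple(tgt[j1:j2 + 1])))
        # One-gap rules: replace a smaller consistent sub-phrase pair with X1.
        for (a1, a2), (b1, b2) in pairs:
            inside = i1 <= a1 and a2 <= i2 and j1 <= b1 and b2 <= j2
            if inside and (a1, a2) != (i1, i2):
                s = tuple(src[i1:a1]) + ("X1",) + tuple(src[a2 + 1:i2 + 1])
                t = tuple(tgt[j1:b1]) + ("X1",) + tuple(tgt[b2 + 1:j2 + 1])
                if len(s) > 1 and len(t) > 1:  # require a terminal on both sides
                    rules.add((s, t))
    return rules

# Placeholder sentence pair in which s2 and s3 are translated in swapped order.
rules = hiero_rules(["s1", "s2", "s3"], ["t1", "t2", "t3"], {(0, 0), (1, 2), (2, 1)})
for s, t in sorted(rules):
    print("X ->", s, t)
```

In this toy example the rule `X -> <s2 X1, X1 t3>` records the local reordering, which is the kind of low-level phenomenon the text assigns to non-syntactic rules. Estimating rule probabilities from counts is left out of the sketch.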
2) Syntactic translation rule extraction:
The form of a non-syntactic translation rule is essentially the same as that of a regular syntactic (syntax) translation rule; the difference is that non-syntactic translation rules are generated without any syntactic constraint (on either the source-language side or the target-language side). If the syntactic information of either side (source or target) is used as a constraint, derivation rules that conform to the syntactic information, i.e. syntactic translation rules, can be obtained. Syntactic translation rules are obtained in the following manner.
GHKM rule extraction:
To generate rules in syntactic form, the present invention uses the mainstream approach: GHKM rules are extracted from bilingual sentence pairs that carry word alignment information, with the source-side syntax tree information serving as constraint and guidance.
In GHKM extraction, the present invention models the mapping from a source-language syntax tree to a target-language string. A GHKM rule consists of a source-language fragment s_r, a target-language fragment t_r, and the correspondence between the nonterminals in the two fragments. For example, the following is a GHKM rule:
VP(VV(improve) x1:NN) → increase x1
GHKM rule rewriting:
This embodiment rewrites the above rule format into the SCFG rule format. Specifically, the annotations of the frontier nonterminals are kept unchanged and the tree structure inside the fragment is discarded, for example:
VP → <improve NN1, increase NN1>
where VP is a verb phrase, VV is the verb part-of-speech tag, NN is the noun part-of-speech tag, x1 is a nonterminal, and NN1 is a variable whose part of speech is noun.
In the present invention, GHKM rules are thus converted into SCFG rules. Since all nonterminals are labeled with source-side syntactic labels, every application of the resulting syntactic translation rules is constrained by the source syntax.
Fig. 2 shows the process of extracting syntactic translation rules from a pair consisting of a source-language tree and a target-language string. This embodiment ignores the multi-level tree structure of the original GHKM rules but retains the frontier nodes of the rules, so the rewriting allows the system to generalize better when translating new sentences.
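The rewriting step (keep the root label and the frontier symbols, drop the internal tree structure) can be sketched as follows. The tuple encoding of tree fragments and the function names `frontier` and `ghkm_to_scfg` are assumptions made for illustration; they are not part of the described system.

```python
def frontier(node):
    """Return the leaf sequence of a tree fragment, left to right.

    A node is (label, children). If children is an int, the node is a
    frontier nonterminal and the int is its variable index; otherwise
    children is a list of terminal words (str) and sub-nodes.
    """
    label, children = node
    if isinstance(children, int):
        return [(label, children)]
    leaves = []
    for child in children:
        if isinstance(child, str):
            leaves.append(child)           # terminal word
        else:
            leaves.extend(frontier(child))  # descend, forgetting the inner label
    return leaves


def ghkm_to_scfg(fragment, target):
    """Rewrite a GHKM rule (tree fragment -> target string) as LHS -> <src, tgt>."""
    root = fragment[0]
    src = frontier(fragment)
    # Map each variable index to the syntactic label of its frontier node.
    labels = {idx: lab for lab, idx in (s for s in src if isinstance(s, tuple))}
    tgt = [(labels[t], t) if isinstance(t, int) else t for t in target]
    return root, src, tgt


# The GHKM rule from the text: VP(VV(improve) x1:NN) -> increase x1
fragment = ("VP", [("VV", ["improve"]), ("NN", 1)])
rule = ghkm_to_scfg(fragment, ["increase", 1])
print(rule)  # ('VP', ['improve', ('NN', 1)], ['increase', ('NN', 1)])
```

The result corresponds to VP → <improve NN1, increase NN1>: the VV node has disappeared, while the NN frontier label is carried over to both sides.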
In addition, decoding with syntactic translation rules can be regarded as an SCFG parsing process. A popular method is string parsing (string-based decoding), which decodes the input sentence with a chart decoder (for example, a CYK decoder). When source-side parse trees are available for the test set, tree parsing (tree-based decoding) can be used instead to decode the parse tree. In this case, since all derivations must comply with the input parse tree, the source-side syntactic information acts as a hard constraint, which increases accuracy.
3) Acquisition of non-fully syntactic translation rules (defined by the present invention)
Non-syntactic translation systems and syntactic translation systems each have their own advantages and disadvantages. For example, non-syntactic translation models are very good at lexical selection and at reordering driven by lexicalized rules, but are heavily constrained when handling the movement of complex constituents. Syntactic translation models can describe the hierarchical movement of constituents through linguistically motivated syntactic annotations and perform well on high-level syntactic reordering. However, both kinds of models suffer from sparsity and limited coverage.
Ideally, the advantages of each model would be applied where they are most effective: 1) the syntactic translation model handles the generation of the high-level syntax skeleton and the reordering between syntactic constituents; 2) non-syntactic translation rules handle low-level lexical translation and reordering. To this end, the invention proposes a method that combines both advantages in a single model. The grammars of non-syntactic translation and syntactic translation are reused in translation, and a new type of rule, the non-fully syntactic translation rule, is developed to provide a transitional connection between syntactic translation rules and non-syntactic translation rules.
A rule is a non-fully syntactic translation rule if its left-hand side (LHS) is a source-side syntactic label and at least one nonterminal on its right-hand side (RHS) is labeled X. The following are non-fully syntactic translation rules:
VP → <improve X1, increase X1>
NT → <improve X2, increase X2>
In the first rule, the left-hand side represents a verb phrase (VP), while the right-hand side, like that of a standard non-syntactic translation rule, contains the nonterminal X. Such a rule can be applied on top of a partial non-syntactic translation derivation to produce a derivation rooted at VP. Syntactic translation rules can then substitute this VP derivation as usual in a syntactic machine translation system, which realizes the transition from the syntactic translation system to the non-syntactic translation system.
Two, skeleton model generation
Since non-fully syntactic translation rules can connect non-syntactic translation rules and syntactic translation rules, all rules of both kinds can be used to build non-fully syntactic translation derivations. Together they constitute a grammar that can generate syntax skeletons, which is the basis of the skeleton model. Fig. 3 gives a derivation built from non-syntactic translation rules, syntactic translation rules and non-fully syntactic translation rules. In this derivation, the non-syntactic translation rules (h3, h6 and h8) are applied to low-level translation. By applying syntactic rules (the non-fully syntactic translation rule p3 and the syntactic translation rules r1 and r4) on top of the partial derivations rooted at X, a derivation that conforms to the syntax skeleton of the sentence is established.
This syntactic structure (upper right corner of Fig. 3) is created by the source-side syntactic rules and is referred to in the present invention as the syntax skeleton. In general, it is a tree fragment that carries high-level syntax and has terminals or nonterminals at its leaf nodes. With this skeleton structure, the reordering between the constituents in "NP VP" can be captured easily, while the low-level translations ("answer" and "satisfied") are left to non-syntactic translation rules.
To obtain non-fully syntactic translation rules, a simple and direct method is used. Each syntactic translation rule is converted into non-fully syntactic translation rules by generalizing one or two nonterminals on its right-hand side (RHS) to X while keeping its left-hand side (LHS) unchanged. For example, from the syntactic translation rule r5 (VP → <to NP1 VP2, VP2 with NP1>) in the tree-based decoding process of Fig. 4, three non-fully syntactic translation rules can be obtained:
VP → <to X1 X2, X2 with X1>
VP → <to X1 VP2, VP2 with X1>
VP → <to NP1 X2, X2 with NP1>
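The generalization step can be sketched as below, under the assumption that a rule is stored as (LHS, source sequence, target sequence) with nonterminals written as (label, index) pairs. The function name `generalize` and the parameter `max_x` are hypothetical.

```python
from itertools import combinations


def generalize(lhs, src, tgt, max_x=2):
    """Return the rules obtained by relabeling one or two nonterminals as X.

    The LHS is kept unchanged; a relabeled nonterminal becomes X on both
    the source side and the target side, keeping its index.
    """
    indices = [sym[1] for sym in src if isinstance(sym, tuple)]

    def relabel(seq, chosen):
        return tuple(
            ("X", s[1]) if isinstance(s, tuple) and s[1] in chosen else s
            for s in seq
        )

    rules = []
    for k in range(1, max_x + 1):
        for chosen in combinations(indices, k):
            rules.append((lhs, relabel(src, chosen), relabel(tgt, chosen)))
    return rules


# Rule r5 from the text: VP -> <to NP1 VP2, VP2 with NP1>
r5_src = ("to", ("NP", 1), ("VP", 2))
r5_tgt = (("VP", 2), "with", ("NP", 1))

for rule in generalize("VP", r5_src, r5_tgt):
    print(rule)
```

For r5 this produces exactly three rules (X1 with VP2, NP1 with X2, and X1 with X2), matching the three rules listed above.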
Once all rules, including the non-syntactic translation rules, the syntactic translation rules and the non-fully syntactic translation rules, are ready, they are used to build a larger SCFG, which is then applied in the decoder. The weight of each rule is computed with a weighted log-linear model. As in standard SCFG-based models, a rule LHS → <α, β, ~> has the following features:
1. The translation probabilities P(α | β) and P(β | α), estimated by relative frequency; these are the translation probabilities in the two directions (forward and inverse).
2. The lexical weights Plex(α | β) and Plex(β | α), estimated with a heuristic method.
3. The rule bonus (exp(1)), which is kept separate for non-syntactic translation rules, syntactic translation rules and non-fully syntactic translation rules.
4. Indicators for glue rules, lexicalized rules and non-lexicalized rules, which allow the model to learn a preference for specific kinds of rules.
5. The number of X nonterminals on the source side of non-fully syntactic translation rules (exp(#)), which controls the degree to which the model is allowed to violate syntactic compatibility.
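A log-linear rule weight over such features can be sketched as follows. The feature names, feature values and feature weights are made up for illustration, and the form w(r) = ∏ f_i(r)^λ_i is the usual log-linear convention rather than a formula stated in the text.

```python
import math


def rule_weight(features, lambdas):
    """Log-linear weight w(r) = prod_i f_i(r) ** lambda_i, computed in log space."""
    log_w = sum(lambdas[name] * math.log(value) for name, value in features.items())
    return math.exp(log_w)


# Hypothetical feature values for one non-fully syntactic translation rule.
features = {
    "p_src_given_tgt": 0.5,                    # P(alpha | beta)
    "p_tgt_given_src": 0.25,                   # P(beta | alpha)
    "lex_src_given_tgt": 0.5,                  # Plex(alpha | beta)
    "lex_tgt_given_src": 0.5,                  # Plex(beta | alpha)
    "bonus_non_fully_syntactic": math.exp(1),  # rule bonus exp(1) for this rule type
    "num_x": math.exp(1),                      # exp(#X) with one X on the source side
}
lambdas = {name: 1.0 for name in features}     # hypothetical feature weights

print(rule_weight(features, lambdas))
```

Working in log space avoids underflow when many small probabilities are multiplied; in practice the feature weights would come from tuning rather than being set to 1.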
The present invention defines a derivation weight (score) in the model. Let d be a derivation of the above grammar. To distinguish syntactic rules (i.e. syntactic translation rules and non-fully syntactic translation rules) from non-syntactic translation rules, d is defined as a tuple <d_s, d_h>, where d_s is the partial derivation forming the skeleton structure and d_h is the set of rules that build the remainder of d. For example, in the derivation of Fig. 3, d_s = {r4, r1, p3} and d_h = {h6, h8, h3}.
Let t be the target-language string encoded by d. The score of d is then defined as the product of the rule weights, the n-gram language model lm(t), and the word bonus exp(|t|):
$$s(d) = \prod_{r_i \in d_s} w(r_i) \times \prod_{r_j \in d_h} w(r_j) \times lm(t)^{\lambda_{lm}} \times \exp(\lambda_{wb} |t|)$$
where w(r*) is the weight of rule r*, and λ_lm and λ_wb are the feature weights of the language model and the word bonus, respectively.
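Read as the product of the rule weights in d_s and d_h, the language model term lm(t) raised to λ_lm, and the word bonus exp(λ_wb·|t|), the derivation score can be sketched in log space as below. The function name, argument names and numeric values are hypothetical.

```python
import math


def derivation_score(skeleton_weights, other_weights, lm_prob, target_len,
                     lambda_lm, lambda_wb):
    """Score of a derivation d = <d_s, d_h> producing a target string t.

    skeleton_weights: weights w(r) of the rules in d_s (skeleton part)
    other_weights:    weights w(r) of the rules in d_h (non-skeleton part)
    lm_prob:          language model probability lm(t)
    target_len:       |t|, the number of target words
    """
    log_s = sum(math.log(w) for w in skeleton_weights)
    log_s += sum(math.log(w) for w in other_weights)
    log_s += lambda_lm * math.log(lm_prob)
    log_s += lambda_wb * target_len
    return math.exp(log_s)


# Three skeleton rules and three non-skeleton rules with made-up weights.
score = derivation_score(
    skeleton_weights=[0.5, 0.5, 0.5],
    other_weights=[0.5, 0.25, 0.5],
    lm_prob=0.01,
    target_len=6,
    lambda_lm=0.5,
    lambda_wb=0.1,
)
print(score)
```

If `skeleton_weights` is empty the score reduces to that of a purely non-syntactic derivation, and if `other_weights` is empty it reduces to that of a purely syntactic one.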
In addition, the model framework of the present invention is very flexible and includes both the syntactic and the non-syntactic translation models as special cases. For example, if d_s = ∅ (that is, d consists only of non-syntactic translation rules), then d is a non-syntactic translation derivation. Likewise, if d_h = ∅ (that is, d consists only of syntactic translation rules), then d is a syntactic translation derivation. What the present invention describes is how non-fully syntactic translation rules bridge the derivation spaces of non-syntactic translation and syntactic translation. The decoder can then select the best derivation from the enlarged set of derivations according to the model score.
Three, model application in decoding:
In use, the model of the invention can be treated as a string-parsing problem: the source-side syntactic rules are used to parse the source-language string, and the target-language translation is generated from the target side of the rules in the derivation. The translation result is therefore the target-language string generated by the highest-scoring derivation. In the invention, the system is implemented on the basis of a CYK decoder that uses beam search and cube pruning and can handle the binarized rules obtained by synchronous binarization.
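Two ingredients of such a chart decoder, the bottom-up enumeration of CYK cells and beam pruning inside each cell, can be sketched as follows. This is only an outline under simplifying assumptions: hypotheses are plain (score, translation) pairs, the function names are hypothetical, and cube pruning and rule application are not shown.

```python
import heapq


def cyk_spans(n):
    """Enumerate chart cells (start, end) bottom-up: shorter spans before longer ones."""
    for width in range(1, n + 1):
        for start in range(0, n - width + 1):
            yield start, start + width


def prune_cell(hypotheses, beam_size):
    """Keep the beam_size highest-scoring (score, translation) hypotheses of one cell."""
    return heapq.nlargest(beam_size, hypotheses, key=lambda h: h[0])


print(list(cyk_spans(3)))
# [(0, 1), (1, 2), (2, 3), (0, 2), (1, 3), (0, 3)]

cell = [(0.1, "hyp a"), (0.5, "hyp b"), (0.3, "hyp c")]
print(prune_cell(cell, beam_size=2))
# [(0.5, 'hyp b'), (0.3, 'hyp c')]
```

Processing shorter spans first guarantees that the sub-derivations a rule needs are already in the chart when a longer span is filled.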
Because a large number of non-fully syntactic translation rules are introduced, decoding becomes very slow. To speed up the decoding system, several pruning methods are further applied to reduce the search space. First, lexicalized rules and non-fully syntactic translation rules whose scope is greater than 3 are discarded. These rules are removed because they are a main cause of slow decoding and contribute little to the final translation result. In addition, non-lexicalized rules and non-fully syntactic translation rules whose right-hand side (RHS) contains only the nonterminal X are discarded. In most cases, rules of this type cannot provide any syntactic guidance. For example, the rule VP → <X1 X2, X2 X1> is too general: introducing a VP constituent over two consecutive blocks that carry no lexical or syntactic marker is unreasonable, because doing so has no effect.
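The two rule filters can be sketched as below. The text does not define "scope"; the sketch assumes one common definition (the number of nonterminals at either edge of the source side plus the number of adjacent nonterminal pairs), and it applies the scope limit to every rule for simplicity. The names `scope` and `keep_rule` are hypothetical.

```python
def is_nt(sym):
    """Nonterminals are (label, index) tuples; terminals are plain strings."""
    return isinstance(sym, tuple)


def scope(src):
    """Assumed scope: edge nonterminals plus adjacent nonterminal pairs on the source side."""
    if not src:
        return 0
    value = int(is_nt(src[0])) + int(is_nt(src[-1]))
    value += sum(1 for a, b in zip(src, src[1:]) if is_nt(a) and is_nt(b))
    return value


def keep_rule(src, max_scope=3):
    """Return False for rules removed by either pruning criterion."""
    if scope(src) > max_scope:
        return False
    # Drop rules whose source side consists of X nonterminals only,
    # such as VP -> <X1 X2, X2 X1>.
    only_x = all(is_nt(s) and s[0] == "X" for s in src)
    return not only_x


print(keep_rule(("to", ("NP", 1), ("VP", 2))))                    # True  (scope 2)
print(keep_rule((("X", 1), ("X", 2))))                            # False (X only)
print(keep_rule((("X", 1), ("X", 2), "w", ("X", 3), ("X", 4))))   # False (scope 4)
```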
Besides pruning rules, a parameter w_s can also be used to control the depth of the syntax skeleton. If w_s is given a small value, the system is forced to use a smaller syntax skeleton (and fewer syntactic rules). In the extreme case w_s = 0, the system falls back to a typical non-syntactic translation system; conversely, with w_s = +∞ the system can consider syntax skeletons of any depth. The parameter w_s can therefore be tuned on a held-out set to find a balance point.
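The effect of w_s can be sketched as a simple check during rule application. How depth is measured is not specified in the text; the sketch assumes that a non-syntactic derivation has skeleton depth 0 and that each syntactic (or non-fully syntactic) rule adds one level above its deepest sub-derivation. The function names are hypothetical.

```python
import math


def skeleton_depth(rule_is_syntactic, child_depths):
    """Assumed depth of the skeleton after applying a rule over its sub-derivations."""
    if not rule_is_syntactic:
        return 0
    return 1 + max(child_depths, default=0)


def can_apply(rule_is_syntactic, child_depths, w_s=math.inf):
    """A rule may be applied only if the resulting skeleton depth stays within w_s."""
    return skeleton_depth(rule_is_syntactic, child_depths) <= w_s


print(can_apply(True, [0, 0], w_s=0))   # False: w_s = 0 allows no syntactic rule
print(can_apply(False, [], w_s=0))      # True: non-syntactic rules are always allowed
print(can_apply(True, [3, 1], w_s=5))   # True: resulting depth is 4
print(can_apply(True, [5], w_s=5))      # False: resulting depth would be 6
print(can_apply(True, [40]))            # True: default w_s = +inf
```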
To speed up the system further, some tree-parsing techniques are also used. In addition to the source sentence, the source-language parse tree is also fed to the decoder. First, the source sentence is parsed with the non-syntactic translation rules, as is usual in a non-syntactic translation system, except that no distance limit is imposed on rule application when the source segment being processed corresponds to a constituent of the syntax tree. Then, syntactic translation rules are applied on the parse tree. If the source side of a syntactic translation rule matches a fragment of the input tree, then: 1) the rule is converted into non-fully syntactic translation rules (see part 3) above); 2) the syntactic translation rule and the corresponding non-fully syntactic translation rules are added to the rule lists, which are linked to the CYK chart cell corresponding to that source-language syntax tree fragment. Fig. 4 gives an example of tree matching in the decoder. The remaining decoding steps (such as building the translation hypergraph and language model intersection) are then carried out as usual. This method efficiently matches the (non-fully) syntactic translation rules required for decoding and does not require the rules to be binarized. Because the given source syntax tree acts as a hard constraint, this treatment is a trade-off under which some syntax-sensitive derivations can be introduced.
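The link between tree constituents and CYK chart cells can be sketched as follows. The sketch is deliberately simplified: it matches rules by root label only, whereas matching the full source side against a tree fragment needs more bookkeeping. The tree encoding, the function names and the rule identifier "r5" are assumptions for illustration.

```python
def constituent_spans(tree, start=0, spans=None):
    """Map each (start, end) word span of a parse tree to its constituent labels.

    A tree is (label, children), where children are words (str) or sub-trees.
    Returns (end position, spans dictionary).
    """
    if spans is None:
        spans = {}
    label, children = tree
    end = start
    for child in children:
        if isinstance(child, str):
            end += 1  # one source word
        else:
            end = constituent_spans(child, end, spans)[0]
    spans.setdefault((start, end), []).append(label)
    return end, spans


def rules_for_cells(spans, rules_by_lhs):
    """Attach to each chart cell the rules whose LHS label matches a constituent there."""
    return {
        span: [rule for label in labels for rule in rules_by_lhs.get(label, [])]
        for span, labels in spans.items()
    }


tree = ("S", [("NP", ["w0"]), ("VP", [("VV", ["w1"]), ("NN", ["w2"])])])
_, spans = constituent_spans(tree)
print(spans)
# {(0, 1): ['NP'], (1, 2): ['VV'], (2, 3): ['NN'], (1, 3): ['VP'], (0, 3): ['S']}

cells = rules_for_cells(spans, {"VP": ["r5"]})
print(cells[(1, 3)])  # ['r5']
```

Spans that correspond to no constituent never receive syntactic rules, which is how the parse tree acts as a hard constraint in this sketch.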
Four, experiments
The present invention tests the method on English-Chinese (en-zh) and Chinese-English (zh-en) translation.
1) Experimental setup of the baseline systems
The present invention uses 2.74 million Chinese-English bilingual sentence pairs selected from NIST12 OpenMT. After bidirectional word alignments of the bilingual text are generated with the GIZA++ tool, symmetrized word alignments are obtained with the grow-diag-final-and method. For syntactic analysis, both sides of the data are first parsed with the Berkeley parser, and the parse trees are then binarized with the commonly used left-most method so as to generalize better on the test set. Syntax-based (syntactic translation) rules are extracted from the entire training data, with at most five nonterminals per rule. For the non-syntactic translation system, hierarchical rules (non-syntactic translation rules) are extracted from a subset of 940,000 sentences with no more than two nonterminals per rule, while phrase rules are extracted from the entire training set. All rules are obtained with the open-source toolkit NiuTrans.
The present invention trains two 5-gram language models. One is trained on the Xinhua portion of the English Gigaword data and the English side of the bilingual data, and is used in the Chinese-English translation system. The other is trained on the Xinhua portion of the Chinese Gigaword data and the Chinese side of the bilingual data, and is used in the English-Chinese translation system. All language models are smoothed with the modified Kneser-Ney smoothing method.
For the Chinese-English translation system, the system is evaluated on newswire data and on web data. The tuning sets (newswire: 1198 sentences, web: 1308 sentences) are drawn from the NIST machine translation 04-06 evaluation data and the GALE data. The test sets (newswire: 1779 sentences, web: 1768 sentences) contain all newswire and web data of the NIST 08, 12 and 08-progress machine translation evaluations. For the English-Chinese translation system, the tuning set (995 sentences) and the test set (1859 sentences) are the evaluation data of the SSMT07 and NIST MT08 Chinese-English translation tracks, respectively. All source-side parse trees are produced with the same method used to process the training data.
2) Experiments with the machine translation system based on the syntax skeleton
The CYK decoder is implemented according to the method described in the part on model application in decoding. By default, string parsing is used in the experiments, and the parameter w_s is initially set to +∞. All feature weights are tuned with MERT. Since MERT may converge to a local optimum, each experiment is run 5 times, each time with different initial feature weights. For evaluation, unmodified BLEU4 and unmodified BLEU5 are used for the Chinese-English and English-Chinese translation systems, respectively.
3) Experimental results of the machine translation system based on the syntax skeleton
Table 1 gives the experimental results, in which the system based on the syntax skeleton is abbreviated as SYNSKEL. First, the SYNSKEL system achieves significant improvements on all 3 test sets. With CTB-style parse trees, an average improvement of more than 0.6 BLEU is obtained, and with binarized syntax trees the average improvement is more than 0.9 BLEU. Moreover, the tree-parsing method applies (partially) syntactic rules well on top of the normally used non-syntactic translation rules and obtains good results, with BLEU values comparable to those of the string-parsing method. However, putting more trees into a binarized forest brings no improvement. These interesting results indicate that, in a very large derivation space, it is difficult to introduce novel derivations by considering more alternative binarized syntactic structures.
Table 1. Experimental results under different systems
In addition, the behaviour of the system is studied under different maximum skeleton depths (i.e. the parameter w_s). Fig. 5 shows that a larger skeleton does not necessarily give better results, where BLEU is the metric for translation quality. With w_s ≤ 5, the skeleton system still achieves a satisfactory improvement while reducing decoding time by nearly 27% compared with using the full skeleton system.
The usage of the different types of rules is shown in Table 2. The non-fully syntactic translation rules defined by the present invention have the highest usage and achieve good translation results.
Table 2. Usage of different derivations on the tuning sets
4) Analysis of experimental results
The present invention studies how often the system uses the different types of derivations during testing. Table 2 shows, for three different tasks, the tendency of the system when choosing between non-fully syntactic translation derivations and non-syntactic translation derivations. The English-Chinese translation task makes the heaviest use of syntactic and non-fully syntactic translation derivations, followed by the Chinese-English newswire and web translation tasks. To some extent, this result reflects differences in parsing quality across languages and domains.
1. Improved translation quality:
The test results show that the system of the present invention based on the syntax skeleton (abbreviated as SYNSKEL) achieves significant improvements on all 3 test sets. An average improvement of more than 0.6 BLEU is obtained with CTB-style syntax trees, and an average improvement of more than 0.9 BLEU with binarized syntax trees. Moreover, the tree-parsing method applies syntactic translation rules and non-fully syntactic translation rules well on top of the normally used non-syntactic translation rules and obtains good results, with BLEU values comparable to those of the string-parsing method.
2. Good reordering control:
According to the comparative experimental data provided by the present invention, Fig. 6 lists the top 5 translation results on the tuning set, which are used to compare different translation results on the same tuning set. The translation output by the non-syntactic translation system is disordered and its reordering is wrong, and the syntactic translation system, relying on the parse of the source sentence, also translates the structure "right3 ... difficult20" poorly. In contrast, the SYNSKEL system uses the above rules over the span of source words "also2 ... worry22", and since the lower-level part is handled as in a non-syntactic translation system, its translation of the structure "right3 ... difficult20", which is covered by a partial non-syntactic translation derivation, is also relatively good.
3. Better recognition of syntactic structure:
The bottom of Fig. 6 shows a real translation example produced by a derivation that uses this kind of rule. In the SYNSKEL system the rule covers the source words "right3 ... annual pay20" and successfully identifies the reordering of the "... ..." structure. Note that a non-syntactic translation system also has a rule X → <X1 X2, X2 of X1> that can translate the "... ..." (de) structure. However, when the span becomes larger, for example when translating the longer word sequence "should8 ... dollar15", the non-syntactic translation system loses this reordering ability.

Claims (4)

1. A statistical machine translation method based on a syntax skeleton, characterized by comprising the following steps:
1) extracting non-syntactic translation rules with a probabilistic SCFG hierarchical rule extraction algorithm, for translating the non-skeleton part of the sentence to be translated:
using the heuristically constrained method for extracting hierarchical rules, probabilistic SCFG grammar rules are extracted from parallel sentence pairs that have been word-aligned but not syntactically parsed, and the hierarchical phrase rules, i.e. the non-syntactic translation rules, are used to handle the translation of the low-level structures of the sentence to be translated;
2) extracting syntactic translation rules with the GHKM rule method, for translating the skeleton part of the sentence to be translated:
GHKM rules are extracted with the GHKM rule extraction algorithm from the word-aligned parallel sentence pairs and the source-side syntactic analysis results, and the extracted GHKM rules are rewritten into syntactic translation rules; the syntactic translation rules are used to handle the generation and translation of the high-level skeleton structure;
3) generating non-fully syntactic translation rules:
non-fully syntactic translation rules are generated from the non-syntactic translation rules and the syntactic translation rules, and combine the non-syntactic translation rules with the syntactic translation rules, so as to integrate the advantages of the two kinds of translation systems, the non-syntactic translation system and the syntactic translation system;
4) generating the model:
according to the above non-fully syntactic translation rules, the grammars, i.e. the translation rule sets, of the syntactic translation system and the non-syntactic translation system are integrated according to the different translation tasks, and non-fully syntactic translation derivations are generated; the non-syntactic translation rules are used to translate the low-level words and phrases of the sentence to be translated, and the syntactic translation rules are used to complete the translation of the high-level syntax skeleton structure of the sentence to be translated; the non-fully syntactic translation rules are used to guide the skeleton generation process and the translation process; the non-syntactic translation rules, the syntactic translation rules and the non-fully syntactic translation rules are collected to generate an SCFG grammar system with large coverage, and the grammars of different forms are combined through the non-fully syntactic translation rules;
in step 3), the non-fully syntactic translation rules are generated from the non-syntactic translation rules and the syntactic translation rules, and a non-fully syntactic translation rule has the form:
source-language phrase syntactic label → <source-language string*, target-language string*>
where the source-language phrase syntactic label on the left-hand side is a nonterminal; source-language string* is a string composed of source-language terminal words, nonterminals and the generalized label X; target-language string* is a string composed of target-language terminal words, nonterminals and the generalized label X; and the nonterminal labels correspond one-to-one to the nonterminals at the leaf nodes of the source-language syntax tree fragment;
the difference between a non-fully syntactic translation rule and a syntactic translation rule is that a non-fully syntactic translation rule does not require every nonterminal in the rule to belong to some phrase syntactic label of the source-language syntactic analysis; some of its nonterminals are generalized to X, indicating that these nonterminals do not belong to any syntactic analysis type.
2. The statistical machine translation method based on a syntax skeleton according to claim 1, characterized in that rewriting the extracted GHKM rules into syntactic translation rules is as follows: an extracted GHKM rule has the form:
source-language phrase syntactic label <source-language syntax tree fragment rooted at this syntactic label> → target-language string
where the source-language phrase syntactic label on the left-hand side of the rule is a phrase structure type label defined by linguistic syntactic knowledge, i.e. a syntactic nonterminal; the syntax subtree fragment on the left-hand side of the rule is a fragment of the sentence parse tree and has a tree structure, whose leaf nodes may be terminal words or nonterminals, and these nonterminals must belong to some syntactic label of the source-language syntactic analysis; the target-language string on the right-hand side of the rule is a string composed of target-language terminal words and nonterminals, and the nonterminal labels correspond one-to-one to the nonterminals at the leaf nodes of the source-language syntax tree fragment;
by keeping the nonterminals on the frontier of the syntax subtree fragment and discarding the internal tree structure, the above GHKM rule can be rewritten as a syntactic translation rule:
source-language phrase syntactic label → <source-language string, target-language string>
where the source-language string is the sequence composed of source-language terminal words, nonterminals and the corresponding syntactic labels, this sequence being the leaf node sequence of the source-language syntax tree fragment in the GHKM rule corresponding to the syntactic rule; the target-language string is a string composed of target-language terminal words, nonterminals and the corresponding syntactic labels, and the nonterminal labels correspond one-to-one to the nonterminals at the leaf nodes of the source-language syntax tree fragment.
3. The statistical machine translation method based on a syntax skeleton according to claim 1, characterized in that integrating the advantages of the two kinds of translation systems, the non-syntactic translation system and the syntactic translation system, is as follows:
the syntax skeleton is created during decoding by the source-side syntactic translation rules within the large-coverage SCFG grammar generated from the syntactic translation rules, the non-syntactic translation rules and the non-fully syntactic translation rules;
in the process of generating the above syntax skeleton structure, the reordering between syntactic constituents in the source language is captured, and the high-level translation tasks of the sentence to be translated are assigned to the syntactic translation system, while the low-level translation tasks of the sentence to be translated are assigned to the non-syntactic translation system; in this way, the advantages of the different translation systems are applied to the translation tasks at which each is best.
4. The statistical machine translation method based on a syntax skeleton according to claim 1, characterized in that integrating the grammars of the non-syntactic translation system and the syntactic translation system according to the different translation tasks is as follows: in the SCFG system, a weight is computed for each translation rule derivation, so that the various translation rules are used for more accurate derivation; the score of each translation rule derivation d is computed with the following formula:
$$s(d) = \prod_{r_i \in d_s} w(r_i) \times \prod_{r_j \in d_h} w(r_j) \times lm(t)^{\lambda_{lm}} \times \exp(\lambda_{wb} |t|)$$
where s(d) is the score of the translation rule derivation d and t is the target-language string; the score of d is defined as the product of several factors, comprising:
factor 1: the product of the weights of all rules contained in the syntax skeleton of d, $\prod_{r_i \in d_s} w(r_i)$, where r_i is the i-th rule in d_s and w(r*) is the weight of rule r*;
factor 2: the product of the weights of all rules contained in the non-skeleton part of d, $\prod_{r_j \in d_h} w(r_j)$, where r_j is the j-th rule in d_h and w(r*) is the weight of rule r*;
factor 3: the exponentially weighted score of the n-gram language model, $lm(t)^{\lambda_{lm}}$, where λ_lm is the weight of the n-gram language model;
factor 4: the word bonus exp(λ_wb |t|), where exp(|t|) is the exponential of the translation length, so that the longer the sentence, the larger the bonus, and λ_wb is the weight of the word bonus.
CN201610053560.2A 2016-01-26 2016-01-26 Statictic machine translation system based on syntax skeleton Active CN105573994B (en)

Publications (2)

Publication Number Publication Date
CN105573994A CN105573994A (en) 2016-05-11
CN105573994B true CN105573994B (en) 2019-03-22

Family

ID=55884144

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201610053560.2A Active CN105573994B (en) 2016-01-26 2016-01-26 Statictic machine translation system based on syntax skeleton

Country Status (1)

Country Link
CN (1) CN105573994B (en)

Families Citing this family (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US10565318B2 (en) * 2017-04-14 2020-02-18 Salesforce.Com, Inc. Neural machine translation with latent tree attention
CN107273363B (en) * 2017-05-12 2019-11-22 清华大学 A kind of language text interpretation method and system
CN107729326B (en) * 2017-09-25 2020-12-25 沈阳航空航天大学 Multi-BiRNN coding-based neural machine translation method
CN110489529B (en) * 2019-08-26 2021-12-14 哈尔滨工业大学(深圳) Dialogue generating method based on syntactic structure and reordering
CN116108830B (en) * 2023-03-30 2023-07-07 山东大学 Syntax-controllable text rewriting method and device

Citations (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN103942192A (en) * 2013-11-21 2014-07-23 北京理工大学 Bilingual largest noun group separating-fusing translation method
CN104268133A (en) * 2014-09-11 2015-01-07 北京交通大学 Machine translation method and system

Family Cites Families (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
KR101732634B1 (en) * 2010-08-23 2017-05-08 에스케이플래닛 주식회사 Statistical Machine Translation Method using Dependency Forest

Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
Effective incorporation of source syntax into hierarchical phrase-based translation; Tong Xiao et al.; Proceedings of the 25th International Conference on Computational Linguistics: Technical Papers; 2014-12-31; pp. 2064-2074
A tree-to-string statistical machine translation model incorporating bilingual maximal noun chunks; Li Yegang et al.; Journal of Shandong University of Technology (Natural Science Edition); 2015-11-30; Vol. 29, No. 6; pp. 11-15, 19

Legal Events

Date Code Title Description
C06 Publication
PB01 Publication
C10 Entry into substantive examination
SE01 Entry into force of request for substantive examination
GR01 Patent grant
TR01 Transfer of patent right

Effective date of registration: 20220215

Address after: 110004 1001 - (1103), block C, No. 78, Sanhao Street, Heping District, Shenyang City, Liaoning Province

Patentee after: Calf Yazhi (Shenyang) Technology Co.,Ltd.

Address before: Room 1517, No. 55, Sanhao Street, Heping District, Shenyang, Liaoning 110003

Patentee before: SHENYANG YAYI NETWORK TECHNOLOGY CO.,LTD.

TR01 Transfer of patent right

Effective date of registration: 20220714

Address after: 110004 11 / F, block C, Neusoft computer city, 78 Sanhao Street, Heping District, Shenyang City, Liaoning Province

Patentee after: SHENYANG YAYI NETWORK TECHNOLOGY CO.,LTD.

Address before: 110004 1001 - (1103), block C, No. 78, Sanhao Street, Heping District, Shenyang City, Liaoning Province

Patentee before: Calf Yazhi (Shenyang) Technology Co.,Ltd.