CN105573994A - Statistic machine translation system based on syntax framework - Google Patents

Publication number
CN105573994A
Authority
CN
China
Prior art keywords
translation
rule
syntactic
syntax
skeleton
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN201610053560.2A
Other languages
Chinese (zh)
Other versions
CN105573994B (en)
Inventor
肖桐
朱靖波
张春良
高瑜泽
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Shenyang Yayi Network Technology Co ltd
Original Assignee
SHENYANG YAYI NETWORK TECHNOLOGY Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by SHENYANG YAYI NETWORK TECHNOLOGY Co Ltd filed Critical SHENYANG YAYI NETWORK TECHNOLOGY Co Ltd
Priority to CN201610053560.2A priority Critical patent/CN105573994B/en
Publication of CN105573994A publication Critical patent/CN105573994A/en
Application granted granted Critical
Publication of CN105573994B publication Critical patent/CN105573994B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00 Handling natural language data
    • G06F40/40 Processing or translation of natural language
    • G06F40/42 Data-driven translation
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00 Handling natural language data
    • G06F40/20 Natural language analysis
    • G06F40/205 Parsing
    • G06F40/211 Syntactic parsing, e.g. based on context-free grammar [CFG] or unification grammars

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Health & Medical Sciences (AREA)
  • Artificial Intelligence (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • Computational Linguistics (AREA)
  • General Health & Medical Sciences (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Machine Translation (AREA)

Abstract

The invention relates to a statistical machine translation system based on a syntactic skeleton. The translation process comprises the following steps: 1) non-syntactic translation rules are extracted with a probabilistic SCFG hierarchical rule extraction method and used to translate the non-skeleton part of the sentence to be translated; 2) syntactic translation rules are extracted with the GHKM rule extraction method and used to translate the skeleton part of the sentence to be translated; 3) non-fully syntactic translation rules are generated from the syntactic translation rules, and the non-syntactic and syntactic translation rules are combined so that the advantages of a non-syntactic translation system and a syntactic translation system are integrated; 4) a model is generated. In the system, syntactic translation rules handle the translation of the syntactic skeleton and long-distance reordering, while the rules of the non-syntactic translation system handle low-level lexical translation and reordering. The model is easy to implement and its effect is significant.

Description

Statistical machine translation system based on a syntactic skeleton
Technical field
The present invention relates to techniques for modeling source-language syntax in statistical machine translation, and specifically to a statistical machine translation system based on a syntactic skeleton.
Background art
Statistical machine translation (SMT) comprises different kinds of translation systems, such as non-syntactic systems (phrase-based and hierarchical phrase-based) and syntactic systems (tree-to-string, string-to-tree and the like). Each kind has its own strengths and weaknesses. For example, syntactic translation systems have a clear advantage in handling long-distance reordering and complex reordering among constituents; but when the fully syntactic translation rules are sparse or have low coverage, the system suffers from robustness problems and translation quality may degrade. It has also been shown that a straightforward implementation of a syntactic system does not translate as well as non-syntactic systems such as phrase-based and hierarchical phrase-based ones. Non-syntactic translation systems, in turn, are more accurate when translating short sentence fragments and can control the hierarchical structure of short fragments reasonably well. However, non-syntactic systems are comparatively weak at handling long-distance word order.
At present, when generating target-language strings (for example, replacing target-side surface strings according to tree-to-string mappings obtained from syntactically analyzed data), a popular approach is to use the syntactic and sentence-structure information of the source-language side to guide or carry out decoding. Unlike the word-based translation systems that date from the 1990s, this approach builds its source-language syntax model from the syntactic parse tree of the input source sentence. The benefit is a stronger ability to handle long-distance movement and complex reordering among constituents.
In addition, source-language syntax performs well in machine translation because it can represent the skeleton structure (syntactic structure) of a sentence. The skeleton-based mode of translation becomes clearer if a machine translation system is compared with a human translator: given a source-language input sentence, a person first uses prior syntactic knowledge to form a rough, high-level sentence structure or pattern; then decides the translation and order of the key sentence constituents according to that structure or pattern; and finally completes lexical selection and local reordering. Since the skeleton structure of a source sentence can be represented by source-language syntax, the following questions inevitably arise. Where in translation is the source-language syntactic structure information most effective? For example, is it possible to translate according to the source-language skeleton structure while still exploiting the strength of non-syntactic translation systems in phrase translation?
Unfortunately, although integrating sentence skeleton information into machine translation is a very promising prospect, no statistical machine translation system based on a syntactic skeleton has yet been reported. In addition, syntactic and non-syntactic systems use different representations and are used differently. Some researchers have tried using manually annotated syntactic skeleton data, but the results were poor and the implementation was complicated.
Summary of the invention
The prior art has several problems: syntactic translation systems translate and reorder short sentence fragments poorly and suffer from robustness problems caused by rule sparsity; models in non-syntactic translation systems cannot effectively reorder long-distance sentence constituents; and manually annotating skeleton information is time-consuming and laborious. In view of these problems, the technical problem to be solved by the present invention is to provide a statistical machine translation system based on a syntactic skeleton that models the high-level syntactic skeleton of the source language, translates low-level phrases well, and also proposes a novel representation of the syntactic skeleton that is convenient for machine translation systems to use.
To solve the above technical problem, the present invention adopts the following technical solution:
A statistical machine translation system based on a syntactic skeleton according to the present invention comprises the following steps:
1) A probabilistic SCFG hierarchical rule extraction algorithm extracts non-syntactic translation rules, which are used to translate the non-skeleton part of the sentence to be translated:
Using the heuristically constrained method for extracting hierarchical rules, probabilistic SCFG rules are extracted from parallel sentence pairs that have been word-aligned but not syntactically parsed; these hierarchical phrase rules, i.e. the non-syntactic translation rules, handle the translation of the low-level structures of the sentence to be translated;
2) The GHKM rule extraction method extracts syntactic translation rules, which are used to translate the skeleton part of the sentence to be translated:
Using the GHKM rule extraction algorithm, GHKM rules are extracted from word-aligned parallel sentence pairs together with the syntactic parses of the source-language side, and the extracted GHKM rules are rewritten into syntactic translation rules. The syntactic translation rules handle the generation and translation of the high-level skeleton structure;
3) Generation of non-fully syntactic translation rules:
Non-fully syntactic translation rules are generated from the syntactic translation rules and combined with the non-syntactic and syntactic translation rules, so that the advantages of the two kinds of translation system, non-syntactic and syntactic, are integrated;
4) Model generation:
Based on the above non-fully syntactic translation rules, the grammars (i.e. translation rule sets) of the syntactic and non-syntactic translation systems are integrated according to the different translation tasks, and non-fully syntactic translation derivations are generated. Non-syntactic translation rules handle the translation of the low-level words and phrases of the sentence to be translated; syntactic translation rules complete the translation of its high-level syntactic skeleton structure; and non-fully syntactic translation rules guide the skeleton generation and translation processes. The non-syntactic, syntactic and non-fully syntactic translation rules are collected to generate one SCFG grammar with large coverage, and the non-fully syntactic translation rules join the different forms of grammar together.
The extracted GHKM rules are rewritten into syntactic translation rules as follows. An extracted GHKM rule has the form:
source-language phrase syntactic label <source-language syntax tree fragment rooted at that label> → target-language string
where the "source-language phrase syntactic label" on the left-hand side is a phrase-structure type label defined by linguistic syntactic knowledge, i.e. a syntactic nonterminal; the "syntax tree fragment" on the left-hand side is a fragment of the sentence's parse tree and is itself a tree, whose leaf nodes may be terminals (words) or nonterminals, and these nonterminals must belong to one of the syntactic label classes of the source-language parse; the "target-language string" on the right-hand side is a string of target-language terminals (words) and nonterminals, whose nonterminals are in one-to-one correspondence with the nonterminals at the leaf nodes of the source-language syntax tree fragment.
By keeping the nonterminals at the frontier of the syntax tree fragment and discarding its internal tree structure, the above GHKM rule can be rewritten as a syntactic translation rule:
source-language phrase syntactic label → <source-language string, target-language string>
where the "source-language string" is a sequence of source-language terminals (words) and nonterminals with their corresponding syntactic labels, namely the leaf-node sequence of the source-language syntax tree fragment in the GHKM rule that corresponds to the syntactic rule; and the "target-language string" is a string of target-language terminals (words) and nonterminals with their corresponding syntactic labels, whose nonterminals are in one-to-one correspondence with the nonterminals at the leaf nodes of the source-language syntax tree fragment.
Non-fully syntactic translation rules are generated from the non-syntactic and syntactic translation rules, and have the form:
source-language phrase syntactic label → <source-language string*, target-language string*>
where the "source-language phrase syntactic label" on the left-hand side is a nonterminal; "source-language string*" is a string of source-language terminals (words), nonterminals and the generalized symbol X; and "target-language string*" is a string of target-language terminals (words), nonterminals and the generalized symbol X, whose nonterminals are in one-to-one correspondence with the nonterminals at the leaf nodes of the source-language syntax tree fragment;
A non-fully syntactic translation rule differs from a syntactic translation rule in that it does not require every nonterminal in the rule to belong to one of the phrase syntactic label classes of the source-language parse: some of its nonterminals are generalized to X, indicating that such a nonterminal does not belong to any syntactic category.
The advantages of the two kinds of translation system, non-syntactic and syntactic, are combined as follows:
During decoding, the syntactic skeleton is created by the large-coverage SCFG grammar generated from the source-side syntactic translation rules, the non-syntactic translation rules and the non-fully syntactic translation rules;
While the syntactic skeleton structure is generated, the reordering among constituents of the source-language syntactic structure is captured. The high-level translation tasks of the sentence to be translated are assigned to the syntactic translation system, and its low-level translation tasks are assigned to the non-syntactic translation system, so that each translation system contributes its strengths to the tasks it handles best.
The grammars of the non-syntactic and syntactic translation systems are integrated according to the different translation tasks as follows: in the SCFG system, a weight is computed for each translation rule derivation so that the various translation rules can be used to derive more accurately. The score of each translation rule derivation d is computed with the following formula:
$$s(d) = \prod_{r_i \in d_s} w(r_i) \times \prod_{r_j \in d_h} w(r_j) \times lm(t)^{\lambda_{lm}} \times \exp(\lambda_{wb} \cdot |t|)$$
where s(d) is the score of translation rule derivation d and t is the target-language string; the score of d is defined as the product of several factors:
Factor 1: the product of the weights of all rules in the syntactic skeleton d_s of d, $\prod_{r_i \in d_s} w(r_i)$, where r_i is the i-th rule in d_s and w(r*) is the weight of rule r*;
Factor 2: the product of the weights of all rules in the non-skeleton part d_h of d, $\prod_{r_j \in d_h} w(r_j)$, where r_j is the j-th rule in d_h and w(r*) is the weight of rule r*;
Factor 3: the exponentially weighted score of the n-gram language model, $lm(t)^{\lambda_{lm}}$, where λ_lm is the weight of the n-gram language model;
Factor 4: the word bonus $\exp(\lambda_{wb} \cdot |t|)$, where |t| is the length of the translation; the longer the sentence, the larger this bonus, and λ_wb is the weight of the word bonus.
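As an illustrative rearrangement that is not part of the original text, taking the logarithm of the score turns the product of factors into a sum. This is the form in which such scores are commonly accumulated in practice, since it avoids numerical underflow when many rule weights are multiplied. It uses the same symbols as the formula above and assumes that all weights and the language model score are positive:

```latex
\log s(d) = \sum_{r_i \in d_s} \log w(r_i)
          + \sum_{r_j \in d_h} \log w(r_j)
          + \lambda_{lm} \log lm(t)
          + \lambda_{wb} \cdot |t|
```

In this form, the word bonus contributes a term that is linear in the translation length |t|.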
The present invention has the following beneficial effects and advantages:
1. The system translates using a specially defined kind of syntactic structure information (the syntactic skeleton, or simply the skeleton). It models the high-level syntactic skeleton of the source language in a form that machine translation systems can use, and combines two advantages in a single framework: 1) syntactic translation rules are applied to translate the syntactic skeleton and handle long-distance reordering; 2) the rules of the non-syntactic translation system are applied to handle low-level lexical translation and reordering.
2. The model is very flexible: a single, concise syntactic decoding paradigm covers derivations that use non-syntactic, non-fully syntactic or fully syntactic translation rules, and allows a gradual two-way transition between syntactic and non-syntactic translation rules, so that the system can choose selectively between syntactic and non-syntactic translation. Non-syntactic and syntactic translation systems can therefore be regarded as two special cases of this method. The model is easy to implement and its effect is significant.
3. The system is also applicable to translation systems based on the general synchronous context-free grammar (SCFG) framework; it is easy to implement in a translation system that supports an SCFG decoder, and has been confirmed to speed up translation.
4. The invention defines a novel representation of skeleton structure and is the first to acquire syntactic skeleton information automatically: under the guidance of syntactic, non-fully syntactic and non-syntactic translation rules, skeleton information is acquired automatically, which avoids the large amount of manual labor otherwise spent annotating it.
5. Unlike traditional syntactic translation systems, during decoding the invention first translates the syntactic structural skeleton and controls its reordering, and then performs non-syntactic translation of local fragments under a good syntactic skeleton; this is the first use of such a method in a translation system.
Brief description of the drawings
Fig. 1 is a diagram of the model framework of the system;
Fig. 2 shows examples of extracting non-syntactic and syntactic translation rules in the system;
Fig. 3 shows the process by which the invention produces a syntactic skeleton from a sample grammar;
Fig. 4 shows the process of decoding with syntactic translation rules in a tree-based system of the invention;
Fig. 5 illustrates the effect of skeleton depth on translation quality;
Fig. 6 compares the translation results produced by different systems.
Embodiments
The present invention is further described below with reference to the accompanying drawings.
As shown in Fig. 1, a statistical machine translation system based on a syntactic skeleton according to the present invention comprises the following steps:
1) A probabilistic SCFG hierarchical rule extraction algorithm extracts non-syntactic translation rules, which are used to translate the non-skeleton part of the sentence to be translated:
Using the heuristically constrained method for extracting hierarchical rules, probabilistic SCFG rules are extracted from parallel sentence pairs that have been word-aligned but not syntactically parsed; these hierarchical phrase rules, i.e. the non-syntactic translation rules, handle the translation of the low-level structures of the sentence to be translated;
2) The GHKM rule extraction method extracts syntactic translation rules, which are used to translate the skeleton part of the sentence to be translated:
Using the GHKM rule extraction algorithm, hierarchical GHKM rules are extracted from word-aligned parallel sentence pairs together with the syntactic parses of the source-language side; the extracted GHKM rules are rewritten into syntactic translation rules, which handle the high-level structure of the sentence to be translated, i.e. the syntactic translation of the sentence's syntactic structure;
3) Generation of non-fully syntactic translation rules:
Non-fully syntactic translation rules are generated from the syntactic translation rules and used together with the non-syntactic translation rules, so that the advantages of the two kinds of translation system, non-syntactic and syntactic, are combined;
4) Model generation:
Based on the above non-fully syntactic translation rules, the grammars (translation rule sets) of the non-syntactic and syntactic translation systems are integrated according to the different translation tasks, and non-fully syntactic translation derivations are generated. The non-fully syntactic rules identify the different translation tasks: non-syntactic translation rules handle the translation of the low-level parts of the text (words or phrases), while syntactic and non-fully syntactic translation rules complete the translation of its high-level part (the syntactic structure). The non-syntactic, syntactic and non-fully syntactic translation rules are collected to generate one SCFG grammar with large coverage.
In step 1), probabilistic SCFG hierarchical rules are extracted: from parallel sentence pairs that have been word-aligned but not syntactically parsed, the invention extracts probabilistic SCFG rules using the heuristic constraints for extracting hierarchical phrase rules; these hierarchical phrase rules, i.e. the non-syntactic translation rules, handle the translation of the low-level structures of the sentence to be translated;
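The extraction in step 1) can be sketched as follows. This is a minimal illustration that assumes the standard hierarchical phrase-based procedure (find phrase pairs consistent with the word alignment, then replace an inner phrase pair by a linked nonterminal); the patent does not spell out its heuristic constraints, so the span limit `max_len`, the restriction to a single gap `X1`, and the toy sentence pair are all assumptions, and rule probabilities are not estimated here.

```python
def consistent_phrase_pairs(alignment, src_len, max_len):
    """Return inclusive span pairs ((i1, i2), (j1, j2)) consistent with the alignment."""
    pairs = []
    for i1 in range(src_len):
        for i2 in range(i1, min(i1 + max_len, src_len)):
            # Target positions linked to any source word in [i1, i2].
            linked = [j for (i, j) in alignment if i1 <= i <= i2]
            if not linked:
                continue
            j1, j2 = min(linked), max(linked)
            if j2 - j1 + 1 > max_len:
                continue
            # Consistency: nothing in [j1, j2] may align outside [i1, i2].
            if all(i1 <= i <= i2 for (i, j) in alignment if j1 <= j <= j2):
                pairs.append(((i1, i2), (j1, j2)))
    return pairs


def hierarchical_rules(src, tgt, alignment, max_len=4):
    """Extract X -> <source, target> rules with at most one gap (X1)."""
    pairs = consistent_phrase_pairs(alignment, len(src), max_len)
    rules = set()
    for (i1, i2), (j1, j2) in pairs:
        # Plain phrase pair, with no nonterminal.
        rules.add((" ".join(src[i1:i2 + 1]), " ".join(tgt[j1:j2 + 1])))
        for (a1, a2), (b1, b2) in pairs:
            inside = i1 <= a1 and a2 <= i2 and j1 <= b1 and b2 <= j2
            if not inside or (a1, a2) == (i1, i2):
                continue
            # Replace the inner phrase pair by the linked nonterminal X1.
            s = src[i1:a1] + ["X1"] + src[a2 + 1:i2 + 1]
            t = tgt[j1:b1] + ["X1"] + tgt[b2 + 1:j2 + 1]
            if len(t) > 1:  # keep at least one target terminal
                rules.add((" ".join(s), " ".join(t)))
    return rules


# Hypothetical toy pair: a-A, b-B, c-C, with b and c swapped on the target side.
src = ["a", "b", "c"]
tgt = ["A", "C", "B"]
alignment = {(0, 0), (1, 2), (2, 1)}
rules = hierarchical_rules(src, tgt, alignment)
for source_side, target_side in sorted(rules):
    print(f"X -> <{source_side}, {target_side}>")
```

In this toy pair the rule X → <X1 c, C X1> records the local swap, which is the kind of low-level reordering the non-syntactic rules are meant to handle.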
In step 2), guided by source-language syntax tree information, GHKM rules are extracted from word-aligned parallel sentence pairs and rewritten into SCFG-style syntactic translation rules. That is, an extracted GHKM rule of the following form:
source-language syntactic label (source-language string syntactic structure, source-language string attributes, source-language string) → target-language translation
is rewritten, by keeping the nonterminals at the frontier of the syntax tree fragment and discarding its internal tree structure, into the form of a syntactic translation rule:
source-language phrase syntactic label → <source-language string, target-language string>
where the "source-language string" is a sequence of source-language terminals (words) and nonterminals with their corresponding syntactic labels, namely the leaf-node sequence of the source-language syntax tree fragment in the GHKM rule that corresponds to the syntactic rule; and the "target-language string" is a string of target-language terminals (words) and nonterminals with their corresponding syntactic labels, whose nonterminals are in one-to-one correspondence with the nonterminals at the leaf nodes of the source-language syntax tree fragment.
In step 3), syntactic skeleton information is obtained from the source-side syntactic information, and non-fully syntactic translation rules are obtained by adjusting and recombining the syntactic and non-syntactic translation rules. A non-fully syntactic translation rule has the form:
source-language phrase syntactic label → <source-language string*, target-language string*>
where the "source-language phrase syntactic label" on the left-hand side is a nonterminal; "source-language string*" is a sequence of source-language words (terminals), nonterminals and the generalized symbol X; and "target-language string*" is a string of target-language words (terminals), nonterminals and the generalized symbol X, whose nonterminals are in one-to-one correspondence with the nonterminals at the leaf nodes of the source-language syntax tree fragment;
A non-fully syntactic translation rule differs from a syntactic translation rule in that it does not require every nonterminal in the rule to belong to one of the phrase syntactic label classes of the source-language parse: some of its nonterminals are generalized to X, indicating that such a nonterminal does not belong to any syntactic category.
Every syntactic translation rule can be rewritten into non-fully syntactic translation rules: generalizing one or two nonterminals on the right-hand side of the rule to X, while keeping the left-hand side unchanged, converts it into a non-fully syntactic translation rule.
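The generalization just described can be sketched as below. The rule encoding (a tuple of left-hand side, source tokens and target tokens, with nonterminals written as `(label, index)` pairs) and the example rule are hypothetical choices made for illustration; only the operation itself, relabeling one or two linked nonterminals as X while leaving the left-hand side unchanged, comes from the text.

```python
from itertools import combinations


def relabel(tokens, chosen):
    """Replace the label of every nonterminal whose index is in `chosen` by X."""
    return tuple(
        ("X", tok[1]) if isinstance(tok, tuple) and tok[1] in chosen else tok
        for tok in tokens
    )


def non_fully_syntactic_variants(rule, max_generalized=2):
    """Generalize one or two linked nonterminals to X; the left-hand side is kept."""
    lhs, src, tgt = rule
    indices = sorted(tok[1] for tok in src if isinstance(tok, tuple))
    variants = []
    for k in range(1, max_generalized + 1):
        for chosen in combinations(indices, k):
            # The same indices are relabeled on both sides to keep the linking.
            variants.append((lhs, relabel(src, chosen), relabel(tgt, chosen)))
    return variants


# Hypothetical syntactic rule: IP -> <NP_1 VP_2, NP_1 VP_2>
rule = ("IP", (("NP", 1), ("VP", 2)), (("NP", 1), ("VP", 2)))
variants = non_fully_syntactic_variants(rule)
for lhs, source_side, target_side in variants:
    print(lhs, source_side, target_side)
```

For this rule the sketch yields three variants: X_1 VP_2, NP_1 X_2 and X_1 X_2, each with IP still on the left-hand side.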
After the syntactic, non-syntactic and non-fully syntactic translation rules have all been collected, all of the rules are used to generate one larger SCFG grammar. The non-fully syntactic translation rules guide the derivation while the sentence to be translated is decoded and produce the corresponding syntactic structure, so that the strengths of different translation approaches are used at different levels of the sentence: the strengths of the non-syntactic translation system when translating low-level units (such as phrases), and the strengths of the syntactic translation system when handling high-level translation tasks (such as syntactic structure).
The advantages of the two kinds of translation system, non-syntactic and syntactic, are combined as follows:
With the generated large-coverage SCFG grammar, the non-fully syntactic translation rules realize a gradual transition from the syntactic translation system to the non-syntactic translation system, and the syntactic skeleton is created during derivation;
The above non-fully syntactic translation rules and the syntactic translation rules capture the reordering among the different constituents of the sentence to be translated; low-level translation tasks are assigned to the non-syntactic translation rules, and the translation of the high-level skeleton part is assigned to the syntactic and non-fully syntactic translation rules.
In step 4), based on the above non-fully syntactic translation rules, the grammars of the non-syntactic and syntactic translation systems are adjusted according to the different translation tasks, and a large-coverage SCFG grammar composed of the three types of rule is generated. This SCFG system not only reorders the constituents of the sentence skeleton well but also generates the syntactic skeleton of the sentence. A weight is computed for each translation rule derivation so that the various translation rules can be used to derive more accurately; the score of each translation rule derivation d is computed with the following formula:
$$s(d) = \prod_{r_i \in d_s} w(r_i) \times \prod_{r_j \in d_h} w(r_j) \times lm(t)^{\lambda_{lm}} \times \exp(\lambda_{wb} \cdot |t|)$$
where s(d) is the score of translation rule derivation d and t is the target-language string; the score of d is defined as the product of several factors:
Factor 1: the product of the weights of all rules in the syntactic skeleton d_s of d, $\prod_{r_i \in d_s} w(r_i)$, where r_i is the i-th rule in d_s and w(r*) is the weight of rule r*;
Factor 2: the product of the weights of all rules in the non-skeleton part d_h of d, $\prod_{r_j \in d_h} w(r_j)$, where r_j is the j-th rule in d_h and w(r*) is the weight of rule r*;
Factor 3: the exponentially weighted score of the n-gram language model, $lm(t)^{\lambda_{lm}}$, where λ_lm is the weight of the n-gram language model;
Factor 4: the word bonus $\exp(\lambda_{wb} \cdot |t|)$, where |t| is the length of the translation; the longer the sentence, the larger this bonus, and λ_wb is the weight of the word bonus.
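A small numeric sketch of the four factors may make the formula concrete. The function name, its parameters and all of the numbers below are hypothetical; the source does not say how rule weights or the language model score are obtained, so they are simply passed in as given values.

```python
import math


def derivation_score(skeleton_weights, non_skeleton_weights,
                     lm_prob, target_len, lambda_lm, lambda_wb):
    """Score s(d) as the product of the four factors described above."""
    skeleton = math.prod(skeleton_weights)          # factor 1: rules in d_s
    non_skeleton = math.prod(non_skeleton_weights)  # factor 2: rules in d_h
    language_model = lm_prob ** lambda_lm           # factor 3: lm(t)^lambda_lm
    word_bonus = math.exp(lambda_wb * target_len)   # factor 4: exp(lambda_wb * |t|)
    return skeleton * non_skeleton * language_model * word_bonus


# Hypothetical values: two skeleton rules, one non-skeleton rule, |t| = 2.
score = derivation_score([0.5, 0.4], [0.25], lm_prob=0.01,
                         target_len=2, lambda_lm=0.5, lambda_wb=0.0)
with_bonus = derivation_score([0.5, 0.4], [0.25], lm_prob=0.01,
                              target_len=2, lambda_lm=0.5, lambda_wb=0.1)
print(score, with_bonus)
```

With `lambda_wb` set to zero the word bonus is 1 and has no effect; a positive `lambda_wb` raises the score of longer translations, which counteracts the language model's preference for short output.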
Decoding application:
When the model is applied in decoding, the generated large-coverage synchronous context-free grammar (SCFG) is used to parse the source-language sentence to be translated. During parsing, the non-fully syntactic and syntactic rules analyze the sentence according to the structure of the syntactic skeleton and thereby produce the sentence's syntactic skeleton, while the target-language sides of the rules of the generated large-coverage SCFG produce the target-language translation. If a fragment has a derivation with a corresponding non-fully syntactic translation rule, the structural information of that local fragment is obtained; if no corresponding non-fully syntactic translation rule is found, the model searches the derivation space (which includes syntactic, non-syntactic and non-fully syntactic translation rules) for the best translation derivation.
In the present invention, the framework of the skeleton-based machine translation model can be divided broadly into three parts: rule acquisition, model generation and model application. The model framework is shown in Fig. 1.
First, using the methods described above, different types of translation rule are extracted in different ways from the bilingual aligned data and the source-language syntax tree information. Then, according to the characteristics of the source-language syntax, some of the syntactic translation rules are rewritten to generate suitable non-fully syntactic translation derivations that connect the different types of derivation rule. Finally, during decoding, the skeleton model finds a suitable derivation according to the translation tasks at the different levels.
I. Translation rule acquisition:
In the present invention, different rules are extracted with different methods:
1) Non-syntactic translation rule extraction:
Because the invention is implemented on the basis of SCFGs, an SCFG rule can be expressed in the following form:
LHS→<α,β,~>
where LHS is a nonterminal, α and β are sequences of terminals and nonterminals on the source-language and target-language sides respectively, and ~ denotes the one-to-one correspondence between the nonterminals in α and β.
For non-syntactic translation rules, the heuristically constrained method for extracting hierarchical rules is used to extract probabilistic SCFG rules from parallel sentence pairs that have been word-aligned but not syntactically parsed. With the resulting probabilistic SCFG, a given sentence can be decoded by finding the most likely, i.e. highest-probability, rule derivation. Fig. 2 gives an example of extracted non-syntactic translation rules, in which every nonterminal is labeled only X. If a sequence of such SCFG rules completely covers and derives a source sentence, it is regarded as a grammatical SCFG derivation of that sentence. For example, rules h_5, h_1 and h_3 in the figure produce a derivation of a sentence pair.
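To illustrate the rule form LHS → <α, β, ~> and what it means for a sequence of rules to derive a sentence pair, here is a minimal sketch. The `Rule` class, the `expand` helper and the three toy rules are hypothetical; they are not the rules h_1, h_3 and h_5 of Fig. 2, which is not reproduced here. The correspondence ~ is encoded by giving linked nonterminals the same index on both sides.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Rule:
    lhs: str
    src: tuple  # alpha: terminals are str, nonterminals are (label, index)
    tgt: tuple  # beta: linked nonterminals share the index used in alpha


def expand(rule, children=None):
    """Rewrite both sides of `rule` in lockstep.

    children[k] is the (source words, target words) yield substituted for
    the nonterminal with index k on both sides.
    """
    children = children or {}

    def side(tokens, which):
        out = []
        for tok in tokens:
            if isinstance(tok, tuple):
                out.extend(children[tok[1]][which])
            else:
                out.append(tok)
        return out

    return side(rule.src, 0), side(rule.tgt, 1)


# Hypothetical toy rules; every nonterminal is labeled only X.
leaf_b = Rule("X", ("b",), ("B",))
leaf_c = Rule("X", ("c",), ("C",))
top = Rule("X", ("a", ("X", 1), ("X", 2)), ("A", ("X", 2), ("X", 1)))

# A derivation: the top rule with both nonterminals rewritten by leaf rules.
src_yield, tgt_yield = expand(top, {1: expand(leaf_b), 2: expand(leaf_c)})
print(src_yield, tgt_yield)
```

The three rules together cover the whole source side a b c and, because the indices are swapped on the target side of the top rule, produce the reordered target side A C B.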
2) Syntactic translation rule extraction:
The form of a non-syntactic translation rule is essentially the same as that of a regular syntactic translation rule; the only difference is that a non-syntactic translation rule is not generated under (source-side or target-side) syntactic constraints. If the syntactic information of either side is used as a constraint, rules that conform to that syntactic information, namely syntactic translation rules, are obtained. Syntactic translation rules are obtained as follows.
GHKM rule extraction:
To generate syntactic rules, the present invention uses the mainstream approach: GHKM rules are extracted from word-aligned bilingual sentence pairs, with the source-side syntax tree information serving as constraint and guidance.
In GHKM extraction, the present invention models the mapping from a source-language syntax tree to a target-language string. A GHKM rule consists of a source-language fragment s_r, a target-language fragment t_r, and the correspondence between the nonterminals in the two fragments. For example, the following is a GHKM rule:
VP(VV(improve) x_1:NN) → increase x_1
GHKM rule rewriting:
The present embodiment rewrites the above rule format into the SCFG rule format. Concretely, the labels of the frontier nonterminals are kept unchanged and the internal tree structure is discarded, for example:
VP → <improve NN_1, increase NN_1>
Here VP denotes a verb phrase, VV a verb part-of-speech tag, and NN a noun part-of-speech tag; x_1 is a nonterminal, and NN_1 is a variable whose label is NN.
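The rewriting step described above (keep the root label and the frontier nonterminals, drop the internal tree structure) can be sketched as follows. This is an illustrative sketch, not the patent's implementation: the classes `Node` and `Var`, the function `ghkm_to_scfg`, and the convention of writing target-side variables as bare integers are all assumptions made for the example.

```python
from dataclasses import dataclass
from typing import List, Union


@dataclass(frozen=True)
class Var:
    """Frontier nonterminal x_i:label of a tree fragment."""
    label: str
    index: int


@dataclass(frozen=True)
class Node:
    """Internal node of a source-side tree fragment."""
    label: str
    children: tuple  # each child is a Node, a Var, or a terminal word (str)


def frontier(node: Node) -> List[Union[str, Var]]:
    """Left-to-right leaves (terminal words and frontier variables)."""
    out = []
    for child in node.children:
        if isinstance(child, Node):
            out.extend(frontier(child))
        else:
            out.append(child)
    return out


def ghkm_to_scfg(fragment: Node, target: List[Union[str, int]]):
    """Flatten a tree-to-string rule into (LHS, source side, target side)."""
    src = tuple(frontier(fragment))
    labels = {s.index: s.label for s in src if isinstance(s, Var)}
    tgt_indices = [t for t in target if isinstance(t, int)]
    if sorted(tgt_indices) != sorted(labels):
        raise ValueError("variables on the two sides do not match")
    # Target-side variables inherit the source-side syntactic label.
    tgt = tuple(Var(labels[t], t) if isinstance(t, int) else t for t in target)
    return fragment.label, src, tgt


# VP(VV(improve) x_1:NN) -> increase x_1
fragment = Node("VP", (Node("VV", ("improve",)), Var("NN", 1)))
flat = ghkm_to_scfg(fragment, ["increase", 1])
print(flat)
```

The internal VV node disappears in the flattened rule, while the root label VP and the variable NN_1 survive, which matches the rewritten example in the text.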
In this way the invention converts GHKM rules into SCFG form. Because all nonterminals are labeled with source-side syntactic labels, the resulting syntactic translation rules are always applied under the correct syntactic constraints.
Fig. 2 shows the process of extracting syntactic translation rules from a pair consisting of a source-language tree and a target-language string. The present embodiment ignores the multi-level tree structure of the original GHKM rules but retains the frontier nodes of the rules, so this rewriting gives the system reasonable generalization ability when translating new sentences.
In addition, decoding with syntactic translation rules can be regarded as an SCFG parsing process. One popular method is string parsing (string-based decoding), which decodes the input sentence with a chart decoder (for example, a CYK decoder). When source-side parse information is available for the test set, tree parsing (tree-based decoding) can be used instead to decode the parse tree. In that case, because every derivation must follow the syntax tree of the input, the source-side syntactic information acts as a hard constraint, which increases accuracy.
3) Acquisition of non-fully syntactic translation rules (defined by the present invention):
Non-syntactic and syntactic translation systems each have their own strengths and weaknesses. For example, non-syntactic translation models are good at lexical selection and reordering driven by lexicalized rules, but are severely limited in handling complex constituent movement. Syntactic translation models can describe the movement of hierarchical constituents by means of linguistic syntactic annotation, and they perform well on high-level syntactic reordering. Both kinds of models suffer from sparsity and limited coverage.
Ideally, the advantages of each model would be applied where they are most effective: 1) the syntactic translation model handles the generation of the high-level syntax skeleton and the reordering between syntactic constituents; 2) non-syntactic translation rules handle low-level lexical translation and reordering. To this end, the present invention proposes a method that combines both advantages in one model. The grammars of non-syntactic and syntactic translation are reused in translation, and a new type of rule, the non-fully syntactic translation rule, is developed to bridge syntactic translation rules and non-syntactic translation rules.
A rule is a non-fully syntactic translation rule if its left-hand side (LHS) is a source-side syntactic label and its right-hand side (RHS) contains at least one nonterminal labeled X. The following are non-fully syntactic translation rules:
VP → <improve X_1, increase X_1>
NT → <improve X_2, increase X_2>
Here the LHS denotes a verb phrase (VP), while the RHS, like that of a standard non-syntactic translation rule, contains the nonterminal X. Such a rule can be applied on top of a partial non-syntactic derivation to produce a derivation rooted at VP. Syntactic translation rules can then substitute this VP derivation as usual in a syntactic machine translation system, which realizes the transition from the syntactic translation system to the non-syntactic translation system.
Two. Skeleton model generation
Because non-fully syntactic translation rules can connect non-syntactic and syntactic translation rules, all rules of both kinds can be used to build non-fully syntactic translation derivations, forming a grammar that can generate syntax skeletons; this is the basis of the skeleton model. Fig. 3 gives a derivation built from non-syntactic, syntactic and non-fully syntactic translation rules. In this derivation, the non-syntactic translation rules (h_3, h_6 and h_8) are applied to low-level translation. Syntactic rules (the non-fully syntactic translation rule p_3 and the syntactic translation rules r_1 and r_4) are then applied on top of the partial derivations of X, building a derivation that matches the syntax skeleton of the sentence.
In the present invention, this syntactic structure is created by the source-side syntactic rules (upper right corner of Fig. 3) and is called the syntax skeleton. It is essentially a tree fragment with high-level syntax whose leaf nodes are terminals or nonterminals. With this skeleton structure, the reordering between the constituents in "to NP VP" is captured easily, while the low-level translation ("answer" and "satisfied") is left to non-syntactic translation rules.
To obtain non-fully syntactic translation rules, a simple and direct method is used. For each syntactic translation rule, one or two nonterminals on the right-hand side (RHS) are generalized to X while the left-hand side (LHS) is kept unchanged, which turns the rule into a non-fully syntactic translation rule. For example, from rule r_5 (VP → <to NP_1 VP_2, VP_2 with NP_1>) in Fig. 4, which illustrates tree-based decoding with syntactic translation rules, three non-fully syntactic translation rules can be obtained:
VP → <to X_1 X_2, X_2 with X_1>
VP → <to X_1 VP_2, VP_2 with X_1>
VP → <to NP_1 X_2, X_2 with NP_1>
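The generalization step for r_5 can be reproduced with a short sketch. The function names `relabel` and `generalize`, the tuple encoding of rules, and the `max_x` parameter are assumptions introduced for illustration; the patent only states that one or two RHS nonterminals are replaced by X while the LHS is kept.

```python
from itertools import combinations


def relabel(side, chosen):
    """Replace the label of every nonterminal whose index is in `chosen` by X."""
    return tuple(("X", s[1]) if isinstance(s, tuple) and s[1] in chosen else s
                 for s in side)


def generalize(lhs, src, tgt, max_x=2):
    """Enumerate rules obtained by turning 1..max_x nonterminals into X."""
    indices = sorted(s[1] for s in src if isinstance(s, tuple))
    rules = []
    for k in range(1, min(max_x, len(indices)) + 1):
        for chosen in combinations(indices, k):
            rules.append((lhs, relabel(src, chosen), relabel(tgt, chosen)))
    return rules


# r_5: VP -> <to NP_1 VP_2, VP_2 with NP_1>
r5 = ("VP",
      ("to", ("NP", 1), ("VP", 2)),
      (("VP", 2), "with", ("NP", 1)))
partial_rules = generalize(*r5)
for rule in partial_rules:
    print(rule)
```

For a rule with two nonterminals this yields exactly three variants (either nonterminal alone, or both), which agrees with the three rules listed for r_5.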
Once all the rules, including non-syntactic, syntactic and non-fully syntactic translation rules, are ready, they are used to build a larger SCFG, which is applied in the decoder. Rule weights are computed with a log-linear model. As in standard SCFG-based models, a rule LHS → <α, β, ~> has the following features:
1. The translation probabilities P(α|β) and P(β|α), estimated by relative frequency; these are the translation probabilities in the two directions (forward and reverse).
2. The lexical weights P_lex(α|β) and P_lex(β|α), estimated by a heuristic method.
3. Rule bonuses (exp(1)), set separately for non-syntactic, syntactic and non-fully syntactic translation rules.
4. Indicators for glue rules, lexicalized rules and non-lexicalized rules, which allow the model to learn a preference for specific kinds of rules.
5. The number of nonterminals X on the source side of a non-fully syntactic translation rule (exp(#)), which controls the degree to which the model tolerates violations of syntax.
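How the features listed above could combine into a single rule weight under a log-linear model is sketched below. The feature names, the probability values and the λ values are invented for illustration; only the general form w(r) = exp(Σ_k λ_k · h_k(r)), with h_k the log-domain feature values, is assumed here.

```python
import math


def rule_weight(features, lambdas):
    """Log-linear rule weight: exp(sum_k lambda_k * h_k(r)).

    `features` maps a feature name to its log-domain value h_k(r);
    `lambdas` maps the same names to the tuned feature weights.
    """
    return math.exp(sum(lambdas[name] * value
                        for name, value in features.items()))


# Hypothetical non-fully syntactic rule with one X on the source side.
features = {
    "log_p_src_given_tgt": math.log(0.5),    # P(alpha | beta)
    "log_p_tgt_given_src": math.log(0.25),   # P(beta | alpha)
    "bonus_non_fully_syntactic": 1.0,        # log of exp(1)
    "num_x": 1.0,                            # log of exp(#X)
}
lambdas = {name: 1.0 for name in features}   # untuned, illustrative weights
w = rule_weight(features, lambdas)
print(w)
```

Because the rule bonus and the X count enter as exp(1) and exp(#), their log-domain values are simply 1 and the count, so the tuned λ for `num_x` directly scales how strongly each generalized nonterminal is rewarded or penalized.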
The present invention defines the derivation weight (score) in the model. Let d be a derivation of the above grammar. To distinguish syntactic rules (namely syntactic and non-fully syntactic translation rules) from non-syntactic translation rules, d is defined as a tuple <d_s, d_h>, where d_s is the partial derivation forming the skeleton structure and d_h is the set of rules used to build the remainder of d. For example, in Fig. 2, d_s = {r_4, r_1, p_3} and d_h = {h_6, h_8, h_3}.
Let t be the target-language string encoded by d. The score of d is then defined as the product of the rule weights, the n-gram language model lm(t) and the word bonus exp(|t|):
$$s(d) = \prod_{r_i \in d_s} w(r_i) \times \prod_{r_j \in d_h} w(r_j) \times lm(t)^{\lambda_{lm}} \times \exp(\lambda_{wb} \cdot |t|)$$
where w(r_*) is the weight of rule r_*, and λ_lm and λ_wb are the feature weights of the language model and the word bonus, respectively.
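The score s(d) is a product of many small numbers, so an implementation would normally work in the log domain. The sketch below assumes that rule weights and the language model probability are already available as plain floats; the function name `derivation_log_score` and all numeric values are hypothetical.

```python
import math


def derivation_log_score(skeleton_weights, remainder_weights,
                         lm_prob, target_len, lambda_lm, lambda_wb):
    """log s(d) for d = <d_s, d_h>.

    skeleton_weights:  w(r_i) for the rules r_i in d_s
    remainder_weights: w(r_j) for the rules r_j in d_h
    lm_prob:           lm(t), the n-gram language model probability of t
    target_len:        |t|, the number of target words
    """
    log_s = sum(math.log(w) for w in skeleton_weights)
    log_s += sum(math.log(w) for w in remainder_weights)
    log_s += lambda_lm * math.log(lm_prob)
    log_s += lambda_wb * target_len
    return log_s


log_s = derivation_log_score([0.5, 0.25], [0.1], lm_prob=0.01,
                             target_len=4, lambda_lm=0.5, lambda_wb=0.2)
print(log_s, math.exp(log_s))
```

With an empty `skeleton_weights` list the first sum vanishes and the score reduces to that of a purely non-syntactic derivation; an empty `remainder_weights` list gives the purely syntactic case.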
In addition, the framework of the model is very flexible: it includes the syntactic and non-syntactic translation models as special cases. For example, if d_s = ∅ (that is, d consists only of non-syntactic translation rules), d is a non-syntactic translation derivation. Likewise, if d_h = ∅ (that is, d consists only of syntactic translation rules), d is a fully syntactic translation derivation. What the present invention provides is a way to use non-fully syntactic translation rules to bridge the derivation spaces of non-syntactic and syntactic translation. The decoder selects the best derivation from the enlarged derivation space according to the model score.
Three. Applying the model in decoding:
Decoding with the model of the present invention can be regarded as a string-parsing problem: the source side of the rules is used to parse the source-language string, and the target side of the rules in the derivation is used to generate the target-language translation. The translation result is therefore the target-language string with the highest score among those produced by rule derivations. In this invention the system is implemented on a CYK decoder that uses beam search and cube pruning, and it can use binarized rules obtained by synchronous binarization.
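The chart-filling core of CYK string parsing can be shown on the source side alone. The sketch below is a plain CYK recognizer over a toy binarized grammar; it deliberately omits the target side, model scores, beam search and cube pruning that the decoder described above uses, and the grammar, labels and words are invented for the example.

```python
from collections import defaultdict


def cyk_chart(words, lexical, binary):
    """Fill a CYK chart for a binarized grammar.

    lexical: word -> set of labels A for rules A -> word
    binary:  (B, C) -> set of labels A for rules A -> B C
    Returns a dict mapping a span (i, j) to the labels covering words[i:j].
    """
    n = len(words)
    chart = defaultdict(set)
    for i, word in enumerate(words):
        chart[i, i + 1] |= lexical.get(word, set())
    for width in range(2, n + 1):
        for i in range(0, n - width + 1):
            j = i + width
            for k in range(i + 1, j):          # split point
                for b in chart[i, k]:
                    for c in chart[k, j]:
                        chart[i, j] |= binary.get((b, c), set())
    return chart


# Toy grammar: PP -> P NP, VP -> PP VP (labels are illustrative only).
lexical = {"to": {"P"}, "answer": {"NP"}, "satisfied": {"VP"}}
binary = {("P", "NP"): {"PP"}, ("PP", "VP"): {"VP"}}
chart = cyk_chart(["to", "answer", "satisfied"], lexical, binary)
print(sorted(chart[0, 3]))
```

In a full decoder each chart cell would hold scored translation hypotheses rather than bare labels, and pruning would limit how many hypotheses survive per cell.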
Introducing a large number of non-fully syntactic translation rules makes decoding very slow. To speed up the decoder, several pruning methods are used to further prune and reduce the search space. First, lexicalized or non-fully syntactic translation rules whose scope is greater than 3 are discarded. These rules are removed because they are the main cause of slow decoding and contribute little to the final translation result. In addition, non-lexicalized and non-fully syntactic translation rules whose right-hand side (RHS) contains only nonterminals X are discarded. In most cases such rules provide no syntactic constraint or guidance. For example, the rule VP → <X_1 X_2, X_2 X_1> is too general: introducing a VP constituent over two consecutive blocks without any lexical or syntactic evidence is unreasonable and serves no purpose.
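The two rule filters described above can be expressed as a single predicate. This is a sketch under assumptions: the measure bounded by 3 is passed in as a precomputed number (`scope`) rather than computed here, and the function name `keep_rule` and the tuple encoding of the source side are hypothetical.

```python
def keep_rule(src, is_non_fully_syntactic, scope, max_scope=3):
    """Return False for rules that the two pruning filters would discard.

    src:   source-side RHS, terminals as str, nonterminals as (label, index)
    scope: precomputed value of the measure that is bounded by `max_scope`
    """
    has_terminal = any(isinstance(s, str) for s in src)
    nonterminals = [s for s in src if isinstance(s, tuple)]

    # Filter 1: lexicalized or non-fully syntactic rules beyond the bound.
    if (has_terminal or is_non_fully_syntactic) and scope > max_scope:
        return False

    # Filter 2: unlexicalized rules whose RHS consists only of X nonterminals.
    if not has_terminal and nonterminals and all(
            label == "X" for label, _ in nonterminals):
        return False
    return True


too_general = (("X", 1), ("X", 2))           # VP -> <X_1 X_2, X_2 X_1>
anchored = ("to", ("X", 1), ("VP", 2))       # VP -> <to X_1 VP_2, ...>
print(keep_rule(too_general, True, scope=1))
print(keep_rule(anchored, True, scope=2))
```

The second filter removes exactly the kind of rule given as the example in the text, while a rule anchored by a terminal or by a syntactic nonterminal is kept as long as it stays within the bound.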
Besides pruning rules, a parameter w_s can be used to control the depth of the syntax skeleton. If w_s is set to a very small value, the system is forced to use a smaller syntax skeleton (with fewer syntactic rules). In the extreme case w_s = 0, the system falls back to a typical non-syntactic translation system; similarly, if w_s = +∞, the system can consider syntax skeletons of any depth. The parameter w_s can therefore be tuned on the test set to find a balance point.
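One way to read the depth constraint is as a limit on how many syntactic rules may be stacked from the root of a derivation. The sketch below adopts that reading as an assumption; the class `DNode`, the functions `skeleton_depth` and `within_limit`, and the toy derivation are all hypothetical.

```python
import math
from dataclasses import dataclass, field
from typing import List


@dataclass
class DNode:
    """One rule application in a derivation tree."""
    rule: str
    syntactic: bool                 # syntactic or non-fully syntactic rule
    children: List["DNode"] = field(default_factory=list)


def skeleton_depth(node: DNode) -> int:
    """Number of stacked syntactic rules, counted from this node downward."""
    if not node.syntactic:
        return 0                    # non-syntactic rules end the skeleton
    return 1 + max((skeleton_depth(c) for c in node.children), default=0)


def within_limit(root: DNode, w_s: float) -> bool:
    return skeleton_depth(root) <= w_s


# Toy derivation: a syntactic rule over a non-fully syntactic rule,
# with non-syntactic rules underneath.
h_a = DNode("h_a", False)
h_b = DNode("h_b", False)
p = DNode("p", True, [h_a])
root = DNode("r", True, [p, h_b])
print(skeleton_depth(root), within_limit(root, 0), within_limit(root, math.inf))
```

Under this reading, w_s = 0 admits only derivations whose root is a non-syntactic rule, and w_s = +∞ admits every derivation, matching the two extreme cases described in the text.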
To speed up the system, some tree-parsing techniques are also applied. In addition to the source sentence, the source-side syntax parse tree is also fed to the decoder. First, the source sentence is parsed with the non-syntactic translation rules commonly used in non-syntactic translation systems, except that the span limit on rule application is lifted for source-language fragments that correspond to constituents of the syntax tree. Then the syntactic translation rules are applied to the parse tree. If a source-side syntactic translation rule matches a fragment of the input tree, then: 1) the rule is converted into non-fully syntactic translation rules (see part 3) above); 2) the syntactic translation rule and the corresponding non-fully syntactic translation rules are added to rule lists, which are linked to the CYK chart cells corresponding to the matched source-side syntax tree fragment. Fig. 4 gives an example of tree matching in the decoder. After that, the remaining decoding steps (such as building the translation hypergraph and intersecting it with the language model) proceed as usual. This method efficiently matches the (non-fully) syntactic translation rules needed for decoding and does not require the rules to be binarized. The source-side syntax tree imposes a hard constraint; as a trade-off, this introduces some syntax-sensitive derivations.
Four. Experiments
The present invention was tested on English-Chinese (en-zh) and Chinese-English (zh-en) translation.
1) Baseline system experimental setup
The present invention uses 2.74 million Chinese-English bilingual sentence pairs selected from NIST12 OpenMT. After bidirectional word alignments are produced on the bilingual text with the GIZA++ tool, the grow-diag-final-and method is used to obtain the symmetrized word alignment. For syntactic analysis, the Berkeley parser is first run on both sides of the data, and the parse trees are then binarized with the popular leftmost derivation method so as to generalize better on the test set. Syntax-based (syntactic translation) rules are extracted from the whole training set, with at most five nonterminals per rule. For the non-syntactic translation system, hierarchical (non-syntactic translation) rules are extracted from a subset of 940,000 sentence pairs, with no more than two nonterminals per rule, while phrase rules are extracted from the whole training set. All rules are obtained with the open-source toolkit NiuTrans.
Two 5-gram language models were trained: one on the Xinhua portion of the English Gigaword data and the English side of the bilingual data, used in the Chinese-English translation system; the other on the Xinhua portion of the Chinese Gigaword data and the Chinese side of the bilingual data, used in the English-Chinese translation system. All language models are smoothed with modified Kneser-Ney smoothing.
For the Chinese-English translation system, the system is evaluated on newswire and web data separately. The tuning sets (newswire: 1198 sentences, web: 1308 sentences) are drawn from the NIST MT 04-06 evaluation data and the GALE data. The test sets (newswire: 1779 sentences, web: 1768 sentences) comprise all the newswire and web evaluation data of the NIST 08 and 12 MT evaluations and 08-progress. For the English-Chinese translation system, the tuning set (995 sentences) and the test set (1859 sentences) are the evaluation data of SSMT07 and of the NIST MT08 Chinese-English translation track, respectively. All source-side parse trees are produced in the same way as for the training data.
2) Experiments with the machine translation system based on the syntax skeleton
The CYK decoder is implemented according to the method described in the part on applying the model in decoding. By default, string parsing is used in the experiments and the parameter w_s is initially set to +∞. All feature weights are tuned with MERT. Because MERT may converge to a local optimum, each experiment is run 5 times, each time with different initial feature weights. For evaluation, unmodified BLEU-4 and unmodified BLEU-5 are used for the Chinese-English and English-Chinese translation systems, respectively.
3) Experimental results of the machine translation system based on the syntax skeleton
Table 1 shows the experimental results, where the system based on the syntax skeleton is abbreviated as SYNSKEL. First, the SYNSKEL system achieves clear improvements on all 3 test sets. Using CTB-style parse trees yields an average improvement of more than 0.6 BLEU, and using binarized syntax trees yields an average improvement of more than 0.9 BLEU. The tree-parsing method can apply (non-fully) syntactic rules well alongside the normal use of non-syntactic translation rules, and it achieves BLEU scores comparable to those of string parsing. However, adding more trees in the form of a binarized forest does not improve the results. These results suggest that, in a very large derivation space, it is difficult to introduce new derivations by considering more alternative binarized syntactic structures.
Table 1: Experimental results of different systems
In addition, the system was studied under different maximum skeleton depths (that is, the parameter w_s). Fig. 5 shows that a larger skeleton does not necessarily give better results, where BLEU is the metric for evaluating translation quality. With w_s ≤ 5, the skeleton-based system achieves a satisfactory improvement while reducing decoding time by nearly 27% compared with using full skeletons.
The usage rates of the different rule types are shown in Table 2. The non-fully syntactic translation rules defined by the present invention have the highest usage rate and achieve good translation results.
Table 2: Usage rates of different derivation types on the tuning sets
4) Analysis of results
After the experiments, the frequency with which the system chose different types of derivations was studied. Table 2 shows, for the three tasks, the system's preference between non-fully syntactic translation derivations and non-syntactic translation derivations. The English-Chinese translation task shows the heaviest use of syntactic and non-fully syntactic translation derivations, followed by the newswire and web translation tasks of Chinese-English translation. To some extent this reflects the differences in parsing quality across languages and domains.
1. Improved translation quality:
The experimental results show that the system of the present invention based on the syntax skeleton (abbreviated as SYNSKEL) achieves clear improvements on all 3 test sets. Using CTB-style syntax trees yields an average improvement of more than 0.6 BLEU, and using binarized syntax trees yields an average improvement of more than 0.9 BLEU. The tree-parsing method can apply syntactic and non-fully syntactic translation rules well alongside the normal use of non-syntactic translation rules, and it achieves BLEU scores comparable to those of string parsing.
2. Good reordering control:
According to the comparative experimental data provided in the present invention, Fig. 6 shows the 5 top-ranked translation results on the tuning set, which are used to compare the different translation results on the same tuning set. The output of the non-syntactic translation system is disordered and its reordering is wrong, and the syntactic translation system, which has difficulty with the original parse, also translates the "right_3 ... difficulty_20" structure poorly. By contrast, the SYNSKEL system applies a rule spanning the source words "also_2 ... worry_22", and because the "right_3 ... difficulty_20" structure is covered by a local non-syntactic translation derivation, its translation is also relatively good.
3. Better recognition of syntactic structure:
The bottom of Fig. 6 shows a real translation example produced by a derivation using this rule. In the SYNSKEL system, this rule covers the source words "right_3 ... annual pay_20" and successfully identifies the reordering of the "... ..." (de) structure. Note that the non-syntactic translation system also has a rule X → <X_1 X_2, X_2 of X_1> that can translate the "... ..." (de) structure. However, when the span becomes large, for example when translating a longer word sequence such as "should_8 ... dollar_15", the non-syntactic translation system loses this reordering ability.

Claims (5)

1. A statistical machine translation system based on a syntax skeleton, characterized by comprising the following steps:
1) A probabilistic SCFG hierarchical rule extraction algorithm extracts non-syntactic translation rules, used for translating the non-skeleton part of the sentence to be translated:
Using the heuristically constrained method for extracting hierarchical rules, probabilistic SCFG rules are extracted from parallel sentence pairs that have been word-aligned but not syntactically parsed; hierarchical phrase rules, i.e. non-syntactic translation rules, are used to translate the low-level structure of the sentence to be translated;
2) The GHKM rule method extracts syntactic translation rules, used for translating the skeleton part of the sentence to be translated:
Using the GHKM rule extraction algorithm, GHKM rules are extracted from word-aligned parallel sentence pairs together with the source-side syntactic parse, and the extracted GHKM rules are rewritten into syntactic translation rules; syntactic translation rules are used to generate and translate the high-level skeleton structure;
3) Non-fully syntactic translation rule generation:
Non-fully syntactic translation rules are generated from the syntactic translation rules and combine non-syntactic and syntactic translation rules, integrating the advantages of the non-syntactic and syntactic translation systems;
4) Model generation:
Based on the above non-fully syntactic translation rules and according to the different translation tasks, the grammars (i.e. translation rule sets) of the syntactic and non-syntactic translation systems are integrated to generate non-fully syntactic translation derivations; non-syntactic translation rules handle the translation of low-level words or phrases of the sentence to be translated, and syntactic translation rules complete the translation of its high-level syntax skeleton structure; non-fully syntactic translation rules guide the skeleton generation process and the translation process; the non-syntactic, syntactic and non-fully syntactic translation rules are collected to generate an SCFG with large coverage, and the combination of the different grammar forms is accomplished through the non-fully syntactic translation rules.
2. The statistical machine translation system based on a syntax skeleton according to claim 1, characterized in that rewriting the extracted GHKM rules into syntactic translation rules proceeds as follows. An extracted GHKM rule has the form:
source-language phrase syntactic label <source-side syntax tree fragment rooted at that syntactic label> → target-language string
Here the "source-language phrase syntactic label" on the left side of the rule is a phrase-structure type label defined by linguistic syntactic knowledge, i.e. a syntactic nonterminal; the "syntax tree fragment" on the left side of the rule is a fragment of the sentence's parse tree and has a tree structure whose leaf nodes may be terminal words or nonterminals, and these nonterminals must belong to some class of syntactic label in the source-side syntactic analysis; the "target-language string" on the right side of the rule is a string of target-language terminal words and nonterminals, whose nonterminals correspond one-to-one to the nonterminal leaf nodes of the source-side syntax tree fragment.
The above GHKM rule can be rewritten as a syntactic translation rule by keeping the nonterminals on the frontier of the syntax subtree fragment and discarding the internal tree structure:
source-language phrase syntactic label → <source-language string, target-language string>
Here the "source-language string" is a sequence of source-language terminal words and nonterminals with their corresponding syntactic labels; it is the leaf-node sequence of the source-side syntax tree fragment in the GHKM rule corresponding to the syntactic translation rule. The "target-language string" is a string of target-language terminal words and nonterminals with their corresponding syntactic labels, whose nonterminals correspond one-to-one to the nonterminal leaf nodes of the source-side syntax tree fragment.
3. The statistical machine translation system based on a syntax skeleton according to claim 1, characterized in that non-fully syntactic translation rules are generated from the non-syntactic and syntactic translation rules and have the form:
source-language phrase syntactic label → <source-language string*, target-language string*>
Here the "source-language phrase syntactic label" on the left side is a nonterminal; "source-language string*" is a string of source-language terminal words, nonterminals and the generalized label X; "target-language string*" is a string of target-language terminal words, nonterminals and the generalized label X, whose nonterminals correspond one-to-one to the nonterminal leaf nodes of the source-side syntax tree fragment;
A non-fully syntactic translation rule differs from a syntactic translation rule in that it does not require every nonterminal in the rule to belong to some class of phrase syntactic label in the source-side syntactic analysis: some of its nonterminals are generalized to X, indicating that they do not belong to any syntactic category.
4. The statistical machine translation system based on a syntax skeleton according to claim 1, characterized in that the advantages of the non-syntactic and syntactic translation systems are combined as follows:
During decoding, the syntax skeleton is created by the large-coverage SCFG generated from the source-side syntactic translation rules, the non-syntactic translation rules and the non-fully syntactic translation rules;
While the syntax skeleton structure is generated, the reordering between constituents in the source-language syntactic structure is captured, and the high-level translation tasks of the sentence to be translated are assigned to the syntactic translation system, while its low-level translation tasks are assigned to the non-syntactic translation system, so that each translation system contributes to the tasks it is best at.
5. The statistical machine translation system based on a syntax skeleton according to claim 1, characterized in that the grammars of the non-syntactic and syntactic translation systems are integrated according to the different translation tasks as follows: in the SCFG system, a weight is computed for each translation rule derivation so that the various translation rules can be used to derive more accurately, and the score of each translation rule derivation d is computed with the following formula:
$$s(d) = \prod_{r_i \in d_s} w(r_i) \times \prod_{r_j \in d_h} w(r_j) \times lm(t)^{\lambda_{lm}} \times \exp(\lambda_{wb} \cdot |t|)$$
where s(d) is the score of the translation rule derivation d and t is the target-language string; the score of d is defined as the product of several factors:
Factor 1: the product of the weights of all rules contained in the syntax skeleton (d_s) of d, $\prod_{r_i \in d_s} w(r_i)$, where r_i is the i-th rule in d_s and w(r_*) is the weight of rule r_*;
Factor 2: the product of the weights of all rules contained in the non-skeleton part (d_h) of d, $\prod_{r_j \in d_h} w(r_j)$, where r_j is the j-th rule in d_h and w(r_*) is the weight of rule r_*;
Factor 3: the exponentially weighted score of the n-gram language model, $lm(t)^{\lambda_{lm}}$, where λ_lm is the weight of the n-gram language model;
Factor 4: the word bonus exp(λ_wb · |t|), where exp(|t|) is the exponential of the translation length, so that a longer translation receives a larger bonus, and λ_wb is the weight of the word bonus.
CN201610053560.2A 2016-01-26 2016-01-26 Statictic machine translation system based on syntax skeleton Active CN105573994B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201610053560.2A CN105573994B (en) 2016-01-26 2016-01-26 Statictic machine translation system based on syntax skeleton

Publications (2)

Publication Number Publication Date
CN105573994A true CN105573994A (en) 2016-05-11
CN105573994B CN105573994B (en) 2019-03-22

Family

ID=55884144

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201610053560.2A Active CN105573994B (en) 2016-01-26 2016-01-26 Statictic machine translation system based on syntax skeleton

Country Status (1)

Country Link
CN (1) CN105573994B (en)

Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20130158975A1 (en) * 2010-08-23 2013-06-20 Sk Planet Co., Ltd. Statistical machine translation method using dependency forest
CN103942192A (en) * 2013-11-21 2014-07-23 北京理工大学 Bilingual largest noun group separating-fusing translation method
CN104268133A (en) * 2014-09-11 2015-01-07 北京交通大学 Machine translation method and system

Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
TONG XIAO et al.: "Effective incorporation of source syntax into hierarchical phrase-based translation", 《PROCEEDINGS OF THE 25TH INTERNATIONAL CONFERENCE ON COMPUTATIONAL LINGUISTICS: TECHNICAL PAPER》 *
LI YEGANG et al.: "Tree-to-string statistical machine translation model incorporating bilingual maximal noun chunks", 《JOURNAL OF SHANDONG UNIVERSITY OF TECHNOLOGY (NATURAL SCIENCE EDITION)》 *

Cited By (11)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110506279A (en) * 2017-04-14 2019-11-26 易享信息技术有限公司 Using the neural machine translation of hidden tree attention
CN110506279B (en) * 2017-04-14 2024-04-05 硕动力公司 Neural machine translation with hidden tree attention
CN107273363A (en) * 2017-05-12 2017-10-20 清华大学 A kind of language text interpretation method and system
CN107273363B (en) * 2017-05-12 2019-11-22 清华大学 A kind of language text interpretation method and system
CN107729326A (en) * 2017-09-25 2018-02-23 沈阳航空航天大学 Neural machine translation method based on Multi BiRNN codings
CN107729326B (en) * 2017-09-25 2020-12-25 沈阳航空航天大学 Multi-BiRNN coding-based neural machine translation method
CN110489529A (en) * 2019-08-26 2019-11-22 哈尔滨工业大学(深圳) Dialogue generation method based on syntactic structure and reordering
CN110489529B (en) * 2019-08-26 2021-12-14 哈尔滨工业大学(深圳) Dialogue generating method based on syntactic structure and reordering
CN116108830A (en) * 2023-03-30 2023-05-12 山东大学 Syntax-controllable text rewriting method and device
CN117933372A (en) * 2024-03-22 2024-04-26 山东大学 Data enhancement-oriented vocabulary combined knowledge modeling method and device
CN117933372B (en) * 2024-03-22 2024-06-07 山东大学 Data enhancement-oriented vocabulary combined knowledge modeling method and device

Also Published As

Publication number Publication date
CN105573994B (en) 2019-03-22

Legal Events

Date Code Title Description
C06 Publication
PB01 Publication
C10 Entry into substantive examination
SE01 Entry into force of request for substantive examination
GR01 Patent grant
TR01 Transfer of patent right

Effective date of registration: 20220215

Address after: 110004 1001 - (1103), block C, No. 78, Sanhao Street, Heping District, Shenyang City, Liaoning Province

Patentee after: Calf Yazhi (Shenyang) Technology Co.,Ltd.

Address before: Room 1517, No. 55, Sanhao Street, Heping District, Shenyang, Liaoning 110003

Patentee before: SHENYANG YAYI NETWORK TECHNOLOGY CO.,LTD.

TR01 Transfer of patent right

Effective date of registration: 20220714

Address after: 110004 11 / F, block C, Neusoft computer city, 78 Sanhao Street, Heping District, Shenyang City, Liaoning Province

Patentee after: SHENYANG YAYI NETWORK TECHNOLOGY CO.,LTD.

Address before: 110004 1001 - (1103), block C, No. 78, Sanhao Street, Heping District, Shenyang City, Liaoning Province

Patentee before: Calf Yazhi (Shenyang) Technology Co.,Ltd.