WO2015083762A1 - Learning device, translation device, learning method, and translation method - Google Patents
Learning device, translation device, learning method, and translation method
- Publication number
- WO2015083762A1 (PCT/JP2014/082058)
- Authority
- WO
- WIPO (PCT)
- Prior art keywords
- source language
- language
- unit
- elements
- label
- Prior art date
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F40/00—Handling natural language data
- G06F40/40—Processing or translation of natural language
- G06F40/42—Data-driven translation
- G06F40/47—Machine-assisted translation, e.g. using translation memory
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F40/00—Handling natural language data
- G06F40/20—Natural language analysis
- G06F40/205—Parsing
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F40/00—Handling natural language data
- G06F40/20—Natural language analysis
- G06F40/205—Parsing
- G06F40/211—Syntactic parsing, e.g. based on context-free grammar [CFG] or unification grammars
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F40/00—Handling natural language data
- G06F40/40—Processing or translation of natural language
- G06F40/42—Data-driven translation
- G06F40/44—Statistical methods, e.g. probability models
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F40/00—Handling natural language data
- G06F40/40—Processing or translation of natural language
- G06F40/58—Use of machine translation, e.g. for multi-lingual retrieval, for server-side translation for client devices or for real-time translation
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N20/00—Machine learning
Definitions
- The present invention relates to a learning apparatus for learning a model used for machine translation.
- One such technique is pre-ordering, which rearranges the word order before selecting translation words.
- Many existing pre-ordering techniques that utilize a source language parser have been proposed (see, for example, Non-Patent Document 1).
- A method that does not use a parser at all has also been proposed (see, for example, Non-Patent Document 2).
- However, the techniques of Non-Patent Document 1 cannot be used when no source language parser is available. Further, when the performance of the source language parser is low, the reordering performance is also low.
- The method of Non-Patent Document 2 has a problem in reordering performance because no parser is used.
- The learning apparatus of the first aspect of the invention includes: a bilingual corpus that can store one or more bilingual sentences, each having a source language sentence and a target language sentence that is a translation result of the source language sentence; an element pair storage unit that can store one or more element pairs, each being a pair of a source language element and a target language element; and a syntax analysis unit that parses the target language sentence included in each of the one or more bilingual sentences and acquires a binary tree of the target language sentence.
- The learning apparatus further includes a source language element acquisition unit that acquires, from the one or more element pairs of the element pair storage unit, one or more source language elements corresponding to the elements of the target language, and a source language partial structure acquisition unit that applies the structure indicated by the one or more target language partial structures included in the binary tree of the target language sentence
- to the one or more source language elements constituting the source language sentence, and thereby acquires one or more source language partial structures, each indicating the order of the two or more elements constituting the source language sentence and having a parent node with a phrase label and two child nodes of that parent node.
- The learning apparatus further includes a label assigning unit that assigns, to the one or more source language partial structures, a rearrangement label, which is a label that can distinguish a source language partial structure whose two child nodes are ordered differently from those of the corresponding target language partial structure from a source language partial structure whose child-node order is the same, and acquires one or more labeled source language partial structures;
- a model construction unit that constructs, using the one or more labeled source language partial structures, one or more parsing models having probability information indicating the likelihood of appearance of a labeled source language partial structure;
- and a storage unit that stores the one or more constructed parsing models.
- The learning apparatus of the second aspect of the invention further includes, relative to the first aspect, a rearrangement unit that, using the one or more source language partial structures indicating the order of the two or more elements constituting the source language sentence,
- acquires one or more source language partial structures in which the order of the two or more elements constituting the source language sentence has been rearranged so that the order comes closer to satisfying a condition predetermined with respect to the order of the elements of the target language sentence.
- In this aspect, the label assigning unit assigns the rearrangement label to the one or more source language partial structures rearranged by the rearrangement unit.
- The learning apparatus of the third aspect of the invention further includes, relative to the first or second aspect, a statistical model storage unit that can store a statistical model of CFG rules, each including a parent node having a phrase label
- and two child nodes that are children of the parent node and have a phrase label or a source language POS tag.
- In this aspect, the source language partial structure acquisition unit includes source language partial structure acquisition means for applying the structure indicated by the one or more target language partial structures included in the binary tree of the target language sentence to the one or more source language elements constituting the source language sentence, and acquiring one or more source language partial structures, each indicating the order of the two or more elements constituting the source language sentence and having a parent node with a phrase label and two child nodes that are children of the parent node and have a phrase label or a source language element;
- and partial structure complementing means for, when an incomplete source language partial structure exists among the one or more source language partial structures acquired by the source language partial structure acquisition means, applying the statistical model to that source language partial structure and obtaining a complete source language partial structure.
- The translation device of the fourth aspect of the invention includes: a binary tree storage unit that stores the one or more parsing models accumulated by the learning apparatus of any of the first to third aspects; an element pair storage unit that can store two or more element pairs, each being a pair of a source language element and a target language element; a reception unit that receives a source language sentence; and a labeled source language partial structure acquisition unit that acquires, using the one or more parsing models, one or more labeled source language partial structures from the two or more elements of the source language sentence received by the reception unit.
- The translation device further includes a translation rearrangement unit that, when a rearrangement label of the one or more labeled source language partial structures indicates that the order of the two child nodes included in the target language partial structure and the order of the two child nodes included in the source language partial structure are different, rearranges the order of the two child nodes of the labeled source language partial structure corresponding to that rearrangement label; a search unit that acquires, from the element pair storage unit, the target language elements corresponding to the rearranged source language elements; and an output unit that outputs the target language sentence composed of the acquired elements.
- In the translation device of the fifth aspect of the invention, relative to the fourth aspect, the source language elements of the one or more labeled source language partial structures constituting the binary trees of the one or more source language sentences correspond to POS tags,
- and the translation device further includes a morphological analysis unit that analyzes the source language sentence received by the reception unit and acquires two or more elements associated with POS tags.
- The labeled source language partial structure acquisition unit of this translation device acquires the one or more labeled source language partial structures from the two or more elements associated with POS tags, using the binary trees of the one or more source language sentences.
- the learning apparatus can learn a model that enables highly accurate translation.
- FIG. 1 is a block diagram of the learning apparatus 1 in Embodiment 1 of the present invention.
- FIG. 2 is a flowchart for explaining the operation of the learning apparatus 1.
- FIG. 3 is a flowchart for explaining the operation of the learning apparatus 1.
- FIG. 4 is a diagram showing the binary tree acquired by the syntax analysis unit 14.
- FIG. 5 is a diagram showing the source language elements acquired by the source language element acquisition unit 15.
- FIG. 6 is a diagram showing the concept of the source language partial structures.
- FIG. 7 is a diagram showing a source language partial structure.
- FIG. 8 is a diagram showing a binary tree having one or more source language partial structures.
- FIG. 9 is a diagram showing a binary tree having one or more source language partial structures.
- FIG. 10 is a diagram showing a binary tree having one or more labeled source language partial structures.
- FIG. 11 is a block diagram of the translation device 2 in Embodiment 2 of the present invention.
- FIG. 12 is a flowchart for explaining the operation of the translation device 2.
- FIG. 13 is a diagram showing a binary tree having one or more labeled source language partial structures.
- FIG. 14 is a diagram showing the result of rearranging the words of a source language sentence into the word order of the target language.
- FIG. 15 is a diagram showing the results of a translation quality evaluation.
- FIG. 16 is an overview of the computer system in the above embodiments.
- FIG. 17 is a block diagram of the computer system.
- In Embodiment 1 of the present invention, a learning apparatus that learns a pre-ordering model is described. The present embodiment describes a learning apparatus that learns the pre-ordering model under a constraint that keeps word order changes within a certain range during translation. The present embodiment also describes a learning apparatus that uses a statistical model.
- FIG. 1 is a block diagram of the learning apparatus 1 in the present embodiment.
- The learning device 1 includes a recording medium 10, a bilingual corpus 11, an element pair storage unit 12, a statistical model storage unit 13, a syntax analysis unit 14, a source language element acquisition unit 15, a source language partial structure acquisition unit 16, a rearrangement unit 17, a label assigning unit 18, a model construction unit 19, and a storage unit 20.
- The source language partial structure acquisition unit 16 includes source language partial structure acquisition means 161 and partial structure complementing means 162.
- The recording medium 10 can store a pre-ordering model.
- the pre-ordering model is one or more syntax analysis models stored in the storage unit 20.
- the parsing model is a binary tree of source language sentences.
- a binary tree of source language sentences is a binary tree that can be composed of source language sentences. This binary tree has one or more labeled source language substructures.
- The parsing model usually has probability information indicating the likelihood of appearance of each labeled source language substructure.
- the labeled source language substructure includes a parent node and two child nodes.
- the parent node has a phrase label.
- the parent node may be, for example, the phrase label itself, or may have an ID for identifying the parent node and a phrase label.
- the parent node may have information for identifying its own child nodes.
- the parent node usually has a rearrangement label.
- The two child nodes are the nodes immediately below the parent node; the pair is also referred to as sibling nodes.
- Child nodes have phrase labels or source language elements.
- the child node may be a phrase label or a source language element itself, and may have an ID for identifying the child node and a phrase label or a source language element.
- the data structure of the parent node and the child node does not matter.
- A label may also be information indicating the part of speech, for example, a POS tag.
- The POS tag is information indicating the part of speech.
- the parent node and the child node may have a hidden class of a corresponding element (such as a word).
- A hidden class is a group identifier assigned when elements (such as words) are grouped. Examples of parsing model entries are “0.01 S_ST → NP_ST VP_SW” and “0.005 S_ST1 → NP_ST2 VP_SW4”, where the number to the right of a label indicates the hidden class.
- The phrase label is information that identifies the type of phrase, for example, “S” (indicating a sentence), “VP” (indicating a verb phrase), or “NP” (indicating a noun phrase).
- the rearrangement label is a label that can distinguish the first-type source language partial structure from the second-type source language partial structure.
- The first type of source language partial structure is one in which the order of the two child nodes included in the target language partial structure and the order of the two child nodes included in the corresponding source language partial structure are different.
- The second type of source language partial structure is one in which the order of the two child nodes included in the target language partial structure and the order of the two child nodes included in the corresponding source language partial structure are the same.
- The rearrangement label is, for example, “_SW”, a label indicating rearrangement, or “_ST”, a label indicating no rearrangement. A rearrangement label may be added to both the first-type and second-type source language partial structures, or to only one of them.
- the rearrangement label is normally held by the parent node of the source language partial structure. “ST” is an abbreviation for “straight”, and “SW” is an abbreviation for “switch”.
- The phrase label and the rearrangement label may be expressed together.
- the phrase label and the rearrangement label are expressed as “phrase label_rearrangement label”, for example, “S_ST”, “VP_SW”, “NP_ST”, and the like.
- “S_ST” indicates a source language partial structure that constitutes a sentence and is not rearranged.
- “VP_SW” indicates a source language partial structure that constitutes a verb phrase and is rearranged.
- “NP_ST” indicates a source language partial structure that constitutes a noun phrase and is not rearranged.
- The pre-ordering model is obtained, for example, by analysis using a CFG (context-free grammar) having probability information and an ITG (inversion transduction grammar) (see Dekai Wu. 1997. Stochastic inversion transduction grammars and bilingual parsing of parallel corpora. Computational Linguistics, 23(3):377-403).
- This pre-ordering model is also called an ITG parsing model.
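- As a concrete illustration (a minimal sketch, not part of the original specification), entries of such a parsing model can be held as plain data; the Python below parses rule strings of the form shown above, with hypothetical names throughout.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class LabeledRule:
    """One ITG parsing model entry: parent -> left right, with probability.

    A label such as "S_ST" combines a phrase label ("S") with a
    rearrangement label ("_ST" = straight, "_SW" = switch).
    """
    prob: float
    parent: str
    left: str
    right: str

    @property
    def swapped(self) -> bool:
        # A "_SW" suffix on the parent marks a rule whose children are reordered.
        return self.parent.endswith("_SW")

def parse_rule(line: str) -> LabeledRule:
    # e.g. "0.01 S_ST -> NP_ST VP_SW"
    prob, parent, _arrow, left, right = line.split()
    return LabeledRule(float(prob), parent, left, right)

rule = parse_rule("0.01 S_ST -> NP_ST VP_SW")
print(rule.swapped)  # False: the sentence node keeps its children straight
```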
- the bilingual corpus 11 can store one or more bilingual sentences.
- the parallel translation sentence has a source language sentence and a target language sentence.
- This target language sentence is a translation result of the source language sentence.
- The source language and the target language may be any pair of different languages. However, it is preferable that the source language and the target language are languages whose word order differs significantly, such as Japanese and English.
- the element pair storage unit 12 can store one or more element pairs.
- An element pair is a pair of a source language element and a target language element.
- An element is a part that constitutes a sentence, such as a word, a morpheme, or a phrase.
- An element may also be a string of two or more terms, or a sentence.
- the element pair may hold information on the probability of correspondence between the source language element and the target language element.
- The element pair storage unit 12 may be a so-called term dictionary.
- the statistical model storage unit 13 can store a statistical model of CFG rules.
- a CFG rule includes a parent node and two child nodes.
- the parent node here has a phrase label.
- the child node has a phrase label or a POS tag in the source language.
- The statistical model is, for example, a CFG model having probability information, such as one based on a Pitman-Yor (hereinafter referred to as “PY” as appropriate) process.
- For the PY process, see Jim Pitman and Marc Yor. 1997. The two-parameter Poisson-Dirichlet distribution derived from a stable subordinator. The Annals of Probability, 25(2):855-900.
- Here, R is the set of CFG rules, L is the set of phrase labels in the target language syntax, and T is the set of POS tags in the source language.
- The probability P(t) of a derivation tree t, where the derivation tree t is a syntactic structure (a tree structure) of the source language, is expressed by the following Equation 1:
- P(t) = \prod_{(x \to \beta) \in R} P(\beta \mid x)^{c(x \to \beta, t)}   (Equation 1)
- In Equation 1, x → β is a CFG rule, c(x → β, t) is the number of times x → β is used in the derivation tree t, x ∈ L is the phrase label of the parent node of the CFG rule, and P(β | x) is the probability that β is generated when the phrase label x of the parent node is given.
- A designated phrase label is used as the phrase label of the parent (root) node of the derivation tree t.
- The PY model is a distribution over CFG rules; for each parent label x, the rule distribution is drawn from a Pitman-Yor process whose base measure is the back-off distribution, as in Equation 2:
- P(\cdot \mid x) \sim \mathrm{PY}(d, \theta, P_{\mathrm{base}}(\cdot \mid x))   (Equation 2)
- The back-off probability is a probability used when back-off smoothing is performed.
- Since a CFG rule has two child nodes, and each child node is a phrase label or a POS tag, the number of possible child-node pairs is (|L| + |T|)^2; the back-off probability is therefore the uniform distribution of Equation 3:
- P_{\mathrm{base}}(\beta \mid x) = \frac{1}{(|L| + |T|)^2}   (Equation 3)
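- To make Equations 2 and 3 concrete, the following sketch (illustrative only; the Chinese-restaurant-style predictive formula and the parameter values are assumptions, not taken from the specification) computes the uniform back-off probability and a Pitman-Yor smoothed rule probability.

```python
def base_prob(num_phrase_labels: int, num_pos_tags: int) -> float:
    """Equation 3: uniform probability over the (|L| + |T|)**2 child pairs."""
    return 1.0 / (num_phrase_labels + num_pos_tags) ** 2

def py_rule_prob(count: int, tables: int, total: int, total_tables: int,
                 d: float, theta: float, p0: float) -> float:
    """Chinese-restaurant-style Pitman-Yor predictive probability of a rule
    given its parent label: discounted counts plus back-off mass p0."""
    return ((count - d * tables) + (theta + d * total_tables) * p0) / (theta + total)

p0 = base_prob(num_phrase_labels=20, num_pos_tags=40)  # |L| = 20, |T| = 40
print(py_rule_prob(count=5, tables=2, total=100, total_tables=30,
                   d=0.5, theta=1.0, p0=p0))
```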
- the syntax analysis unit 14 acquires one or more binary trees of the target language sentence. This binary tree has one or more target language substructures.
- the syntax analysis unit 14 normally parses the target language sentence included in each of the one or more parallel translation sentences, and acquires one or more binary trees of the target language sentence.
- The syntax analysis unit 14 may be configured to send the target language sentence included in each of the one or more bilingual sentences to an external device and receive the one or more binary trees of the target language sentence from the external device.
- The syntax analysis unit 14 is, for example, the Berkeley parser (Slav Petrov, Leon Barrett, Romain Thibaux, and Dan Klein. 2006. Learning accurate, compact, and interpretable tree annotation. In Proceedings of COLING-ACL 2006, pages 433-440, Sydney, Australia, July.).
- The target language partial structure indicates the order of two or more elements constituting the target language sentence, and includes a parent node having a phrase label and two child nodes that are children of the parent node and have a phrase label of the target language, a POS tag, or an element of the target language.
- the binary tree of the target language sentence is a syntax tree based on the phrase structure grammar obtained by parsing the target language sentence, and is a tree structure having a maximum of two branches.
- The syntax tree based on the phrase structure grammar is composed of subtrees indicating the ranges of phrases and their phrase labels.
- the binary tree of the target language sentence has one or more target language partial structures.
- the target language substructure includes a parent node and two child nodes of the parent node.
- the parent node here has a phrase label.
- Child nodes have phrase labels or target language elements.
- the target language element may be associated with a POS tag that is information indicating the part of speech.
- the source language element acquisition unit 15 acquires one or more elements of the source language corresponding to one or more elements of the target language included in the target language sentence from the element pair storage unit 12.
- Specifically, the source language element acquisition unit 15 acquires, from the one or more element pairs of the element pair storage unit 12, the one or more source language elements that constitute the source language sentence corresponding to the target language sentence
- and that correspond to the target language elements at the terminal child nodes of the one or more target language partial structures included in the binary tree of the target language sentence.
- The source language partial structure acquisition unit 16 acquires one or more source language partial structures. Specifically, the source language partial structure acquisition unit 16 applies the structure indicated by the one or more target language partial structures included in the binary tree of the target language sentence to the one or more source language elements constituting the source language sentence, and thereby acquires one or more source language partial structures.
- the source language partial structure acquired by the source language partial structure acquisition unit 16 is information indicating the order of two or more elements constituting the source language sentence.
- the source language partial structure includes a parent node and two child nodes. This parent node has a phrase label. Child nodes also have phrase labels or source language elements.
- In other words, the source language partial structure indicates the order of two or more elements constituting the source language sentence, and includes a parent node having a phrase label and two child nodes that are children of the parent node and have a phrase label or a source language POS tag.
- The source language partial structure acquisition unit 16 determines, for each source language partial structure, the span corresponding to each of the one or more target language partial structures of the binary tree of the target language sentence, using the one or more element pairs of the element pair storage unit 12.
- the span indicates a range of two word positions.
- the source language substructure acquisition unit 16 writes the phrase label of the parent node of the corresponding target language substructure as the phrase label of the parent node of each source language substructure corresponding to each span.
- The span of a source language partial structure runs from the position corresponding to the leftmost element of the corresponding target language partial structure to the position corresponding to its rightmost element.
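- As an illustrative sketch of this span projection (assuming a word alignment given as index pairs; all names are hypothetical):

```python
def project_span(target_span, alignment):
    """Project a target-side span onto the source side.

    target_span: (lo, hi), inclusive target word positions of a substructure.
    alignment: set of (src_idx, tgt_idx) pairs taken from the element pairs.
    Returns the (min, max) source positions aligned to the span, or None
    when nothing in the span is aligned (left to the complementing step).
    """
    lo, hi = target_span
    src = [s for (s, t) in alignment if lo <= t <= hi]
    if not src:
        return None
    return (min(src), max(src))

# Target "he bought new books yesterday" (0..4) versus the source tokens
# "he ha yesterday new-publication no book o buy ta" (0..8):
alignment = {(0, 0), (7, 1), (3, 2), (5, 3), (2, 4)}
print(project_span((2, 3), alignment))  # NP "new books" -> (3, 5)
```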
- However, the source language partial structure acquisition unit 16 cannot always acquire source language partial structures having a complete structure.
- the source language partial structure having an incomplete structure is, for example, a source language partial structure whose span is unknown or unclear, a source language partial structure including a parent node whose phrase label could not be determined, and the like.
- the case where the span is unknown or unclear is, for example, the case where there is an element of the source language sentence that does not correspond to the element constituting the target language sentence, or the case where the span is not retained due to a conflict.
- an unknown part usually occurs when the number of words is larger in the source language sentence than in the target language sentence.
- When, using the one or more element pairs of the element pair storage unit 12, there is an element of the source language sentence that does not correspond to any element constituting the target language sentence, the source language partial structure acquisition unit 16 includes that element in an adjacent span using the statistical model stored in the statistical model storage unit 13. Then, the source language partial structure acquisition unit 16 determines, using the statistical model of the statistical model storage unit 13, the phrase label of the parent node of the source language partial structure corresponding to the span that includes the non-corresponding element.
- the source language substructure acquisition unit 16 normally retains the span of the source language substructure only when no conflict occurs between the source language substructures. That is, the source language partial structure acquisition unit 16 normally does not hold conflicting spans.
- the conflict means a state where two or more source language spans partially overlap each other. In other words, the conflict is not a so-called nest but a crossed state.
- For an element of the source language sentence that does not correspond to any element constituting the target language sentence, the ambiguity can be resolved so that no conflict arises.
- The source language partial structure acquisition unit 16 applies the loosest constraint to each derivation tree when the span of a source language partial structure is ambiguous. Applying the loosest constraint means that, when the ambiguity of a constraint allows interpretations under which a conflict either does or does not occur, the non-conflicting interpretation is always applied.
- The source language partial structure acquisition unit 16 then extracts the phrase labels. For this extraction, for example, a sentence-level blocked Gibbs sampler (Trevor Cohn, Phil Blunsom, and Sharon Goldwater. 2010. Inducing tree-substitution grammars. Journal of Machine Learning Research, 11:3053-3096.) is used. The sampler performs the following two steps for each sentence: (1) calculate the inside probabilities bottom-up; (2) sample the tree top-down.
- After the distribution of the PY model is constructed, the source language partial structure acquisition unit 16 obtains the one or more best source language partial structures using the CFG and the CYK algorithm (Daniel H. Younger. 1967. Recognition and parsing of context-free languages in time n³. Information and Control, 10(2):189-208). The CYK algorithm searches for the most likely syntactic structure among the syntactic structures that satisfy the constraints of the source language partial structure spans and phrase labels. These constraints are the same as those used to construct the CFG containing probability information.
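- A minimal CYK sketch over a probabilistic CFG in binary form is shown below (illustrative only; it searches for the most probable binary derivation of a tag sequence and omits the span and phrase-label constraints described above, which a faithful implementation would add).

```python
import math
from collections import defaultdict

def cyk_best(tags, rules):
    """tags: the POS tag sequence of the sentence (terminals here).
    rules: dict mapping (left_child, right_child) to a list of
           (parent_label, log_prob) entries.
    Returns a chart: chart[(i, j)][label] = (log_prob, split, left, right).
    """
    n = len(tags)
    chart = defaultdict(dict)
    for i, tag in enumerate(tags):
        chart[(i, i + 1)][tag] = (0.0, None, None, None)
    for width in range(2, n + 1):
        for i in range(n - width + 1):
            j = i + width
            for k in range(i + 1, j):
                for lc, (lp, *_rest) in chart[(i, k)].items():
                    for rc, (rp, *_rest2) in chart[(k, j)].items():
                        for parent, logp in rules.get((lc, rc), []):
                            score = logp + lp + rp
                            best = chart[(i, j)].get(parent)
                            if best is None or score > best[0]:
                                chart[(i, j)][parent] = (score, k, lc, rc)
    return chart

rules = {("NP_ST", "VP_SW"): [("S_ST", math.log(0.01))],
         ("V", "N"): [("VP_SW", math.log(0.2))],
         ("PRN", "PRT"): [("NP_ST", math.log(0.3))]}
chart = cyk_best(["PRN", "PRT", "V", "N"], rules)
print(chart[(0, 4)])  # the best labeled derivation covering the whole input
```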
- the source language partial structure acquisition unit 161 constituting the source language partial structure acquisition unit 16 acquires one or more source language partial structures.
- the source language partial structure acquisition unit 161 may not be able to acquire a complete source language partial structure.
- When an incomplete source language partial structure exists, the partial structure complementing means 162 applies the statistical model of the statistical model storage unit 13 to that source language partial structure to obtain a complete source language partial structure.
- the incomplete source language partial structure is as described above.
- Specifically, the partial structure complementing means 162 applies the statistical model to the source language partial structure and determines the phrase label of the parent node of the source language partial structure.
- the phrase label determination may be the writing of the determined phrase label as the phrase label of the parent node of the source language partial structure.
- the statistical model here is usually a statistical model of the CFG rule.
- The rearrangement unit 17 rearranges the order of the two or more elements constituting the source language sentence using the one or more source language partial structures indicating that order, and acquires one or more source language partial structures that are the result of the rearrangement. The rearrangement unit 17 usually rearranges the order of the two or more elements constituting the source language sentence so that the order comes closer to satisfying a condition predetermined with respect to the order of the elements of the target language sentence.
- Here, a closer order means that the order of the elements of the source language sentence is closer to the order of the elements of the target language sentence.
- The rearrangement unit 17 rearranges, or leaves in place, the sibling nodes of each source language partial structure so that, for example, “Kendall's τ” between the element order of the rearranged source language partial structures and that of the target language partial structures is maximized.
- “Kendall's τ” is a type of rank correlation coefficient.
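- For intuition (an illustrative sketch, not the patent's code): if each source-side child is annotated with the target-side positions of its words, Kendall's τ can be compared between the straight and the swapped child order, and the order with the higher τ kept. Ties and unaligned words are ignored here for simplicity.

```python
def kendall_tau(ranks):
    """Kendall's tau for a sequence of target-side positions: +1 when the
    source order already matches the target order, -1 when fully reversed.
    Assumes distinct ranks (no ties)."""
    n = len(ranks)
    pairs = n * (n - 1) // 2
    concordant = sum(ranks[i] < ranks[j]
                     for i in range(n) for j in range(i + 1, n))
    return (2 * concordant - pairs) / pairs

def should_swap(left_ranks, right_ranks):
    """Swap the sibling nodes iff doing so increases Kendall's tau."""
    straight = kendall_tau(left_ranks + right_ranks)
    swapped = kendall_tau(right_ranks + left_ranks)
    return swapped > straight

# Japanese "... [new book] [bought]" versus English "... bought new books":
# the left child's words come after the right child's on the target side.
print(should_swap([2, 3], [1]))  # True -> the parent is labeled "_SW"
```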
- the label assigning unit 18 assigns the rearrangement label to one or more source language partial structures, and acquires one or more labeled source language partial structures.
- the label assigning unit 18 usually assigns a label for rearrangement to one or more source language partial structures rearranged by the rearrangement unit 17, and acquires one or more labeled source language partial structures.
- For example, the label assigning unit 18 may add the rearrangement label only to first-type source language partial structures.
- This rearrangement label is a label indicating that rearrangement is performed.
- the label assigning unit 18 adds a rearrangement label only to the second type source language partial structure, for example.
- This rearrangement label is a label indicating that no rearrangement is performed.
- Alternatively, the label assigning unit 18 may add a rearrangement label indicating rearrangement (for example, “_SW”) to first-type source language partial structures
- and a rearrangement label indicating no rearrangement (for example, “_ST”) to second-type source language partial structures.
- the model constructing unit 19 constructs one or more parsing models (for example, ITG parsing model) using the one or more labeled source language partial structures acquired by the label assigning unit 18.
- The model construction unit 19 can be realized by, for example, the model learning function of the Berkeley parser.
- The storage unit 20 stores the one or more syntax analysis models acquired by the model construction unit 19.
- the storage unit 20 normally stores one or more syntax analysis models in the recording medium 10.
- the recording medium 10, the bilingual corpus 11, the element pair storage unit 12, and the statistical model storage unit 13 are preferably non-volatile recording media, but can also be realized by volatile recording media.
- The process of storing bilingual sentences and the like in the bilingual corpus 11 and the like is not limited to a particular method.
- Bilingual sentences and the like may be stored in the bilingual corpus 11 and the like via a recording medium, bilingual sentences and the like transmitted via a communication line may be stored in the bilingual corpus 11 and the like,
- or bilingual sentences and the like input via an input device may be stored in the bilingual corpus 11 and the like.
- The syntax analysis unit 14, source language element acquisition unit 15, source language partial structure acquisition unit 16, rearrangement unit 17, label assigning unit 18, model construction unit 19, storage unit 20, source language partial structure acquisition means 161,
- and partial structure complementing means 162 can usually be realized by an MPU, a memory, or the like.
- the processing procedure of the syntax analysis unit 14 or the like is usually realized by software, and the software is recorded on a recording medium such as a ROM. However, it may be realized by hardware (dedicated circuit).
- Step S201 The parsing unit 14 substitutes 1 for the counter i.
- Step S202 The syntax analysis unit 14 determines whether or not the i-th parallel translation sentence exists in the parallel corpus 11. If the i-th parallel translation sentence exists, the process proceeds to step S203, and if the i-th parallel translation sentence does not exist, the process ends.
- Step S203 The parsing unit 14 reads the i-th target language sentence included in the i-th parallel translation sentence from the parallel corpus 11.
- Step S204 The syntax analysis unit 14 parses the target language sentence read out in step S203. Then, the syntax analysis unit 14 acquires a binary tree of the target language corresponding to the i-th target language sentence.
- the target language binary tree has one or more target language partial structures.
- Step S205 The source language element acquisition unit 15 reads the i-th source language sentence of the i-th parallel translation sentence from the parallel corpus 11.
- Step S206 The source language element acquisition unit 15 substitutes 1 for the counter j.
- Step S207 The source language element acquisition unit 15 determines whether or not the j-th terminal child node exists in the binary tree of the target language acquired in Step S204. If the j-th terminal child node exists, the process goes to step S208. If the j-th terminal child node does not exist, the process goes to step S210.
- the child node at the end is an element such as a word in the target language.
- Step S208 The source language element acquisition unit 15 acquires, from the element pair storage unit 12, the element (such as a source language word) that is included in the i-th source language sentence and corresponds to the j-th terminal child node (target language element).
- Step S209 The source language element acquisition unit 15 increments the counter j by 1, and returns to Step S207.
- Step S210 The source language partial structure acquisition means 161 constituting the source language partial structure acquisition unit 16 assigns 1 to the counter j.
- Step S211 The source language partial structure acquisition unit 161 determines whether or not the jth target language partial structure exists in the binary tree of the target language acquired in Step S204. If the jth target language partial structure exists, the process proceeds to step S212. If the jth target language partial structure does not exist, the process proceeds to step S214.
- Step S212 The source language partial structure acquisition means 161 constructs a source language partial structure corresponding to the jth target language partial structure.
- Step S213 The source language partial structure acquisition unit 161 increments the counter j by 1, and returns to Step S211.
- Step S214 The partial structure complementing means 162 substitutes 1 for the counter j.
- Step S215 The partial structure complementing means 162 determines whether or not the jth source language partial structure exists. If the jth source language partial structure exists, the process proceeds to step S216. If the jth source language partial structure does not exist, the process proceeds to step S219.
- Step S216 The partial structure complementing means 162 determines whether or not the jth source language partial structure is an incomplete source language partial structure. If it is an incomplete source language partial structure, go to step S217, and if it is not an incomplete source language partial structure, go to step S218.
- Step S217 The partial structure complementing means 162 changes the jth source language partial structure to a complete source language partial structure using the statistical model.
- Step S218 The partial structure complementing means 162 increments the counter j by 1, and returns to Step S215.
- Step S219 The rearrangement unit 17 assigns 1 to the counter j.
- Step S220 The rearrangement unit 17 determines whether or not the jth source language partial structure exists. If the jth source language partial structure exists, the process proceeds to step S221. If the jth source language partial structure does not exist, the process proceeds to step S224.
- Step S221 The rearrangement unit 17 determines whether or not the jth source language partial structure is a source language partial structure that needs to be rearranged. If it is determined that rearrangement is necessary, the process goes to step S222. If it is determined that rearrangement is not necessary, the process goes to step S223.
- Step S222 The rearrangement unit 17 rearranges sibling nodes of the jth source language partial structure.
- Step S223 The rearrangement unit 17 increments the counter j by 1, and returns to Step S220.
- Step S224 The label assigning unit 18 substitutes 1 for the counter j.
- Step S225 The label assigning unit 18 determines whether or not the jth source language partial structure exists. If the jth source language partial structure exists, the process proceeds to step S226, and if the jth source language partial structure does not exist, the process proceeds to step S230.
- Step S226 The label assigning unit 18 determines whether or not rearrangement has occurred in the jth source language partial structure. If rearrangement has occurred, the process goes to step S227; if rearrangement has not occurred, the process goes to step S228.
- Step S227 The label assigning unit 18 adds a label indicating rearrangement (for example, “_SW”) to the jth source language partial structure, and the process goes to step S229.
- Step S228 The label assigning unit 18 adds a label (for example, “_ST”) indicating that no rearrangement is performed to the jth source language partial structure.
- Step S229 The label assigning unit 18 increments the counter j by 1, and returns to step S225.
- Step S230 The model construction unit 19 constructs one or more parsing models using the one or more labeled source language partial structures acquired by the label assigning unit 18. Then, the storage unit 20 stores one or more syntax analysis models acquired by the model construction unit 19 in the recording medium 10.
- Step S231 The syntax analysis unit 14 increments the counter i by 1, and returns to Step S202.
- the label assigning unit 18 adds a label indicating rearrangement (for example, “_SW”) or a label indicating not rearranging (for example, “_ST”) to the source language partial structure.
- However, the label assigning unit 18 may add the rearrangement label to only some source language partial structures. Even when the rearrangement label is added to only some source language partial structures, it is possible to distinguish whether a given source language partial structure is one to be rearranged.
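- Putting steps S210 through S229 together, the following runnable miniature (hypothetical node structure; a simplified swap test, swapping when all of the right child's words precede all of the left child's words on the target side, stands in for the Kendall's τ maximization) shows how rearrangement and labeling interact:

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class Node:
    label: str                       # phrase label, or a word at a leaf
    left: Optional["Node"] = None
    right: Optional["Node"] = None
    tgt_pos: Optional[int] = None    # target-side position, leaves only

def ranks(node):
    """Target positions of the leaves under node, in current source order."""
    if node.left is None:
        return [node.tgt_pos]
    return ranks(node.left) + ranks(node.right)

def reorder_and_label(node):
    """Steps S219-S229 in miniature: swap children when the right child's
    words all precede the left child's on the target side, then label."""
    if node.left is None:
        return
    reorder_and_label(node.left)
    reorder_and_label(node.right)
    if max(ranks(node.right)) < min(ranks(node.left)):
        node.left, node.right = node.right, node.left
        node.label += "_SW"
    else:
        node.label += "_ST"

# Source order "new-book bought" versus target order "bought(0) new-book(1)":
vp = Node("VP", Node("new-book", tgt_pos=1), Node("bought", tgt_pos=0))
reorder_and_label(vp)
print(vp.label, ranks(vp))  # VP_SW [0, 1]: source order now matches target
```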
- Assume that the bilingual corpus 11 stores the bilingual sentence “Japanese sentence: he bought a new book yesterday; English sentence: he bought new books yesterday” (Japanese words are shown throughout this example by their English glosses).
- the element pair storage unit 12 stores data of a Japanese-English dictionary including a large number of element pairs having Japanese words and English words.
- the element pair is, for example, (he, he), (yesterday, yesterday), or the like.
- the statistical model storage unit 13 stores the above-described CFG rule statistical model.
- the learning device 1 operates as follows.
- the syntax analysis unit 14 of the learning device 1 reads the target language sentence “he bought new books yesterday” of the parallel corpus 11.
- the syntax analysis unit 14 parses the read target language sentence.
- Then, the syntax analysis unit 14 acquires a binary tree of the target language corresponding to the target language sentence. This binary tree is shown in FIG. 4.
- The binary tree includes a target language partial structure having a parent node “S” and child nodes “he” and “VP”, a target language partial structure having a parent node “VP” and child nodes “VP” and “yesterday”, a target language partial structure having a parent node “VP” and child nodes “bought” and “NP”, and a target language partial structure having a parent node “NP” and child nodes “new” and “books”.
- the syntax analysis unit 14 is, for example, a Berkeley parser.
- The source language element acquisition unit 15 reads the source language sentence “he bought a new book yesterday” from the bilingual corpus 11.
- The source language element acquisition unit 15 acquires, using the element pair storage unit 12, the source language words that are included in the source language sentence and correspond to the terminal child nodes (target language words) of the target language binary tree. That is, as shown in FIG. 5, the source language element acquisition unit 15 acquires “he” in association with “he”, “bought” in association with “bought”, “new publication” in association with “new”, “book” in association with “books”, and “yesterday” in association with “yesterday”.
- Next, the source language partial structure acquisition means 161 applies the structure indicated by each of the one or more target language partial structures to the one or more source language elements constituting the source language sentence, and thereby acquires the one or more source language partial structures.
- The concept of the one or more source language partial structures is shown in FIG. 6.
- the partial structure complementing means 162 changes the incomplete source language partial structure to a complete source language partial structure using a statistical model.
- For example, the partial structure complementing means 162 includes “ha”, which does not correspond to any word in the target language, in the adjacent span (the span of “he”). Then, the partial structure complementing means 162 acquires, using the statistical model, the phrase label “NP” of the parent node whose child nodes are “he” and “ha”. The source language partial structure (subtree) shown in FIG. 7 is thereby obtained.
- Similarly, the partial structure complementing means 162 includes “no”, which does not correspond to any word in the target language, in the adjacent span (the span of “new publication”). Then, the partial structure complementing means 162 acquires, using the statistical model, the phrase label “PP” of the parent node whose child nodes are “new publication” and “no”.
- Further, the partial structure complementing means 162 includes “o”, which does not correspond to any word in the target language, in the adjacent span (the span of “new publication no book”). Then, the partial structure complementing means 162 acquires, using the statistical model, the phrase label “NP” of the parent node whose child nodes are the “NP” covering “new publication no book” and “o”.
- Through the above processing, the source language partial structure acquisition unit 16 obtains the binary tree having the one or more source language partial structures shown in FIG. 8.
- This binary tree includes a source language partial structure having a parent node “S” and child nodes “NP” and “VP”, a source language partial structure having a parent node “NP” and child nodes “he” and “ha”, a source language partial structure having a parent node “VP” and child nodes “yesterday” and “VP”, a source language partial structure having a parent node “VP” and child nodes “NP” and “VP”, a source language partial structure having a parent node “NP” and child nodes “NP” and “o”, a source language partial structure having a parent node “NP” and child nodes “PP” and “book”, a source language partial structure having a parent node “PP” and child nodes “new publication” and “no”, and a source language partial structure having a parent node “VP” and child nodes “buy” and “ta”.
- the rearrangement unit 17 checks whether or not rearrangement of sibling nodes is necessary so that one or more source language partial structures are close to the word order of the target language.
- Here, the rearrangement unit 17 determines that the source language partial structure whose child nodes span “yesterday” and “bought a new book” needs to be rearranged, and rearranges it. Further, the rearrangement unit 17 determines that the source language partial structure whose child nodes span “a new book” and “bought” needs to be rearranged, and rearranges it. Then, the rearrangement unit 17 obtains the binary tree having the source language partial structures shown in FIG. 9. In this binary tree, the parent nodes of the rearranged source language partial structures are marked with reference signs 81 and 82.
- Next, the label assigning unit 18 determines, for each of the one or more source language partial structures, whether or not rearrangement has occurred. The label assigning unit 18 adds a label indicating rearrangement (here, “_SW”) to each source language partial structure in which rearrangement has occurred, and a label indicating no rearrangement (“_ST”) to each source language partial structure in which no rearrangement has occurred. The label assigning unit 18 thereby obtains the one or more labeled source language partial structures shown in FIG. 10.
- Then, the storage unit 20 stores the binary tree (see FIG. 10) of the source language sentence having the labeled source language partial structures acquired by the label assigning unit 18.
- The above processing is executed for all the bilingual sentences in the bilingual corpus 11. In this way, the learning apparatus 1 can learn a pre-ordering model.
- According to the present embodiment, the order of two or more elements constituting the source language sentence is rearranged so that the order comes closer to satisfying a condition predetermined with respect to the order of the elements of the target language sentence.
- The processing in the present embodiment may be realized by software. The software may be distributed by software download or the like. Alternatively, the software may be recorded on a recording medium such as a CD-ROM and distributed. This also applies to the other embodiments in this specification.
- The software that realizes the learning apparatus 1 in the present embodiment is the following program.
- This program causes a computer, which can access a recording medium comprising a bilingual corpus that can store one or more bilingual sentences each having a source language sentence and a target language sentence that is a translation result of the source language sentence, and an element pair storage unit that can store one or more element pairs that are pairs of source language elements and target language elements, to function as:
- a syntax analysis unit that parses the target language sentence included in each of the one or more bilingual sentences and acquires a binary tree of the target language sentence having one or more target language partial structures, each of which indicates the order of two or more elements constituting the target language sentence and includes a parent node having a phrase label and two child nodes that are children of the parent node and have a phrase label, a POS tag, or a target language element;
- a source language element acquisition unit that acquires, from the one or more element pairs, one or more source language elements corresponding to the elements of the target language sentence;
- a source language partial structure acquisition unit that applies the structure indicated by the one or more target language partial structures of the binary tree of the target language sentence to the one or more source language elements constituting the source language sentence, and acquires one or more source language partial structures, each of which indicates the order of two or more elements constituting the source language sentence and includes a parent node having a phrase label and two child nodes;
- a label assigning unit that assigns, to the one or more source language partial structures, a rearrangement label, which is a label that can distinguish a source language partial structure in which the order of the two child nodes included in the target language partial structure differs from the order of the two child nodes included in the corresponding source language partial structure from a source language partial structure in which those orders are the same, and acquires one or more labeled source language partial structures;
- a model construction unit that constructs, using the one or more labeled source language partial structures, one or more parsing models having probability information indicating the likelihood of appearance of a labeled source language partial structure; and
- a storage unit that stores the one or more constructed parsing models.
- It is preferable that the program further causes the computer to function as a rearrangement unit that, using the one or more source language partial structures indicating the order of the two or more elements constituting the source language sentence, acquires one or more source language partial structures in which the order of the two or more elements constituting the source language sentence has been rearranged so that the order comes closer to satisfying a condition predetermined with respect to the order of the elements of the target language sentence,
- and that the label assigning unit assigns the rearrangement label to the one or more source language partial structures rearranged by the rearrangement unit.
- It is also preferable that the recording medium further comprises a statistical model storage unit that stores a statistical model of CFG rules, each including a parent node having a phrase label and two child nodes that are children of the parent node and have a phrase label or a source language POS tag,
- and that the source language partial structure acquisition unit comprises: source language partial structure acquisition means for applying the structure indicated by the one or more target language partial structures included in the binary tree of the target language sentence to the one or more source language elements constituting the source language sentence, and acquiring one or more source language partial structures, each of which indicates the order of two or more elements constituting the source language sentence and includes a parent node having a phrase label and two child nodes that are children of the parent node and have a phrase label or a source language element;
- and partial structure complementing means for, when an incomplete source language partial structure exists among the one or more source language partial structures acquired by the source language partial structure acquisition means, applying the statistical model to that source language partial structure and obtaining a complete source language partial structure.
- FIG. 11 is a block diagram of translation apparatus 2 in the present embodiment.
- The translation device 2 includes a binary tree storage unit 21, an element pair storage unit 22, a reception unit 23, a morphological analysis unit 24, a labeled source language partial structure acquisition unit 25, a translation rearrangement unit 26, a search unit 27, and an output unit 28.
- the binary tree storage unit 21 stores one or more parsing models.
- the parsing model has one or more labeled source language substructures.
- the one or more syntax analysis models are one or more syntax analysis models accumulated by the learning apparatus 1 described in the first embodiment.
- the binary tree storage unit 21 may be the same as the recording medium 10.
- the element pair storage unit 22 can store one or more element pairs that are pairs of source language elements and target language elements.
- the accepting unit 23 accepts a source language sentence.
- the source language sentence is a source language sentence to be translated.
- Here, accepting is a concept that includes accepting information input from an input device such as a keyboard, mouse, or touch panel; accepting a source language sentence as a speech recognition result; receiving information transmitted via a wired or wireless communication line; and accepting information read from a recording medium such as an optical disk, a magnetic disk, or a semiconductor memory.
- The input means for the source language sentence may be anything, such as a keyboard, a mouse, a touch panel, or a menu screen.
- the receiving unit 23 can be realized by a device driver for input means such as a keyboard, control software for a menu screen, or the like.
- The morphological analysis unit 24 performs morphological analysis on the source language sentence received by the reception unit 23, and acquires two or more elements each associated with a POS tag.
- the POS tag is information indicating the part of speech.
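- For Japanese, this morphological analysis can be performed with an off-the-shelf analyzer. A minimal sketch, assuming the mecab-python3 package and a default IPADIC-style dictionary are installed (the specification does not prescribe a particular analyzer, and the output format varies by dictionary):

```python
import MeCab  # pip install mecab-python3 (assumed to be available)

def pos_tagged_elements(sentence: str):
    """Return (surface, coarse POS) pairs for a Japanese sentence."""
    tagger = MeCab.Tagger()
    elements = []
    for line in tagger.parse(sentence).splitlines():
        if line == "EOS":
            break
        surface, features = line.split("\t")
        elements.append((surface, features.split(",")[0]))
    return elements

print(pos_tagged_elements("彼は昨日新しい本を買った"))
```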
- The labeled source language partial structure acquisition unit 25 acquires, using the one or more parsing models, one or more labeled source language partial structures from the two or more elements of the source language sentence received by the reception unit 23.
- Specifically, the labeled source language partial structure acquisition unit 25 parses the source language sentence using the ITG parsing model and an existing parsing algorithm (for example, the Berkeley parser; see Slav Petrov and Dan Klein. 2007. Improved inference for unlexicalized parsing. In NAACL-HLT, pages 404-411, Rochester, New York, April. Association for Computational Linguistics.),
- and thereby obtains the structure of the source language sentence.
- The ITG parsing model is constructed by learning, using the one or more parsing models, from the POS tag and word sequences acquired by the morphological analysis unit 24.
- When the rearrangement label of a labeled source language partial structure acquired by the labeled source language partial structure acquisition unit 25 indicates that the order of the two child nodes included in the target language partial structure and the order of the two child nodes included in the source language partial structure are different, the translation rearrangement unit 26 performs processing that rearranges the order of the two child nodes of the labeled source language partial structure corresponding to that rearrangement label, and acquires the two or more elements of the source language after the rearrangement.
- Needless to say, the translation rearrangement unit 26 does not rearrange the order of the two child nodes of every labeled source language partial structure: when a rearrangement label indicates no rearrangement, the translation rearrangement unit 26 does not rearrange the order of the two child nodes included in the source language partial structure corresponding to that label.
- The two or more elements of the source language after rearrangement may therefore include elements that were not rearranged.
- the two or more elements in the source language are elements corresponding to the end node of the binary tree.
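- At translation time, the swap decision is simply read back from the rearrangement labels while traversing the binary tree. An illustrative sketch with a hypothetical node structure:

```python
from dataclasses import dataclass
from typing import Iterator, Optional

@dataclass
class LabeledNode:
    label: str                            # e.g. "VP_SW", "NP_ST", or a word
    left: Optional["LabeledNode"] = None
    right: Optional["LabeledNode"] = None

def reordered_terminals(node: "LabeledNode") -> Iterator[str]:
    """Yield the terminal elements, visiting a node's children in swapped
    order whenever its rearrangement label is "_SW"."""
    if node.left is None:                 # terminal: a source language element
        yield node.label
        return
    first, second = ((node.right, node.left)
                     if node.label.endswith("_SW")
                     else (node.left, node.right))
    yield from reordered_terminals(first)
    yield from reordered_terminals(second)

# Source order "new-book bought" under a "_SW" parent comes out reversed:
vp = LabeledNode("VP_SW", LabeledNode("new-book"), LabeledNode("bought"))
print(list(reordered_terminals(vp)))  # ['bought', 'new-book']
```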
- The search unit 27 acquires, from the element pair storage unit 22, the two or more target language elements corresponding to the two or more source language elements acquired by the translation rearrangement unit 26.
- The two or more target language elements are arranged in an order corresponding to the order of the two or more elements constituting the source language sentence after rearrangement by the translation rearrangement unit 26.
- the output unit 28 outputs a target language sentence composed of two or more elements acquired by the search unit 27.
- Here, output is a concept that includes display on a display, projection using a projector, printing with a printer, audio output, transmission to an external device, storage in a recording medium, and delivery of processing results to another processing device, another program, or the like.
- the binary tree storage unit 21 and the element pair storage unit 22 are preferably non-volatile recording media, but can also be realized by volatile recording media.
- The process of storing binary trees and the like in the binary tree storage unit 21 and the like is not limited to a particular method.
- Binary trees and the like may be stored in the binary tree storage unit 21 and the like via a recording medium, binary trees and the like transmitted via a communication line may be stored in the binary tree storage unit 21 and the like,
- or binary trees and the like input via an input device may be stored in the binary tree storage unit 21 and the like.
- the morphological analysis unit 24, the labeled source language partial structure acquisition unit 25, the translation rearrangement unit 26, and the search unit 27 can be usually realized by an MPU, a memory, or the like.
- the processing procedure of the morphological analysis unit 24 and the like is usually realized by software, and the software is recorded on a recording medium such as a ROM. However, it may be realized by hardware (dedicated circuit).
- the output unit 28 may or may not include an output device such as a display or a speaker.
- the output unit 28 may be realized by driver software for an output device or driver software for an output device and an output device.
- Step S1201 The receiving unit 23 determines whether or not a source language sentence has been received. If the source language sentence is accepted, the process proceeds to step S1202, and if the source language sentence is not accepted, the process returns to step S1201.
- Step S1202 The morphological analysis unit 24 performs morphological analysis on the source language sentence received by the reception unit 23, and acquires two or more elements associated with the POS tag.
- Step S1203 The labeled source language partial structure acquisition unit 25 acquires, using the one or more syntax analysis models of the source language sentence, one or more labeled source language partial structures from the two or more elements associated with POS tags acquired by the morphological analysis unit 24.
- Step S1204 The translation rearrangement unit 26 substitutes 1 for the counter i.
- Step S1205 The translation rearrangement unit 26 determines whether or not the i-th labeled source language partial structure exists. If the i-th labeled source language partial structure exists, the process proceeds to step S1206. If the i-th labeled source language partial structure does not exist, the process proceeds to step S1209.
- Step S1206 The translation rearrangement unit 26 determines whether or not the rearrangement label included in the i-th labeled source language partial structure is a label indicating rearrangement of sibling nodes. If it is a label indicating rearrangement, the process proceeds to step S1207, and if it is not a label indicating rearrangement, the process proceeds to step S1208.
- Step S1207 The translation rearrangement unit 26 rearranges sibling nodes of the i-th labeled source language partial structure.
- Step S1208 The translation rearrangement unit 26 increments the counter i and returns to step S1205.
- Step S1209 1 is substituted for the counter i.
- Step S1210 The search unit 27 determines whether or not an i-th terminal node (i-th element) exists in the binary tree composed of the one or more labeled source language partial structures processed by the translation rearrangement unit 26. If the i-th element exists, the process proceeds to step S1211; if not, the process proceeds to step S1213.
- Step S1211 The search unit 27 acquires the target language element corresponding to the i-th element from the element pair storage unit 22.
- the target language elements are acquired in the order of the elements constituting the sentence.
- Step S1212 The search unit 27 increments the counter i and returns to step S1210.
- Step S1213 The output unit 28 outputs a target language sentence including two or more elements acquired by the search unit 27 in step S1211, and returns to step S1201.
- the target language sentence is a sentence in which elements are arranged in the order acquired by the search unit 27.
- note that the process ends upon power-off or an interrupt for aborting the process.
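- as a reference for the control flow of steps S1201 to S1213, a minimal sketch in Python follows. It assumes that morphological analysis (step S1202) and the acquisition of labeled source language partial structures with the syntax analysis models (step S1203) have already produced a binary tree; the `Node` class, the boolean `swap` flag standing in for the rearrangement label, and the `element_pairs` dictionary standing in for the element pair storage unit 22 are illustrative assumptions, not the actual implementation.

```python
# Hypothetical sketch of steps S1205-S1213; not the actual implementation.

class Node:
    def __init__(self, label, children=None, word=None, swap=False):
        self.label = label               # phrase label or POS tag
        self.children = children or []   # 0 or 2 child nodes (binary tree)
        self.word = word                 # source language element (terminals only)
        self.swap = swap                 # stands in for the rearrangement label

def reorder(node):
    """Steps S1205-S1208: swap the sibling nodes of every partial structure
    whose rearrangement label indicates rearrangement."""
    if node.swap:
        node.children.reverse()
    for child in node.children:
        reorder(child)

def terminals(node):
    """Steps S1210-S1212: collect the terminal elements from left to right."""
    if not node.children:
        return [node.word]
    return [w for child in node.children for w in terminals(child)]

def translate(tree, element_pairs):
    """Steps S1205-S1213: reorder the tree, then look up the target language
    element for each source language element in sentence order."""
    reorder(tree)
    return " ".join(element_pairs.get(w, w) for w in terminals(tree))
```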
- in the following specific example, the translation apparatus 2 performs Japanese-to-English translation.
- the translation device 2 is assumed to be a machine translation device that performs statistical translation, for example.
- the translation device 2 preferably performs statistical translation, but may be a device that performs machine translation by other methods.
- the binary tree storage unit 21 stores a large number of binary trees as shown in FIG.
- the binary tree is a binary tree of source language sentences having one or more labeled source language partial structures.
- the morphological analysis unit 24 performs morphological analysis on the source language sentence "you bought a new book yesterday" (a Japanese sentence, rendered here by its English gloss) received by the reception unit 23, and acquires two or more elements associated with POS tags.
- the labeled source language partial structure acquisition unit 25 uses the one or more syntax analysis models to acquire one or more labeled source language partial structures from the two or more elements associated with POS tags acquired by the morphological analysis unit 24.
- the binary tree having the one or more labeled source language partial structures acquired by the labeled source language partial structure acquisition unit 25 is a binary tree as shown in FIG. 13.
- among the labeled source language partial structures of the binary tree shown in FIG. 13, for each labeled source language partial structure whose rearrangement label indicates rearrangement of sibling nodes, the translation rearrangement unit 26 swaps the sibling nodes. Then, as shown at 141 in FIG. 14, a sequence of the terminal elements of the binary tree (a sequence of source language elements) "You bought the newly published book yesterday" is obtained.
- the search unit 27 acquires, from the element pair storage unit 22, the target language elements corresponding to the respective elements of 141 in FIG. 14. Then, the search unit 27 obtains the target language sentence "you bought new books yesterday" (142 in FIG. 14).
- the output unit 28 outputs a target language sentence “you bought new books yesterday” composed of two or more elements acquired by the search unit 27.
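- using the sketch given after the flowchart steps above, the sibling swap of FIG. 13 and FIG. 14 can be imitated on a toy tree; the structure, labels, and English glosses below are invented for illustration and are not the actual trees of the figures.

```python
# Toy tree; labels, structure, and glosses are invented for illustration.
tree = Node("S", [
    Node("PRP", word="you"),
    Node("VP", [
        Node("NP", [Node("NN", word="new-book"), Node("VB", word="bought")],
             swap=True),
        Node("RB", word="yesterday"),
    ], swap=True),
])
reorder(tree)
print(terminals(tree))   # ['you', 'yesterday', 'bought', 'new-book']
```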
- the learning data and development data in NTCIR-9 and NTCIR-10 are the same, but the test data are different.
- the learning data was about 3.18 million bilingual sentence pairs, and the development data was about 2,000 bilingual sentence pairs.
- the test data was 2000 sentences for NTCIR-9 and 2300 sentences for NTCIR-10.
- MeCab was used as the Japanese morphological analyzer. As in English, alphanumeric tokenization was applied to Japanese.
- the translation model was learned using bilingual sentence pairs in which the English sentence was 40 words or less in length. As a result, about 2.06 million bilingual sentence pairs were used for learning the translation model.
- GIZA++ and the grow-diag-final-and heuristics were used to construct the word alignment (for the element pair storage unit 12 described above).
- the English articles (a, an, the) and Japanese particles were deleted before word alignment.
- the removed words were afterwards restored to their original positions.
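- a minimal sketch of this preprocessing follows, assuming whitespace-tokenized input; the stop-word set and the restoration step are illustrative, since the exact particle list is not given here.

```python
# Illustrative preprocessing: drop function words before word alignment,
# remembering their original positions so they can be put back afterwards.
EN_ARTICLES = {"a", "an", "the"}

def strip_function_words(tokens, stop_words=EN_ARTICLES):
    kept, removed = [], []               # removed holds (original_index, word)
    for i, tok in enumerate(tokens):
        if tok.lower() in stop_words:
            removed.append((i, tok))
        else:
            kept.append(tok)
    return kept, removed

def restore(kept, removed):
    out = list(kept)
    for i, tok in sorted(removed):       # re-insert at the original positions
        out.insert(i, tok)
    return out

kept, removed = strip_function_words("you bought a new book".split())
assert restore(kept, removed) == "you bought a new book".split()
```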
- the SMT weight parameters were tuned by MERT (see "Franz Josef Och. 2003. Minimum error rate training in statistical machine translation. In Proceedings of the 41st Annual Meeting of the Association for Computational Linguistics, pages 160-167.").
- the weight parameter was tuned three times by MERT using the first half of the development data.
- the SMT weight parameter set having the highest score was selected from the three weight parameter sets.
- This method (the method of the translation device 2) is also referred to as "PROPOSED".
- the full binary tree structures of the source language, which are the learning data of the pre-ordering model of this method, were constructed from 200,000 source language sentences. These 200,000 source language sentences were selected by the following process. First, the learning sentences in the source language are sorted based on the coverage rate of the source language spans obtained from the syntactic structure of the target language through the word alignment. Next, the top 200,000 unique source language sentences are selected. In order to construct the full binary tree structures, the Gibbs sampler was iterated 20 times. Here, the coverage rate is calculated as "number of projected spans / (number of words in the sentence - 1)".
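- the selection by coverage rate can be sketched as follows, under the assumption that each candidate sentence already carries the number of source language spans projected from the target language syntax through the word alignment; the data preparation itself is not shown.

```python
def coverage_rate(num_projected_spans, num_words):
    # coverage rate = number of projected spans / (number of words in the sentence - 1)
    return num_projected_spans / (num_words - 1)

def select_training_sentences(candidates, k=200_000):
    """candidates: list of (sentence, num_projected_spans, num_words) triples.
    Returns the top-k unique sentences ranked by coverage rate."""
    ranked = sorted(candidates,
                    key=lambda c: coverage_rate(c[1], c[2]),
                    reverse=True)
    seen, picked = set(), []
    for sentence, spans, words in ranked:
        if sentence not in seen:
            seen.add(sentence)
            picked.append(sentence)
            if len(picked) == k:
                break
    return picked
```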
- the distortion threshold was set to 6. The distortion threshold limits the relative position between the last translated phrase and the next phrase to be translated in the input sentence when the target language sentence is generated continuously from the left. It is used to limit translation so that the absolute value of "the leftmost word position of the next phrase to be translated - the rightmost word position of the last translated phrase - 1" is equal to or less than the threshold. When this value is small (for example, 6), long-distance word order rearrangement is not performed during translation.
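- the distortion constraint amounts to a one-line check; the variable names below are ours, not those of any particular decoder.

```python
def within_distortion_limit(next_leftmost, last_rightmost, threshold=6):
    """True when the phrase starting at word position next_leftmost may be
    translated next, given that the last translated phrase ended at
    word position last_rightmost."""
    return abs(next_leftmost - last_rightmost - 1) <= threshold

# Monotone translation has distortion 0: the next phrase starts right after
# the last one, so |(last_rightmost + 1) - last_rightmost - 1| == 0.
assert within_distortion_limit(5, 4)
```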
- this method was compared with the following five baseline systems:
- (1) Phrase-based SMT using a word reordering model (PBMT_L) (see "Philipp Koehn, Hieu Hoang, Alexandra Birch, Chris Callison-Burch, Marcello Federico, Nicola Bertoldi, Brooke Cowan, Wade Shen, Christine Moran, Richard Zens, Chris Dyer, Ondrej Bojar, Alexandra Constantin, and Evan Herbst. 2007. Moses: Open source toolkit for statistical machine translation.")
- (2) Hierarchical phrase-based SMT (HPBMT) (see "David Chiang. 2007. Hierarchical phrase-based translation. Computational Linguistics, 33(2):201-228.")
- (3) String-to-tree grammar-based SMT (SBMT) (see "Hieu Hoang, Philipp Koehn, and Adam Lopez. 2009. A Unified Framework for Phrase Based, Hierarchical, and Syntax Based Statistical Machine Translation. In Proceedings of IWSLT 2009, pages 152-159.")
- (4) Phrase-based SMT using a distortion model (PBMT_D) (see "Isao Goto, Masao Utiyama, Eiichiro Sumita, Akihiro Tamura, and Sadao Kurohashi. 2013b. Distortion model considering rich context for statistical machine translation. In Proceedings of the 51st Annual Meeting of the Association for Computational Linguistics, Sofia, Bulgaria, August. Association for Computational Linguistics.")
- (5) A pre-ordering method that does not use a parser (LADER) (see "Graham Neubig, Taro Watanabe, and Shinsuke Mori. 2012. Inducing a discriminative parser to optimize machine translation reordering. In Proceedings of the 2012 Joint Conference on Empirical Methods in Natural Language Processing and Computational Natural Language Learning, pages 843-853, Jeju Island, Korea, July. Association for Computational Linguistics.")
- the LADER pre-ordering model was trained on the same 200,000 source language sentences as the learning data for the pre-ordering model of this method.
- the Japanese POS tags were generated by MeCab. In learning the LADER pre-ordering model, training was iterated 100 times.
- the distortion threshold of PBMT_L was set to 20. There is no maximum chart span limit for HPBMT and SBMT; refer to the Moses decoder manual for the maximum chart span.
- the distortion threshold of LADER was set to 6. For the other parameters of each system, the default values were used.
- FIG. 15 shows the RIBES score and BLEU score in each method when NTCIR-9 and NTCIR-10 data are used.
- FIG. 15 shows that the proposed method (the method of the translation apparatus 2) records the best score and is superior to the other methods.
- RIBES is sensitive to global word order, while BLEU is sensitive to local word order. This experiment confirmed that this method is effective for both global and local word ordering.
- this method was compared with the three methods (PBMT_L, HPBMT, PBMT_D) that simultaneously perform word selection and word reordering without using a parser. The present method exceeded these three methods on both the NTCIR-9 and NTCIR-10 data, in both RIBES and BLEU scores.
- this method was also compared with the method using a target language parser (SBMT).
- this program causes a computer, which can access a recording medium comprising a binary tree storage unit storing binary trees of one or more source language sentences each having one or more labeled source language partial structures accumulated by the learning device according to any one of claims 1 to 3, and an element pair storage unit capable of storing one or more element pairs that are pairs of source language elements and target language elements, to function as: a labeled source language partial structure acquisition unit that acquires one or more labeled source language partial structures from two or more elements of a received source language sentence; a translation rearrangement unit that, when a rearrangement label of the one or more labeled source language partial structures indicates that the order of two child nodes included in the target language partial structure and the order of the two child nodes included in the source language partial structure differ, performs a process of rearranging the order of the two child nodes of the labeled source language partial structure corresponding to the rearrangement label and acquires two or more elements of the source language after the rearrangement; a search unit that acquires, from the element pair storage unit, two or more elements of the target language corresponding to each of the two or more elements of the source language acquired by the translation rearrangement unit; and an output unit that outputs a target language sentence composed of the two or more acquired elements of the target language.
- preferably, the source language elements included in the one or more labeled source language partial structures constituting the binary trees of the one or more source language sentences are associated with POS tags, the program further causes the computer to function as a morphological analysis unit that performs morphological analysis on the received source language sentence and acquires two or more elements associated with POS tags, and the labeled source language partial structure acquisition unit acquires the one or more labeled source language partial structures from the two or more elements associated with the POS tags, using the binary trees of the one or more source language sentences.
- FIG. 16 shows the external appearance of a computer that realizes the learning device 1 or the translation device 2 according to the various embodiments described above by executing the program described in this specification.
- the above-described embodiments can be realized by computer hardware and a computer program executed thereon.
- FIG. 16 is an overview diagram of the computer system 300, and FIG. 17 is a block diagram of the computer system 300.
- a computer system 300 includes a computer 301 including a CD-ROM drive, a keyboard 302, a mouse 303, and a monitor 304.
- the computer 301 includes an MPU 3013, a bus 3014 connected to the MPU 3013 and the CD-ROM drive 3012, a ROM 3015 for storing a program such as a boot-up program, a RAM 3016 connected to the MPU 3013 for temporarily storing instructions of an application program and providing a temporary storage space, and a hard disk 3017 for storing application programs, system programs, and data.
- the computer 301 may further include a network card that provides connection to the LAN.
- a program that causes the computer system 300 to execute the functions of the learning device of the above-described embodiment may be stored in the CD-ROM 3101, inserted into the CD-ROM drive 3012, and further transferred to the hard disk 3017.
- the program may be transmitted to the computer 301 via a network (not shown) and stored in the hard disk 3017.
- the program is loaded into the RAM 3016 at the time of execution.
- the program may be loaded directly from the CD-ROM 3101 or the network.
- the program does not necessarily have to include an operating system (OS), a third-party program, or the like that causes the computer 301 to execute the functions of the learning device according to the above-described embodiments.
- the program only needs to include an instruction portion that calls appropriate functions (modules) in a controlled manner to obtain the desired results. How the computer system 300 operates is well known, and a detailed description thereof is omitted.
- the computer that executes the program may be singular or plural. That is, centralized processing may be performed, or distributed processing may be performed.
- each process and each function may be realized by centralized processing by a single device or a single system, or by distributed processing by a plurality of devices. It may be realized.
- the present invention is not limited to the above-described embodiments and can be variously modified in the implementation stage without departing from the gist thereof; needless to say, such modifications are also included within the scope of the present invention.
- the embodiments include inventions at various stages, and various inventions can be extracted by appropriately combining the disclosed constituent elements. For example, even if some constituent elements are deleted from all the constituent elements shown in the embodiments, the resulting configuration can be extracted as an invention as long as the problem described in the section on the problem to be solved by the invention can be solved and the effect described in the section on the effect of the invention can be obtained.
- the learning device according to the present invention has an effect that translation with high accuracy is possible, and is useful as a machine translation device or the like.
Landscapes
- Engineering & Computer Science (AREA)
- Theoretical Computer Science (AREA)
- Physics & Mathematics (AREA)
- Artificial Intelligence (AREA)
- General Physics & Mathematics (AREA)
- General Engineering & Computer Science (AREA)
- Computational Linguistics (AREA)
- General Health & Medical Sciences (AREA)
- Health & Medical Sciences (AREA)
- Audiology, Speech & Language Pathology (AREA)
- Software Systems (AREA)
- Computer Vision & Pattern Recognition (AREA)
- Data Mining & Analysis (AREA)
- Evolutionary Computation (AREA)
- Medical Informatics (AREA)
- Computing Systems (AREA)
- Mathematical Physics (AREA)
- Probability & Statistics with Applications (AREA)
- Machine Translation (AREA)
Abstract
Description
In the present embodiment, a learning device that learns a pre-ordering model is described. Also described in the present embodiment is a learning device that learns a pre-ordering model based on a constraint that keeps word order changes at translation time within a fixed range. Furthermore, a learning device that uses a statistical model is described in the present embodiment.
The model construction unit 19 constructs one or more syntax analysis models (for example, an ITG parsing model) using the one or more labeled source language partial structures acquired by the label assignment unit 18. The model construction unit 19 can be realized, for example, by the model learning function of the berkeley parser.
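A minimal sketch of how such probability information could be estimated by relative frequency from the labeled source language partial structures is shown below; this illustrates PCFG-style rule probabilities with invented labels (such as a swap-annotated label "S_SW") and is not the actual model learning procedure of the berkeley parser.

```python
from collections import Counter

def estimate_rule_probabilities(rules):
    """rules: list of (parent_label, (left_child_label, right_child_label))
    pairs read off the labeled source language partial structures.
    Returns relative-frequency estimates of P(parent -> left right)."""
    rule_counts = Counter(rules)
    parent_counts = Counter(parent for parent, _ in rules)
    return {rule: count / parent_counts[rule[0]]
            for rule, count in rule_counts.items()}

probs = estimate_rule_probabilities([
    ("S", ("NP", "VP")), ("S", ("NP", "VP")), ("S_SW", ("VP", "NP")),
])
print(probs[("S", ("NP", "VP"))])   # 1.0, i.e. always this expansion under "S"
```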
In the present embodiment, a translation device that performs machine translation using the pre-ordering model learned by the learning device 1 described in Embodiment 1 is described.
The morphological analysis unit 24 is, for example, ChaSen (see URL: http://chasen.aist-nara.ac.jp/index.php?cmd=read&page=ProjectPractice2005&word=%A3%C3%A3%E8%A3%E1%A3%F3%A3%E5%A3%EE) or MeCab (see URL: http://mecab.sourceforge.net/). Since morphological analysis is a known technique, a detailed description is omitted.
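For reference, morphological analysis with MeCab can be invoked from Python roughly as follows, using the mecab-python3 binding; the exact output format depends on the installed dictionary, so the parsing of the feature string below is an assumption based on the standard IPA dictionary.

```python
import MeCab  # mecab-python3; requires MeCab and a dictionary to be installed

tagger = MeCab.Tagger()
# With the IPA dictionary, each line is "surface<TAB>pos,subpos,...".
for line in tagger.parse("あなたは昨日新しい本を買った").splitlines():
    if line == "EOS" or "\t" not in line:
        continue
    surface, features = line.split("\t", 1)
    print(surface, features.split(",")[0])   # element and its POS tag
```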
(Experimental results)
Moses (see "... Moses: Open source toolkit for statistical machine translation. In Proceedings of the ACL Demo and Poster Sessions, pages 177-180.") was used. In this case, the distortion threshold was set to 6. As explained above, the distortion threshold limits the relative position between the last translated phrase and the next phrase to be translated in the input sentence when the target language sentence is generated continuously from the left, by requiring that the absolute value of "the leftmost word position of the next phrase to be translated - the rightmost word position of the last translated phrase - 1" be equal to or less than the threshold; when this value is small (for example, 6), long-distance word order rearrangement is not performed during translation.
(1) Phrase-based SMT using a word reordering model (PBMT_L) (see "Philipp Koehn, Hieu Hoang, Alexandra Birch, Chris Callison-Burch, Marcello Federico, Nicola Bertoldi, Brooke Cowan, Wade Shen, Christine Moran, Richard Zens, Chris Dyer, Ondrej Bojar, Alexandra Constantin, and Evan Herbst. 2007. Moses: Open source toolkit for statistical machine translation.")
(2) Hierarchical phrase-based SMT (HPBMT) (see "David Chiang. 2007. Hierarchical phrase-based translation. Computational Linguistics, 33(2):201-228.")
(3) String-to-tree grammar-based SMT (SBMT) (see "Hieu Hoang, Philipp Koehn, and Adam Lopez. 2009. A Unified Framework for Phrase Based, Hierarchical, and Syntax Based Statistical Machine Translation. In Proceedings of IWSLT 2009, pages 152-159.")
(4) Phrase-based SMT using a distortion model (PBMT_D) (see "Isao Goto, Masao Utiyama, Eiichiro Sumita, Akihiro Tamura, and Sadao Kurohashi. 2013b. Distortion model considering rich context for statistical machine translation. In Proceedings of the 51st Annual Meeting of the Association for Computational Linguistics, Sofia, Bulgaria, August. Association for Computational Linguistics.")
(5) A pre-ordering method that does not use a parser (LADER) (see "Graham Neubig, Taro Watanabe, and Shinsuke Mori. 2012. Inducing a discriminative parser to optimize machine translation reordering. In Proceedings of the 2012 Joint Conference on Empirical Methods in Natural Language Processing and Computational Natural Language Learning, pages 843-853, Jeju Island, Korea, July. Association for Computational Linguistics.")
The MSD bidirectional word reordering model was constructed using all of the data used for constructing the translation model. For the MSD bidirectional word reordering model, refer to the Moses decoder manual.
2 Translation device
10 Recording medium
11 Bilingual corpus
12, 22 Element pair storage unit
13 Statistical model storage unit
14 Syntax analysis unit
15 Source language element acquisition unit
16, 25 Source language partial structure acquisition unit
17 Rearrangement unit
18 Label assignment unit
19 Model construction unit
20 Accumulation unit
21 Binary tree storage unit
23 Reception unit
24 Morphological analysis unit
26 Translation rearrangement unit
27 Search unit
28 Output unit
161 Source language partial structure acquisition means
162 Partial structure complementing means
Claims (4)
- A learning device comprising:
a bilingual corpus capable of storing one or more bilingual sentences each having a source language sentence and a target language sentence that is a translation result of the source language sentence;
an element pair storage unit capable of storing one or more element pairs, each being a pair of a source language element and a target language element;
a syntax analysis unit that acquires a binary tree of the target language sentence of each of the one or more bilingual sentences, the binary tree being a result of syntactically analyzing the target language sentence, indicating the order of two or more elements constituting the target language sentence, and having one or more target language partial structures each including a parent node having a phrase label and two child nodes of the parent node each having a target language phrase label, a source language POS tag, or a source language element;
a source language element acquisition unit that acquires, from the one or more element pairs in the element pair storage unit, one or more source language elements that constitute the source language sentence corresponding to the target language sentence and that correspond to the target language elements serving as terminal child nodes of the one or more target language partial structures of the binary tree of the target language sentence;
a source language partial structure acquisition unit that applies the structure indicated by the one or more target language partial structures of the binary tree of the target language sentence to the one or more source language elements constituting the source language sentence, and acquires one or more source language partial structures each indicating the order of two or more elements constituting the source language sentence and including a parent node having a phrase label and two child nodes of the parent node each having a phrase label or a source language POS tag;
a label assignment unit that assigns, to the one or more source language partial structures, rearrangement labels capable of distinguishing a source language partial structure in which the order of the two child nodes included in the target language partial structure and the order of the two child nodes included in the corresponding source language partial structure differ from a source language partial structure in which those orders are the same, thereby acquiring one or more labeled source language partial structures;
a model construction unit that constructs, using the one or more labeled source language partial structures, one or more syntax analysis models having probability information indicating how easily a labeled source language partial structure appears; and
an accumulation unit that accumulates the one or more syntax analysis models constructed by the model construction unit.
- A translation device comprising:
a binary tree storage unit storing the one or more syntax analysis models accumulated by the learning device according to claim 1;
an element pair storage unit capable of storing one or more element pairs, each being a pair of a source language element and a target language element;
a reception unit that receives a source language sentence;
a labeled source language partial structure acquisition unit that acquires, using the one or more syntax analysis models, one or more labeled source language partial structures from the two or more elements of the source language sentence received by the reception unit;
a translation rearrangement unit that, when a rearrangement label of the one or more labeled source language partial structures is a rearrangement label indicating that the order of the two child nodes included in the target language partial structure and the order of the two child nodes included in the source language partial structure differ, performs a process of rearranging the order of the two child nodes of the labeled source language partial structure corresponding to the rearrangement label, and acquires two or more elements of the source language after the rearrangement;
a search unit that acquires, from the element pair storage unit, two or more elements of the target language corresponding to the respective two or more elements of the source language after the rearrangement; and
an output unit that outputs a target language sentence composed of the two or more elements of the target language acquired by the search unit, arranged in the same order as the order of the two or more elements of the source language after the rearrangement.
- A learning method, wherein a recording medium comprises:
a bilingual corpus capable of storing one or more bilingual sentences each having a source language sentence and a target language sentence that is a translation result of the source language sentence; and
an element pair storage unit capable of storing one or more element pairs, each being a pair of a source language element and a target language element,
the learning method being realized by a syntax analysis unit, a source language element acquisition unit, a source language partial structure acquisition unit, a label assignment unit, a model construction unit, and an accumulation unit, and comprising:
a syntax analysis step in which the syntax analysis unit acquires a binary tree of the target language sentence of each of the one or more bilingual sentences, the binary tree being a result of syntactically analyzing the target language sentence, indicating the order of two or more elements constituting the target language sentence, and having one or more target language partial structures each including a parent node having a phrase label and two child nodes of the parent node each having a phrase label or a target language element;
a source language element acquisition step in which the source language element acquisition unit acquires, from the one or more element pairs in the element pair storage unit, one or more source language elements that constitute the source language sentence corresponding to the target language sentence and that correspond to the target language elements serving as terminal child nodes of the one or more target language partial structures of the binary tree of the target language sentence;
a source language partial structure acquisition step in which the source language partial structure acquisition unit applies the structure indicated by the one or more target language partial structures of the binary tree of the target language sentence to the one or more source language elements constituting the source language sentence, and acquires one or more source language partial structures each indicating the order of two or more elements constituting the source language sentence and including a parent node having a phrase label and two child nodes of the parent node each having a phrase label or a source language element;
a label assignment step in which the label assignment unit assigns, to the one or more source language partial structures, rearrangement labels capable of distinguishing a source language partial structure in which the order of the two child nodes included in the target language partial structure and the order of the two child nodes included in the corresponding source language partial structure differ from a source language partial structure in which those orders are the same, thereby acquiring one or more labeled source language partial structures;
a model construction step in which the model construction unit constructs, using the one or more labeled source language partial structures, one or more syntax analysis models having probability information indicating how easily a labeled source language partial structure appears; and
an accumulation step in which the accumulation unit accumulates the one or more syntax analysis models constructed in the model construction step.
- A translation method, wherein a recording medium comprises:
a binary tree storage unit storing the one or more syntax analysis models accumulated by the learning device according to claim 1; and
an element pair storage unit capable of storing one or more element pairs, each being a pair of a source language element and a target language element,
the translation method being realized by a reception unit, a labeled source language partial structure acquisition unit, a translation rearrangement unit, a search unit, and an output unit, and comprising:
a reception step in which the reception unit receives a source language sentence;
a labeled source language partial structure acquisition step in which the labeled source language partial structure acquisition unit acquires, using the syntax analysis models, one or more labeled source language partial structures from the two or more elements of the source language sentence received in the reception step;
a translation rearrangement step in which, when a rearrangement label of the one or more labeled source language partial structures is a rearrangement label indicating that the order of the two child nodes included in the target language partial structure and the order of the two child nodes included in the source language partial structure differ, the translation rearrangement unit performs a process of rearranging the order of the two child nodes of the labeled source language partial structure corresponding to the rearrangement label, and acquires two or more elements of the source language after the rearrangement;
a search step in which the search unit acquires, from the element pair storage unit, two or more elements of the target language corresponding to the respective two or more elements of the source language acquired in the translation rearrangement step; and
an output step in which the output unit outputs a target language sentence composed of the two or more elements acquired in the search step.
Priority Applications (4)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
KR1020167014792A KR20160093011A (ko) | 2013-12-04 | 2014-12-04 | Learning device, translation device, learning method, and translation method |
US15/101,266 US9779086B2 (en) | 2013-12-04 | 2014-12-04 | Learning apparatus, translation apparatus, learning method, and translation method |
EP14866862.7A EP3079075A4 (en) | 2013-12-04 | 2014-12-04 | Learning device, translation device, learning method, and translation method |
CN201480064778.7A CN105849718B (zh) | 2013-12-04 | 2014-12-04 | Learning device, translation device, learning method, and translation method |
Applications Claiming Priority (2)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
JP2013251524A JP5843117B2 (ja) | 2013-12-04 | Learning device, translation device, learning method, translation method, and program |
JP2013-251524 | 2013-12-04 |
Publications (1)
Publication Number | Publication Date |
---|---|
WO2015083762A1 true WO2015083762A1 (ja) | 2015-06-11 |
Family
ID=53273522
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
PCT/JP2014/082058 WO2015083762A1 (ja) | Learning device, translation device, learning method, and translation method | 2013-12-04 | 2014-12-04 |
Country Status (6)
Country | Link |
---|---|
US (1) | US9779086B2 (ja) |
EP (1) | EP3079075A4 (ja) |
JP (1) | JP5843117B2 (ja) |
KR (1) | KR20160093011A (ja) |
CN (1) | CN105849718B (ja) |
WO (1) | WO2015083762A1 (ja) |
Families Citing this family (7)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
JP6613666B2 (ja) * | 2015-07-10 | 2019-12-04 | 日本電信電話株式会社 | Word reordering learning device, word reordering device, method, and program |
JP6590723B2 (ja) * | 2016-02-12 | 2019-10-16 | 日本電信電話株式会社 | Word reordering learning method, word reordering method, device, and program |
US20170308526A1 (en) * | 2016-04-21 | 2017-10-26 | National Institute Of Information And Communications Technology | Compcuter Implemented machine translation apparatus and machine translation method |
JP6930179B2 (ja) * | 2017-03-30 | 2021-09-01 | 富士通株式会社 | Learning device, learning method, and learning program |
CN110895660B (zh) * | 2018-08-23 | 2024-05-17 | 澳门大学 | Sentence processing method and device based on dynamic encoding of syntactic dependency relations |
CN109960814B (zh) * | 2019-03-25 | 2023-09-29 | 北京金山数字娱乐科技有限公司 | Model parameter search method and device |
CN111783465B (zh) * | 2020-07-03 | 2024-04-30 | 深圳追一科技有限公司 | Named entity normalization method, system, and related device |
Citations (2)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
JP2013054607A (ja) * | 2011-09-05 | 2013-03-21 | Nippon Telegr & Teleph Corp <Ntt> | Reordering rule learning device, method, and program, and translation device, method, and program |
JP2013218524A (ja) * | 2012-04-09 | 2013-10-24 | National Institute Of Information & Communication Technology | Translation device and program |
Family Cites Families (8)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
GB2279164A (en) * | 1993-06-18 | 1994-12-21 | Canon Res Ct Europe Ltd | Processing a bilingual database. |
US6085162A (en) * | 1996-10-18 | 2000-07-04 | Gedanken Corporation | Translation system and method in which words are translated by a specialized dictionary and then a general dictionary |
WO1999000789A1 (en) * | 1997-06-26 | 1999-01-07 | Koninklijke Philips Electronics N.V. | A machine-organized method and a device for translating a word-organized source text into a word-organized target text |
DE69837979T2 (de) * | 1997-06-27 | 2008-03-06 | International Business Machines Corp. | System zum Extrahieren einer mehrsprachigen Terminologie |
US6195631B1 (en) * | 1998-04-15 | 2001-02-27 | At&T Corporation | Method and apparatus for automatic construction of hierarchical transduction models for language translation |
WO2006042321A2 (en) * | 2004-10-12 | 2006-04-20 | University Of Southern California | Training for a text-to-text application which uses string to tree conversion for training and decoding |
CN102270196A (zh) * | 2010-06-04 | 2011-12-07 | 中国科学院软件研究所 | Machine translation method |
WO2012170817A1 (en) * | 2011-06-10 | 2012-12-13 | Google Inc. | Augmenting statistical machine translation with linguistic knowledge |
-
2013
- 2013-12-04 JP JP2013251524A patent/JP5843117B2/ja active Active
-
2014
- 2014-12-04 WO PCT/JP2014/082058 patent/WO2015083762A1/ja active Application Filing
- 2014-12-04 KR KR1020167014792A patent/KR20160093011A/ko not_active Application Discontinuation
- 2014-12-04 CN CN201480064778.7A patent/CN105849718B/zh not_active Expired - Fee Related
- 2014-12-04 US US15/101,266 patent/US9779086B2/en not_active Expired - Fee Related
- 2014-12-04 EP EP14866862.7A patent/EP3079075A4/en not_active Withdrawn
Patent Citations (2)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
JP2013054607A (ja) * | 2011-09-05 | 2013-03-21 | Nippon Telegr & Teleph Corp <Ntt> | Reordering rule learning device, method, and program, and translation device, method, and program |
JP2013218524A (ja) * | 2012-04-09 | 2013-10-24 | National Institute Of Information & Communication Technology | Translation device and program |
Non-Patent Citations (23)
Title |
---|
SLAV PETROV; DAN KLEIN: "Improved inference for unlexicalized parsing", PROCEEDINGS OF NAACL-HLT 2007, ASSOCIATION FOR COMPUTATIONAL LINGUISTICS, pages 404 - 411
DANIEL H. YOUNGER: "Recognition and parsing of context-free languages in time n^3", INFORMATION AND CONTROL, vol. 10, no. 2, 1967, pages 189 - 208
DAVID CHIANG: "Hierarchical phrase-based translation", COMPUTATIONAL LINGUISTICS, vol. 33, no. 2, 2007, pages 201 - 228 |
DEKAI WU: "Stochastic inversion transduction grammars and bilingual parsing of parallel corpora", COMPUTATIONAL LINGUISTICS, vol. 23, no. 3, 1997, pages 377 - 403, XP058193166 |
FRANZ JOSEF OCH: "Minimum error rate training in statistical machine translation", PROCEEDINGS OF THE 41ST ANNUAL MEETING OF THE ASSOCIATION FOR COMPUTATIONAL LINGUISTICS, 2003, pages 160 - 167, XP058291935, DOI: doi:10.3115/1075096.1075117 |
GRAHAM NEUBIG; TARO WATANABE; SHINSUKE MORI: "Proceedings of the 2012 Joint Conference on Empirical Methods in Natural Language Processing and Computational Natural Language Learning", July 2012, ASSOCIATION FOR COMPUTATIONAL LINGUISTICS, article "Inducing a discriminative parser to optimize machine translation reordering", pages: 843 - 853 |
HIDEKI ISOZAKI; KATSUHITO SUDOH; HAJIME TSUKADA; KEVIN DUH: "Head Finalization: A Simple Reordering Rule for SOV Languages", PROCEEDINGS OF THE JOINT FIFTH WORKSHOP ON STATISTICAL MACHINE TRANSLATION AND METRICSMATR, UPPSALA, SWEDEN, 2010, pages 244 - 251
HIDEKI ISOZAKI; KATSUHITO SUDOH; HAJIME TSUKADA; KEVIN DUH: "HPSG-based preprocessing for English-to-Japanese translation", ACM TRANSACTIONS ON ASIAN LANGUAGE INFORMATION PROCESSING, vol. 11, no. 3, September 2012 (2012-09-01), pages 8.1 - 8.16, XP055382026, DOI: doi:10.1145/2334801.2334802
HIDEKI ISOZAKI; TSUTOMU HIRAO; KEVIN DUH; KATSUHITO SUDOH; HAJIME TSUKADA: "Automatic Evaluation of Translation Quality for Distant Language Pairs", PROCEEDINGS OF THE 2010 CONFERENCE ON EMPIRICAL METHODS IN NATURAL LANGUAGE PROCESSING, 2010, pages 944 - 952, XP058103490
HIEU HOANG; PHILIPP KOEHN; ADAM LOPEZ: "A Unified Framework for Phrase Based, Hierarchical, and Syntax Based Statistical Machine Translation", PROCEEDINGS OF IWSLT, 2009, pages 152 - 159 |
HIROSHI YAMAMOTO ET AL.: "Tokei Hon'yaku ni Okeru Kobungi o Mochiita Gojun Seiyaku no Donyu", THE ASSOCIATION FOR NATURAL LANGUAGE PROCESSING DAI 14 KAI NENJI TAIKAI HAPPYO RONBUNSHU, 17 March 2008 (2008-03-17), pages 57 - 60, XP008183744 * |
ISAO GOTO; BIN LU; KA PO CHOW; EIICHIRO SUMITA; BENJAMIN K. TSOU.: "Overview of the patent machine translation task at the NTCIR-9 workshop", PROCEEDINGS OF NTCIR-9, 2011, pages 559 - 578 |
ISAO GOTO; KA PO CHOW; BIN LU; EIICHIRO SUMITA; BENJAMIN K. TSOU: "Overview of the patent machine translation task at the NTCIR-10 workshop", PROCEEDINGS OF NTCIR-10, 2013, pages 260 - 286 |
ISAO GOTO; MASAO UTIYAMA; EIICHIRO SUMITA; AKIHIRO TAMURA; SADAO KUROHASHI: "Proceedings of the 51st Annual Meeting of the Association for Computational Linguistics, Sofia, Bulgaria", August 2013, ASSOCIATION FOR COMPUTATIONAL LINGUISTICS, article "Distortion model considering rich context for statistical machine translation" |
JIM PITMAN; MARC YOR: "The Annals of Probability", vol. 25, 1997, article "The two-parameter poisson-dirichlet distribution derived from a stable subordinator", pages: 855 - 900 |
KISHORE PAPINENI; SALIM ROUKOS; TODD WARD; WEI-JING ZHU: "Bleu: a Method for Automatic Evaluation of Machine Translation", PROCEEDINGS OF ACL, 2002, pages 311 - 318, XP002375179
PHILIPP KOEHN; HIEU HOANG; ALEXANDRA BIRCH; CHRIS CALLISON-BURCH; MARCELLO FEDERICO; NICOLA BERTOLDI; BROOKE COWAN; WADE SHEN; CHRISTINE MORAN; RICHARD ZENS; CHRIS DYER; ONDREJ BOJAR; ALEXANDRA CONSTANTIN; EVAN HERBST: "Moses: Open source toolkit for statistical machine translation", 2007
PHILIPP KOEHN; HIEU HOANG; ALEXANDRA BIRCH; CHRIS CALLISON-BURCH; MARCELLO FEDERICO; NICOLA BERTOLDI; BROOKE COWAN; WADE SHEN; CHRISTINE MORAN; RICHARD ZENS; CHRIS DYER; ONDREJ BOJAR; ALEXANDRA CONSTANTIN; EVAN HERBST: "Moses: Open source toolkit for statistical machine translation", PROCEEDINGS OF THE ACL DEMO AND POSTER SESSIONS, 2007, pages 177 - 180, XP055170705, DOI: doi:10.3115/1557769.1557821
See also references of EP3079075A4 |
SLAV PETROV; LEON BARRETT; ROMAIN THIBAUX; DAN KLEIN: "Proceedings of COLING-ACL 2006.", 2006, ASSOCIATION FOR COMPUTATIONAL LINGUISTICS, article "Learning accurate, compact, and interpretable tree annotation", pages: 433 - 440 |
TAKUYA NISHIMURA ET AL.: "Einichi SMT eno Head- Final Seiyaku no Donyu", THE ASSOCIATION FOR NATURAL LANGUAGE PROCESSING DAI 17 KAI NENJI TAIKAI HAPPYO RONBUNSHU TUTORIAL HONKAIGI WORKSHOP [ CD-ROM, 7 March 2011 (2011-03-07), pages 167 - 170, XP008185222 * |
TREVOR COHN; PHIL BLUNSOM; SHARON GOLDWATER: "Inducing Tree-Substitution Grammars", JOURNAL OF MACHINE LEARNING RESEARCH, vol. 11, 2010, pages 3053 - 3096, XP058336468 |
YUSUKE MIYAO; JUN'ICHI TSUJII.: "Feature forest models for probabilistic HPSG parsing", COMPUTATIONAL LINGUISTICS, vol. 34, no. 1, 2008, pages 81 - 88, XP058177039, DOI: doi:10.1162/coli.2008.34.1.35 |
Also Published As
Publication number | Publication date |
---|---|
EP3079075A1 (en) | 2016-10-12 |
US20160306793A1 (en) | 2016-10-20 |
US9779086B2 (en) | 2017-10-03 |
JP2015108975A (ja) | 2015-06-11 |
CN105849718B (zh) | 2018-08-24 |
EP3079075A4 (en) | 2017-08-02 |
JP5843117B2 (ja) | 2016-01-13 |
CN105849718A (zh) | 2016-08-10 |
KR20160093011A (ko) | 2016-08-05 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
JP5843117B2 (ja) | Learning device, translation device, learning method, translation method, and program | |
Seddah et al. | Overview of the SPMRL 2013 shared task: A cross-framework evaluation of parsing morphologically rich languages | |
JP4404211B2 (ja) | Multilingual translation memory, translation method, and translation program | |
US8959011B2 (en) | Indicating and correcting errors in machine translation systems | |
JPH05189481A (ja) | Computer operating method for translation, lexical model generation method, model generation method, computer system for translation, lexical model generation computer system, and model generation computer system | |
KR20120021933A (ko) | Statistical machine translation method using a dependency forest | |
JP5911098B2 (ja) | Translation device and program | |
CN102662932B (zh) | Method for constructing a tree structure and a tree-structure-based machine translation system | |
Prabhakar et al. | Machine transliteration and transliterated text retrieval: a survey | |
Stanojević et al. | Reordering grammar induction | |
Wax | Automated grammar engineering for verbal morphology | |
Lyons | A review of Thai–English machine translation | |
Carter et al. | Syntactic discriminative language model rerankers for statistical machine translation | |
Shen et al. | Effective use of linguistic and contextual information for statistical machine translation | |
KR101757222B1 (ko) | Method for generating paraphrase sentences for Korean sentences | |
JP5924677B2 (ja) | Machine translation device, machine translation method, and program | |
Mara | English-Wolaytta Machine Translation using Statistical Approach | |
Ho | Generative Probabilistic Alignment Models for Words and Subwords: a Systematic Exploration of the Limits and Potentials of Neural Parametrizations | |
Weese et al. | Using categorial grammar to label translation rules | |
Schmirler et al. | Computational modelling of Plains Cree syntax: A Constraint Grammar approach to verbs and arguments in a plains cree corpus | |
Hadj Ameur et al. | A POS-based preordering approach for english-to-arabic statistical machine translation | |
Aghaebrahimian et al. | The TransBank Aligner: Cross-Sentence Alignment with Deep Neural Networks | |
Zhang et al. | A unified approach for effectively integrating source-side syntactic reordering rules into phrase-based translation | |
Meetei et al. | An empirical study of a novel multimodal dataset for low-resource machine translation | |
Aung et al. | Proposed Framework for Stochastic Parsing of Myanmar Language |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
121 | Ep: the epo has been informed by wipo that ep was designated in this application |
Ref document number: 14866862 Country of ref document: EP Kind code of ref document: A1 |
ENP | Entry into the national phase |
Ref document number: 20167014792 Country of ref document: KR Kind code of ref document: A |
WWE | Wipo information: entry into national phase |
Ref document number: 15101266 Country of ref document: US |
NENP | Non-entry into the national phase |
Ref country code: DE |
REEP | Request for entry into the european phase |
Ref document number: 2014866862 Country of ref document: EP |
WWE | Wipo information: entry into national phase |
Ref document number: 2014866862 Country of ref document: EP |