CN108363704A - Neural network machine translation corpus expansion method based on a statistical phrase table - Google Patents
Neural network machine translation corpus expansion method based on a statistical phrase table
- Publication number
- CN108363704A (application CN201810175915.4A)
- Authority
- CN
- China
- Prior art keywords
- phrase
- language
- translation
- training set
- define
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F40/00—Handling natural language data
- G06F40/40—Processing or translation of natural language
- G06F40/58—Use of machine translation, e.g. for multi-lingual retrieval, for server-side translation for client devices or for real-time translation
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/08—Learning methods
Abstract
A neural network machine translation corpus expansion method based on a statistical phrase table, belonging to the field of machine translation technology. The invention proposes a corpus expansion method for neural machine translation that can effectively enlarge the training corpus on the basis of the original machine translation training set. The method mainly comprises a training-set expansion stage and a model-training stage. In stage one, a phrase table is learned from the original training set by statistical machine learning methods, filtered according to certain filtering rules, and merged with the original training set to form the newly expanded training set. In stage two, the neural machine translation model is trained: it is first pre-trained on the expanded training set, and then trained again on the original training set for fine-tuning, yielding the final model. Experimental results show that, compared with a machine translation model trained without the corpus expansion method, the present invention achieves a clear improvement in the BLEU evaluation metric.
Description
Technical field
The present invention relates to a neural network machine translation corpus expansion method based on a statistical phrase table, and belongs to the fields of computer applications and machine translation technology.
Background technology
Machine translation is the technology of automatically translating one language (the source language) into another language (the target language) using a computer.

With the development of artificial neural networks and deep learning, neural network machine translation based on deep learning (hereinafter, neural machine translation) has achieved important results in recent years. Neural machine translation has several advantages: it requires little linguistic knowledge and manual intervention, its models occupy little storage space, and its output reads smoothly and naturally. For translation tasks with abundant bilingual resources, neural machine translation is generally considered the best choice. At present, neural machine translation has received wide attention and recognition in the machine translation field and has been put into commercial use.

The data used to train the neural network consists of bilingual parallel sentence pairs. In general, the neural network models used in neural machine translation have a large number of free parameters, and in theory such models require large-scale bilingual parallel corpora for training. Experience shows that a neural machine translation model with tens of millions of free parameters usually requires training data of at least millions of sentence pairs to achieve satisfactory results. For language pairs whose bilingual parallel resources are scarce, it is difficult to obtain promising results with neural translation.

In addition, neural machine translation is usually trained on complete sentence pairs, one or a batch at a time. When corpus resources are scarce, the model's ability to learn the low-frequency phrases contained in the sentence pairs is limited, especially when these phrases must be translated in isolation.
Invention content
Aiming at the model-training problem of neural machine translation for resource-scarce languages, the present invention proposes a neural network machine translation corpus expansion method based on a statistical phrase table, which can effectively expand the training data of a neural machine translation model and alleviate the adverse effect of scarce language resources on model training.

The present invention comprises a training-set expansion stage and a model-training stage.

A) The training-set expansion stage operates as follows: a phrase table with probability scores is learned from the original training set by statistical machine learning methods; the learned phrase table is filtered according to the filtering rules; the filtered phrase table is taken as a new data set of bilingual parallel phrase pairs; and this newly extracted data set is concatenated with the original training set to obtain new bilingual parallel pseudo-data, realizing the expansion of the training set.

B) The model-training stage is divided into two steps. Step 1 is pre-training: the model is pre-trained on the bilingual parallel pseudo-data obtained in stage A), yielding the pre-trained model b1. In step 2, the model b1 is trained again on the original training set, yielding model b2; the purpose is to fine-tune the model and alleviate the influence on the model of the noise introduced by the pseudo-data.
To achieve the above object, the present invention adopts the following technical solution.

First, the relevant definitions are given:
Definition 1: Source language: in machine translation, the language of the content to be translated. For example, in Chinese-to-English machine translation, Chinese is the source language.

Definition 2: Source language data: data belonging to the source language. If the source language data is a natural language sentence, it is called a source language sentence. For example, in Chinese-to-English machine translation, the input Chinese sentence is source language data, also called a source language sentence. A collection of source language data is called a source language data set.

Definition 3: Target language: in machine translation, the language into which the content is translated. For example, in Chinese-to-English machine translation, English is the target language.

Definition 4: Target language data: data belonging to the target language. If the target language data is a natural language sentence, it is called a target language sentence. For example, in Chinese-to-English machine translation, the output English sentence is target language data, also called a target language sentence. A collection of target language data is called a target language data set.

Definition 5: Training set: here, specifically the training set of a statistical machine translation model, i.e. the data set used to train the statistical machine translation model, denoted T.

Definition 6: Original training set: the training set before expansion.

Definition 7: Word alignment information (word alignment for short): the alignment relations between source language words and target language words in the training set T, denoted α. If, in the training set T, the j-th source language word is aligned to the i-th target language word, the alignment relation is denoted (j, i).

Definition 8: Phrase: a linguistic unit composed of one or more words. A phrase in the source language is called a source language phrase, denoted f; a phrase in the target language is called a target language phrase, denoted e.

Definition 9: Translation phrase pair: a phrase pair composed of a source language phrase and its aligned target language phrase, e.g. ('长城', 'The Great Wall').

Definition 10: Forward phrase translation probability: the conditional probability of translating into target language phrase e given source language phrase f, denoted φ(e|f).

Definition 11: Reverse phrase translation probability: the conditional probability of translating back into source language phrase f given target language phrase e, denoted φ(f|e).

Definition 12: Bidirectional phrase translation probability: the forward and reverse phrase translation probabilities, collectively.

Definition 13: Forward lexicalized translation probability: the lexicalized probability of translating into target language phrase e given source language phrase f, denoted lex(e|f).

Definition 14: Reverse lexicalized translation probability: the lexicalized probability of translating back into source language phrase f given target language phrase e, denoted lex(f|e).

Definition 15: Bidirectional lexicalized translation probability: the forward and reverse lexicalized translation probabilities, collectively.

Definition 16: Phrase table (also called phrase translation table): a table composed of multiple translation phrase pairs, each pair annotated with its bidirectional phrase translation probabilities and bidirectional lexicalized translation probabilities.

Definition 17: Filtering rule: a manually formulated rule for filtering the phrase table according to the source language phrase, target language phrase, bidirectional phrase translation probabilities, and bidirectional lexicalized translation probabilities contained in each phrase table entry.
The training-set expansion stage comprises the following steps:

Step A1: According to Definitions 1 to 5, preprocess the original training set to obtain the preprocessed original training set T_f. The detailed preprocessing procedure differs with the source and target languages; its purpose is to normalize the training set.

Step A2: Based on the preprocessed original training set T_f obtained in step A1, and according to Definitions 7 and 8, learn the word alignment information. This process is usually realized with an open-source word alignment toolkit: the preprocessed original training set from step A1 is taken as input, and the word alignment tool is trained to obtain the word alignment information α of the training set.

Step A3: According to Definitions 6 to 16, and combining the preprocessed original training set T_f obtained in step A1 with the word alignment information α obtained in step A2, extract translation phrase pairs and perform probability estimation on them to obtain the bidirectional phrase translation probabilities and bidirectional lexicalized translation probabilities of each translation phrase pair. Combining the translation phrase pairs with their translation probabilities yields the phrase table; each record of the phrase table consists of a translation phrase pair, its word alignment information, its bidirectional phrase translation probabilities, and its bidirectional lexicalized translation probabilities.

Step A4: According to Definitions 9, 12, 15, 16 and 17, filter the phrase table obtained in step A3 using the manually defined filtering rule, removing translation phrase pairs with low probabilities, to obtain the filtered phrase table, denoted P_new.

Step A5: According to Definitions 5 and 16, concatenate the translation phrase pairs in the filtered phrase table P_new obtained in step A4 with the preprocessed original training set T_f obtained in step A1, to obtain the new training set T_new.

Steps A1 to A5 complete the training-set expansion stage of the method.
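Steps A4 and A5 can be sketched in a few lines of Python. The in-memory layout and the phrase-probability threshold of 0.1 are illustrative assumptions, not the patent's exact parameters; only the 0.025 threshold on the lexicalized probabilities comes from the embodiment below.

```python
def filter_phrase_table(table, phi_min=0.1, lex_min=0.025):
    """Step A4: keep only translation phrase pairs whose four scores
    clear the thresholds. phi_min is a hypothetical threshold; lex_min
    follows the embodiment's rule for the lexicalized probabilities."""
    return [
        (f, e) for (f, e, phi_ef, phi_fe, lex_ef, lex_fe) in table
        if phi_ef >= phi_min and phi_fe >= phi_min
        and lex_ef >= lex_min and lex_fe >= lex_min
    ]

def expand_training_set(original_pairs, phrase_table, **thresholds):
    """Step A5: splice the filtered phrase pairs onto the original
    sentence pairs, yielding the expanded pseudo-parallel set T_new."""
    p_new = filter_phrase_table(phrase_table, **thresholds)
    return original_pairs + p_new
```

The expanded set simply treats each surviving phrase pair as an extra (short) sentence pair appended to the corpus.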
The model-training stage comprises the following steps:

Step B1: Pre-train the model on the new training set T_new obtained in step A5, obtaining model b1.

Step B2: Retrain the model b1 obtained in step B1 on the preprocessed original training set T_f obtained in step A1, obtaining the new trained model b2.

Steps B1 and B2 complete the model-training stage of the method.
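The pre-train/fine-tune schedule of steps B1 and B2 can be illustrated with a deliberately simplified stand-in for the translation model: here just a phrase-to-phrase count dictionary, which is an assumption for illustration only, not the neural model the invention actually trains. It shows why adding phrase pairs to the pre-training data lets the model learn translations of phrases that rarely occur in full sentences.

```python
from collections import Counter, defaultdict

def train(model, pairs):
    """Accumulate co-occurrence counts; a toy stand-in for gradient training."""
    for f, e in pairs:
        model[f][e] += 1
    return model

def translate(model, f):
    """Return the most frequently seen translation for f, if any."""
    return model[f].most_common(1)[0][0] if model[f] else None

# Step B1: pre-train on the expanded pseudo-data T_new,
# which contains sentence pairs plus filtered phrase pairs.
t_new = [("s1", "t1"), ("rare-phrase", "its-translation")]
b1 = train(defaultdict(Counter), t_new)

# Step B2: continue training on the original set T_f to fine-tune,
# yielding the final model b2.
t_f = [("s1", "t1")]
b2 = train(b1, t_f)
```

After both steps, b2 still knows the phrase pair seen only in the pseudo-data, while the fine-tuning pass has reinforced the original sentence pairs.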
Thus, through steps A1 to A5 and steps B1 to B2, the neural network machine translation corpus expansion method based on a statistical phrase table is completed.
Advantageous effects

Compared with existing ways of using machine translation training sets, the neural network machine translation corpus expansion method based on a statistical phrase table of the present invention has the following advantages:

1. The method can effectively expand the original training set without requiring any additional bilingual or monolingual data, alleviating the adverse effect that the small scale of a resource-scarce language's training set has on the training of a neural machine translation model.

2. With identical training, development and test set data, the present invention achieves a clear improvement in the BLEU evaluation metric compared with neural machine translation model training that does not use the present invention.
Description of the drawings

Fig. 1 is the flow chart of the neural network machine translation corpus expansion method based on a statistical phrase table of the present invention and of the embodiment.

Specific implementation

The method of the invention is described in detail below with reference to the drawings and embodiments. The two main stages comprised by the invention, 1) the training-set expansion stage and 2) the model-training stage, are illustrated in turn.
Embodiment 1
This embodiment describes the flow of the method of the invention and its specific implementation. Fig. 1 is the flow chart of the method in this embodiment; as can be seen from Fig. 1, it shows the operating processes of the two stages comprised by the invention: 1) the training-set expansion stage and 2) the model-training stage.
Take Uyghur-to-Chinese translation as an example, where Uyghur is the source language and Chinese is the target language.

1) Training-set expansion stage:

Step 1: According to Definitions 1 to 5, preprocess the original training set. The detailed procedure differs with the source and target languages; its purpose is to normalize the training set. Here the data in the source language (Uyghur) and the target language (Chinese) are preprocessed as follows: word-piece segmentation is performed first, followed by tokenization, yielding the preprocessed original training set T_f.

Step 2: Learn the word alignment according to Definitions 6 and 7. In this embodiment this is realized with the open-source word alignment toolkit GIZA++: the preprocessed original training set obtained in step 1 is taken as input, and GIZA++ is trained to obtain the word alignment information α of the training set.
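Word aligners in the GIZA++/Moses pipeline commonly emit alignments in the "Pharaoh" format, one `j-i` token per aligned word pair. A small helper to read that format into the (j, i) relation of Definition 7 might look as follows; the format assumption follows common practice rather than a guarantee about any particular toolkit's output.

```python
def parse_alignment(line):
    """Parse a Pharaoh-format alignment line such as '0-0 1-2 2-1'
    into a set of (j, i) pairs: source word j aligned to target word i."""
    pairs = set()
    for token in line.split():
        j, i = token.split("-")
        pairs.add((int(j), int(i)))
    return pairs
```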
Step 3: According to Definitions 6 to 16, and combining the preprocessed original training set T_f obtained in step 1 with the word alignment information α obtained in step 2, extract translation phrase pairs and perform probability estimation on them. In this embodiment this is realized with the train-model.perl script of the open-source Moses toolkit, yielding the phrase table P; each record of the phrase table consists of a translation phrase pair, its word alignment information, its bidirectional phrase translation probabilities, and its bidirectional lexicalized translation probabilities.
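The phrase table produced by Moses is a plain-text file with `|||`-separated fields; by common convention the score field holds φ(f|e), lex(f|e), φ(e|f), lex(e|f) in that order. A sketch of a reader for such records, under that field-order assumption:

```python
def parse_phrase_table_line(line):
    """Split one Moses-style phrase-table record into its fields.

    Assumed layout: f ||| e ||| phi(f|e) lex(f|e) phi(e|f) lex(e|f) ||| alignment
    """
    fields = [field.strip() for field in line.split("|||")]
    f, e, scores = fields[0], fields[1], fields[2].split()
    phi_fe, lex_fe, phi_ef, lex_ef = (float(s) for s in scores[:4])
    alignment = fields[3] if len(fields) > 3 else ""
    return {"f": f, "e": e, "phi_fe": phi_fe, "lex_fe": lex_fe,
            "phi_ef": phi_ef, "lex_ef": lex_ef, "alignment": alignment}
```

Each parsed record carries exactly the information a filtering rule of Definition 17 needs.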
Step 4: According to Definitions 9, 12, 15, 16 and 17, filter the phrase table obtained in step 3 using the manually defined filtering rule. The rule of this embodiment is as follows: a translation phrase pair is retained if and only if its phrase translation probabilities φ(e|f) and φ(f|e) both reach their specified thresholds, and lex(e|f) ≥ 0.025, and lex(f|e) ≥ 0.025. Translation phrase pairs with lower probabilities are filtered out, yielding the filtered new phrase table P_new.

Step 5: According to Definitions 5 and 16, concatenate the translation phrase pairs in the filtered new phrase table P_new obtained in step 4 with the preprocessed original training set T_f obtained in step 1, obtaining the new training set T_new.
2) The model-training stage proceeds as follows:

Step 6: Pre-train the model. In this embodiment the open-source neural machine translation toolkit tensor2tensor is used; the model is pre-trained on the new training set T_new obtained in step 5, obtaining model b1.

Step 7: Retrain the model b1 obtained in step 6 on the preprocessed original training set T_f obtained in step 1, obtaining the new trained model b2.

Thus, from step 1 to step 7, the neural network machine translation corpus expansion method based on a statistical phrase table is completed.
Embodiment 2
The training set of the Uyghur-Chinese news translation task provided by CWMT2017 is randomly split into a training set, a development set, and test set 1; in addition, the development set of the Uyghur-Chinese news translation evaluation task provided by CWMT2017 is used as test set 2. With the original training set, development set, test sets and neural machine translation model held identical, the present invention is compared with neural machine translation model training that does not use the present invention, using character-based BLEU on Chinese as the evaluation metric; the following experimental results are obtained.

Table 1 compares the BLEU scores before and after applying the training-set expansion method proposed by the present invention.

The results in Table 1 show that, with identical training, development and test set data, the method of the present invention achieves a clear improvement in the BLEU evaluation metric over neural machine translation model training that does not use the present invention.
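Embodiment 2 evaluates with character-based BLEU on the Chinese output. A minimal single-sentence BLEU (clipped n-gram precisions combined with a brevity penalty, following the standard formula) can be sketched in pure Python; a real evaluation would use an established tool, and the floor applied to zero precisions here is an illustrative smoothing assumption.

```python
import math
from collections import Counter

def char_bleu(hypothesis, reference, max_n=4):
    """Character-level BLEU for one sentence pair: geometric mean of
    clipped n-gram precisions times a brevity penalty."""
    hyp, ref = list(hypothesis), list(reference)
    if not hyp:
        return 0.0
    log_prec_sum = 0.0
    for n in range(1, max_n + 1):
        hyp_ngrams = Counter(tuple(hyp[i:i + n]) for i in range(len(hyp) - n + 1))
        ref_ngrams = Counter(tuple(ref[i:i + n]) for i in range(len(ref) - n + 1))
        overlap = sum((hyp_ngrams & ref_ngrams).values())  # clipped matches
        total = max(sum(hyp_ngrams.values()), 1)
        # Floor each precision to avoid log(0) when an n-gram order has no match.
        log_prec_sum += math.log(max(overlap / total, 1e-9))
    bp = 1.0 if len(hyp) >= len(ref) else math.exp(1 - len(ref) / len(hyp))
    return bp * math.exp(log_prec_sum / max_n)
```

A hypothesis identical to its reference scores 1.0; any mismatch lowers the n-gram precisions, and a too-short hypothesis is additionally penalized by the brevity penalty.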
The above are preferred embodiments of the present invention, and the present invention should not be limited to the content disclosed by the embodiments and drawings. All equivalents and modifications completed without departing from the spirit of the present disclosure fall within the scope of protection of the present invention.
Claims (4)

1. A neural network machine translation corpus expansion method based on a statistical phrase table, characterized by comprising: a training-set expansion stage and a model-training stage;

wherein A) the training-set expansion stage operates as follows: a phrase table with probability scores is learned from the original training set by statistical machine learning methods; the learned phrase table is filtered according to the filtering rules; the filtered phrase table is taken as a new data set of bilingual parallel phrase pairs; and this new data set is concatenated with the original training set to obtain new bilingual parallel pseudo-data, realizing the expansion of the training set;

B) the model-training stage is divided into two steps: step 1 is pre-training, in which the model is pre-trained on the bilingual parallel pseudo-data obtained in stage A), yielding the pre-trained model b1; in step 2 the model b1 is trained again on the original training set, yielding model b2, the purpose being to fine-tune the model and alleviate the influence on the model of the noise introduced by the pseudo-data.
2. The neural network machine translation corpus expansion method based on a statistical phrase table according to claim 1, characterized in that the following technical scheme is adopted.

First, the relevant definitions are given:

Definition 1: Source language: in machine translation, the language of the content to be translated; for example, in Chinese-to-English machine translation, Chinese is the source language.

Definition 2: Source language data: data belonging to the source language; if the source language data is a natural language sentence, it is called a source language sentence; for example, in Chinese-to-English machine translation, the input Chinese sentence is source language data, also called a source language sentence; a collection of source language data is called a source language data set.

Definition 3: Target language: in machine translation, the language into which the content is translated; for example, in Chinese-to-English machine translation, English is the target language.

Definition 4: Target language data: data belonging to the target language; if the target language data is a natural language sentence, it is called a target language sentence; for example, in Chinese-to-English machine translation, the output English sentence is target language data, also called a target language sentence; a collection of target language data is called a target language data set.

Definition 5: Training set: specifically, the training set of a statistical machine translation model, i.e. the data set used to train the statistical machine translation model, denoted T.

Definition 6: Original training set: the training set before expansion.

Definition 7: Word alignment information (word alignment for short): the alignment relations between source language words and target language words in the training set T, denoted α; if, in the training set T, the j-th source language word is aligned to the i-th target language word, the alignment relation is denoted (j, i).

Definition 8: Phrase: a linguistic unit composed of one or more words; a phrase in the source language is called a source language phrase, denoted f, and a phrase in the target language is called a target language phrase, denoted e.

Definition 9: Translation phrase pair: a phrase pair composed of a source language phrase and its aligned target language phrase, e.g. ('长城', 'The Great Wall').

Definition 10: Forward phrase translation probability: the conditional probability of translating into target language phrase e given source language phrase f, denoted φ(e|f).

Definition 11: Reverse phrase translation probability: the conditional probability of translating back into source language phrase f given target language phrase e, denoted φ(f|e).

Definition 12: Bidirectional phrase translation probability: the forward and reverse phrase translation probabilities, collectively.

Definition 13: Forward lexicalized translation probability: the lexicalized probability of translating into target language phrase e given source language phrase f, denoted lex(e|f).

Definition 14: Reverse lexicalized translation probability: the lexicalized probability of translating back into source language phrase f given target language phrase e, denoted lex(f|e).

Definition 15: Bidirectional lexicalized translation probability: the forward and reverse lexicalized translation probabilities, collectively.

Definition 16: Phrase table (also called phrase translation table): a table composed of multiple translation phrase pairs, each pair annotated with its bidirectional phrase translation probabilities and bidirectional lexicalized translation probabilities.

Definition 17: Filtering rule: a manually formulated rule for filtering the phrase table according to the source language phrase, target language phrase, bidirectional phrase translation probabilities, and bidirectional lexicalized translation probabilities contained in each phrase table entry.

The training-set expansion stage comprises the following steps:

Step A1: According to Definitions 1 to 5, preprocess the original training set to obtain the preprocessed original training set T_f.

Step A2: Based on the preprocessed original training set T_f obtained in step A1, and according to Definitions 7 and 8, learn the word alignment information, usually with an open-source word alignment toolkit: the preprocessed original training set from step A1 is taken as input, and the word alignment tool is trained to obtain the word alignment information α of the training set.

Step A3: According to Definitions 6 to 16, and combining the preprocessed original training set T_f obtained in step A1 with the word alignment information α obtained in step A2, extract translation phrase pairs and perform probability estimation on them to obtain the bidirectional phrase translation probabilities and bidirectional lexicalized translation probabilities of each translation phrase pair; combining the translation phrase pairs with their translation probabilities yields the phrase table, each record of which consists of a translation phrase pair, its word alignment information, its bidirectional phrase translation probabilities, and its bidirectional lexicalized translation probabilities.

Step A4: According to Definitions 9, 12, 15, 16 and 17, filter the phrase table obtained in step A3 using the manually defined filtering rule, removing translation phrase pairs with low probabilities, to obtain the filtered phrase table, denoted P_new.

Step A5: According to Definitions 5 and 16, concatenate the translation phrase pairs in the filtered phrase table P_new obtained in step A4 with the preprocessed original training set T_f obtained in step A1, obtaining the new training set T_new.
3. The neural network machine translation corpus expansion method based on a statistical phrase table according to claim 1, characterized in that the model-training stage comprises the following steps:

Step B1: Pre-train the model on the new training set T_new obtained in step A5, obtaining model b1.

Step B2: Retrain the model b1 obtained in step B1 on the preprocessed original training set T_f obtained in step A1, obtaining the new trained model b2.
4. The neural network machine translation corpus expansion method based on a statistical phrase table according to claim 1, characterized in that in step A1 the detailed preprocessing procedure applied to the original training set differs with the source and target languages, its purpose being to normalize the training set and obtain the preprocessed original training set T_f.
Priority Applications (1)

| Application Number | Priority Date | Filing Date | Title |
| --- | --- | --- | --- |
| CN201810175915.4A | 2018-03-02 | 2018-03-02 | Neural network machine translation corpus expansion method based on a statistical phrase table |
Publications (1)

| Publication Number | Publication Date |
| --- | --- |
| CN108363704A | 2018-08-03 |
Family
ID=63003675
Citations (9)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN102214166A (en) * | 2010-04-06 | 2011-10-12 | 三星电子(中国)研发中心 | Machine translation system and machine translation method based on syntactic analysis and hierarchical model |
US20130144593A1 (en) * | 2007-03-26 | 2013-06-06 | Franz Josef Och | Minimum error rate training with a large number of features for machine learning |
CN104391842A (en) * | 2014-12-18 | 2015-03-04 | 苏州大学 | Translation model establishing method and system |
CN105068997A (en) * | 2015-07-15 | 2015-11-18 | 清华大学 | Parallel corpus construction method and device |
CN105190609A (en) * | 2013-06-03 | 2015-12-23 | 国立研究开发法人情报通信研究机构 | Translation device, learning device, translation method, and recording medium |
CN106156013A (en) * | 2016-06-30 | 2016-11-23 | 电子科技大学 | The two-part machine translation method that a kind of regular collocation type phrase is preferential |
CN106484682A (en) * | 2015-08-25 | 2017-03-08 | 阿里巴巴集团控股有限公司 | Based on the machine translation method of statistics, device and electronic equipment |
CN107092594A (en) * | 2017-04-19 | 2017-08-25 | 厦门大学 | Bilingual recurrence self-encoding encoder based on figure |
CN107329960A (en) * | 2017-06-29 | 2017-11-07 | 哈尔滨工业大学 | Unregistered word translating equipment and method in a kind of neural network machine translation of context-sensitive |
- 2018-03-02: CN application CN201810175915.4A filed, published as CN108363704A; status: Pending
Non-Patent Citations (1)
Title |
---|
ZHANG Jinpeng et al.: "Distributed Word Representations for Chinese and Thai Based on Cross-Lingual Corpora", Computer Engineering & Science * |
Cited By (20)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN109190768A (en) * | 2018-08-09 | 2019-01-11 | 北京中关村科金技术有限公司 | A kind of data enhancing corpus training method in neural network |
CN111160046A (en) * | 2018-11-07 | 2020-05-15 | 北京搜狗科技发展有限公司 | Data processing method and device and data processing device |
US10963757B2 (en) | 2018-12-14 | 2021-03-30 | Industrial Technology Research Institute | Neural network model fusion method and electronic device using the same |
CN110046332A (en) * | 2019-04-04 | 2019-07-23 | 珠海远光移动互联科技有限公司 | A kind of Similar Text data set generation method and device |
CN110046332B (en) * | 2019-04-04 | 2024-01-23 | 远光软件股份有限公司 | Similar text data set generation method and device |
CN110472252A (en) * | 2019-08-15 | 2019-11-19 | 昆明理工大学 | The method of the more neural machine translation of the Chinese based on transfer learning |
CN110472252B (en) * | 2019-08-15 | 2022-12-13 | 昆明理工大学 | Method for translating Hanyue neural machine based on transfer learning |
CN110543645A (en) * | 2019-09-04 | 2019-12-06 | 网易有道信息技术(北京)有限公司 | Machine learning model training method, medium, device and computing equipment |
CN110543645B (en) * | 2019-09-04 | 2023-04-07 | 网易有道信息技术(北京)有限公司 | Machine learning model training method, medium, device and computing equipment |
CN110717341A (en) * | 2019-09-11 | 2020-01-21 | 昆明理工大学 | Method and device for constructing old-Chinese bilingual corpus with Thai as pivot |
CN110717341B (en) * | 2019-09-11 | 2022-06-14 | 昆明理工大学 | Method and device for constructing old-Chinese bilingual corpus with Thai as pivot |
CN110852117B (en) * | 2019-11-08 | 2023-02-24 | 沈阳雅译网络技术有限公司 | Effective data enhancement method for improving translation effect of neural machine |
CN110852117A (en) * | 2019-11-08 | 2020-02-28 | 沈阳雅译网络技术有限公司 | Effective data enhancement method for improving translation effect of neural machine |
CN111368035A (en) * | 2020-03-03 | 2020-07-03 | 新疆大学 | Neural network-based Chinese dimension-dimension Chinese organization name dictionary mining system |
CN112507734A (en) * | 2020-11-19 | 2021-03-16 | 南京大学 | Roman Uygur language-based neural machine translation system |
CN112507734B (en) * | 2020-11-19 | 2024-03-19 | 南京大学 | Neural machine translation system based on romanized Uygur language |
CN113111667A (en) * | 2021-04-13 | 2021-07-13 | 沈阳雅译网络技术有限公司 | Method for generating pseudo data by low-resource language based on multi-language model |
CN113111667B (en) * | 2021-04-13 | 2023-08-22 | 沈阳雅译网络技术有限公司 | Method for generating pseudo data in low-resource language based on multi-language model |
CN117540755A (en) * | 2023-11-13 | 2024-02-09 | 北京云上曲率科技有限公司 | Method and system for enhancing data by neural machine translation model |
CN118095302A (en) * | 2024-04-26 | 2024-05-28 | 四川交通运输职业学校 | Auxiliary translation method and system based on computer |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN108363704A (en) | A kind of neural network machine translation corpus expansion method based on statistics phrase table | |
CN110852117B (en) | Effective data enhancement method for improving translation effect of neural machine | |
CN107690634B (en) | Automatic query pattern generation method and system | |
CN105138507A (en) | Pattern self-learning based Chinese open relationship extraction method | |
CN109359304A (en) | Limited neural network machine interpretation method and storage medium | |
CN104391885A (en) | Method for extracting chapter-level parallel phrase pair of comparable corpus based on parallel corpus training | |
CN105573994B (en) | Statictic machine translation system based on syntax skeleton | |
CN101404036A (en) | Keyword abstraction method for PowerPoint electronic demonstration draft | |
CN109101518A (en) | Phonetic transcription text quality appraisal procedure, device, terminal and readable storage medium storing program for executing | |
CN106156013B (en) | A kind of two-part machine translation method that regular collocation type phrase is preferential | |
CN112446213A (en) | Text corpus expansion method | |
CN112101047A (en) | Machine translation method for matching language-oriented precise terms | |
CN108491399A (en) | Chinese to English machine translation method based on context iterative analysis | |
CN101763403A (en) | Query translation method facing multi-lingual information retrieval system | |
JP2016164707A (en) | Automatic translation device and translation model learning device | |
CN112765977B (en) | Word segmentation method and device based on cross-language data enhancement | |
Li et al. | Cultural concept adaptation on multimodal reasoning | |
CN106156007A (en) | A kind of English-Chinese statistical machine translation method of word original shape | |
CN112836525A (en) | Human-computer interaction based machine translation system and automatic optimization method thereof | |
Millour et al. | Unsupervised data augmentation for less-resourced languages with no standardized spelling | |
CN117251524A (en) | Short text classification method based on multi-strategy fusion | |
Gad-Elrab et al. | Named entity disambiguation for resource-poor languages | |
Baisa et al. | Automating dictionary production: a Tagalog-English-Korean dictionary from scratch | |
CN111597824B (en) | Training method and device for language translation model | |
CN114492469A (en) | Translation method, translation device and computer readable storage medium |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
| PB01 | Publication | |
| SE01 | Entry into force of request for substantive examination | |
| WD01 | Invention patent application deemed withdrawn after publication | Application publication date: 2018-08-03 |