CN108363704A - A neural machine translation corpus expansion method based on a statistical phrase table - Google Patents

A neural machine translation corpus expansion method based on a statistical phrase table

Info

Publication number
CN108363704A
Authority
CN
China
Prior art keywords
phrase
language
translation
training set
define
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN201810175915.4A
Other languages
Chinese (zh)
Inventor
黄河燕
史学文
鉴萍
唐翼琨
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Beijing Institute of Technology BIT
Original Assignee
Beijing Institute of Technology BIT
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Beijing Institute of Technology BIT filed Critical Beijing Institute of Technology BIT
Priority to CN201810175915.4A priority Critical patent/CN108363704A/en
Publication of CN108363704A publication Critical patent/CN108363704A/en
Pending legal-status Critical Current

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00Handling natural language data
    • G06F40/40Processing or translation of natural language
    • G06F40/58Use of machine translation, e.g. for multi-lingual retrieval, for server-side translation for client devices or for real-time translation
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/08Learning methods

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Health & Medical Sciences (AREA)
  • Health & Medical Sciences (AREA)
  • Artificial Intelligence (AREA)
  • General Physics & Mathematics (AREA)
  • Computational Linguistics (AREA)
  • General Engineering & Computer Science (AREA)
  • Biomedical Technology (AREA)
  • Evolutionary Computation (AREA)
  • Molecular Biology (AREA)
  • Computing Systems (AREA)
  • Data Mining & Analysis (AREA)
  • Biophysics (AREA)
  • Mathematical Physics (AREA)
  • Software Systems (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • Machine Translation (AREA)

Abstract

A neural machine translation corpus expansion method based on a statistical phrase table, belonging to the technical field of machine translation. For neural machine translation, the invention proposes a corpus expansion method based on a statistical phrase table that can effectively enlarge the corpus starting from the original machine translation training set. The method comprises two main stages: a training set expansion stage and a model training stage. In stage one, a phrase table is learned from the original training set by statistical machine learning methods, filtered according to certain rules, and merged with the original training set to form a new, expanded training set. In stage two, the neural machine translation model is trained: it is first pre-trained on the expanded training set and then trained again on the original training set for tuning, which yields the final model. Experimental results show that, compared with a machine translation model trained without the corpus expansion method, the invention significantly improves the BLEU evaluation metric.

Description

A neural machine translation corpus expansion method based on a statistical phrase table
Technical field
The present invention relates to a neural machine translation corpus expansion method based on a statistical phrase table, and belongs to the technical field of computer applications and machine translation.
Background art
Machine translation is the technology of using a computer to automatically translate one language (the source language) into another language (the target language).
With the development of artificial neural networks and deep learning, neural network machine translation based on deep learning (hereinafter, neural machine translation) has achieved important results in recent years. Its advantages include little need for linguistic knowledge or manual intervention, a small model storage footprint, and fluent, natural output translations. For translation tasks with abundant bilingual resources, neural machine translation is generally considered the best choice. It has received wide attention and recognition in the machine translation field and has been put into commercial use.
Neural networks for translation are trained on bilingual parallel sentence pairs. The models used in neural machine translation generally have a large number of free parameters and, in theory, need large-scale bilingual parallel corpora for training. Experience shows that a neural machine translation model with free parameters on the order of ten million usually needs training data of at least a million sentence pairs to achieve good results. For languages with scarce bilingual parallel resources, translation with neural networks rarely gives satisfactory results.
In addition, neural machine translation is usually trained with one complete sentence pair, or a group of complete sentence pairs, as the unit. When corpus resources are scarce, the model's ability to learn the low-frequency phrases contained in the sentence pairs is limited, especially when these phrases are translated on their own.
Summary of the invention
Addressing the model training problem of neural machine translation for low-resource languages, the present invention proposes a neural machine translation corpus expansion method based on a statistical phrase table. The method can effectively expand the training data of a neural machine translation model and mitigate the adverse effect of scarce language resources on model training.
The present invention comprises a training set expansion stage and a model training stage;
A) The training set expansion stage operates as follows: a phrase table with probability scores is learned from the original training set by statistical machine learning methods; the learned phrase table is filtered according to rules; the filtered phrase table is extracted into a new data set of bilingual parallel phrase pairs; and the newly extracted data set is concatenated with the original training set to obtain new bilingual parallel pseudo-data, which realizes the expansion of the training set;
B) The model training stage has two steps. Step 1 is pre-training: the model is pre-trained on the bilingual parallel pseudo-data obtained in stage A), which yields the pre-trained model b1. Step 2 retrains model b1 on the original training set to obtain model b2; its purpose is to tune the model and mitigate the effect on the model of the noise introduced by the pseudo-data;
To achieve the above purpose, the present invention adopts the following technical solution:
The relevant definitions are given first:
Definition 1: source language: in machine translation, the language of the content to be translated; for example, in Chinese-to-English machine translation, Chinese is the source language;
Definition 2: source language data: data belonging to the source language; if a piece of source language data is a natural-language sentence, it is called a source language sentence; for example, in Chinese-to-English machine translation, the input Chinese sentence is source language data, also called a source language sentence;
A set of source language data is called a source language data set;
Definition 3: target language: in machine translation, the language of the content produced by translation; for example, in Chinese-to-English machine translation, English is the target language;
Definition 4: target language data: data belonging to the target language; if a piece of target language data is a natural-language sentence, it is called a target language sentence; for example, in Chinese-to-English machine translation, the output English sentence is target language data, also called a target language sentence;
A set of target language data is called a target language data set;
Definition 5: training set: specifically the training set of a statistical machine translation model, that is, the data set used to train the statistical machine translation model, denoted T;
Definition 6: original training set: the training set before expansion;
Definition 7: word alignment information, word alignment for short: the alignment relations between source language words and target language words in training set T, denoted α;
where, if the j-th source language word is aligned with the i-th target language word in training set T, the alignment is denoted (j, i);
Definition 8: phrase: a linguistic unit consisting of one or more words;
A phrase in the source language is called a source language phrase, denoted f; a phrase in the target language is called a target language phrase, denoted e;
Definition 9: translation phrase pair: a phrase pair consisting of a source language phrase and its aligned target language phrase, for example ('长城', 'The Great Wall');
Definition 10: forward phrase translation probability: the conditional probability of translating into target language phrase e given source language phrase f;
Definition 11: reverse phrase translation probability: the conditional probability of translating back into source language phrase f given target language phrase e;
Definition 12: bidirectional phrase translation probability: the forward and reverse phrase translation probabilities are collectively called the bidirectional phrase translation probability;
Definition 13: forward lexicalized phrase translation probability: the lexicalized translation probability of translating into target language phrase e given source language phrase f, denoted lex(e | f);
Definition 14: reverse lexicalized phrase translation probability: the lexicalized translation probability of translating back into source language phrase f given target language phrase e, denoted lex(f | e);
Definition 15: bidirectional lexicalized phrase translation probability: the forward and reverse lexicalized translation probabilities are collectively called the bidirectional lexicalized translation probability;
Definition 16: phrase table, also called phrase translation table: consists of multiple translation phrase pairs, with the bidirectional phrase translation probability and the bidirectional lexicalized translation probability attached to each pair;
Definition 17: filtering rule: a manually formulated rule for filtering the phrase table, based on the source language phrases, target language phrases, bidirectional phrase translation probabilities and bidirectional lexicalized phrase translation probabilities contained in the phrase table;
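As an illustration of Definitions 10, 11 and 16, the sketch below estimates forward and reverse phrase translation probabilities by relative frequency over a list of already extracted phrase pairs. Relative-frequency estimation is the usual choice in phrase-based statistical machine translation, but the patent text does not state which estimator is used, so treat it as an assumption. The names `PhraseEntry`, `p_e_given_f`, `p_f_given_e` and `estimate_phrase_probs` are hypothetical, and the lexicalized probabilities of Definitions 13 and 14 are left out for brevity.

```python
from collections import Counter
from dataclasses import dataclass


@dataclass(frozen=True)
class PhraseEntry:
    """One phrase-table record (Definition 16); field names are illustrative."""
    source: str          # source language phrase f
    target: str          # target language phrase e
    p_e_given_f: float   # forward phrase translation probability (Definition 10)
    p_f_given_e: float   # reverse phrase translation probability (Definition 11)


def estimate_phrase_probs(pairs):
    """Relative-frequency estimates from a list of extracted (f, e) phrase pairs."""
    pair_count = Counter(pairs)
    f_count = Counter(f for f, _ in pairs)
    e_count = Counter(e for _, e in pairs)
    return [
        PhraseEntry(f, e, n / f_count[f], n / e_count[e])
        for (f, e), n in sorted(pair_count.items())
    ]


# Toy extracted phrase pairs (made up for illustration).
pairs = [
    ("长城", "The Great Wall"),
    ("长城", "The Great Wall"),
    ("长城", "Great Wall"),
]
entries = estimate_phrase_probs(pairs)
```

In this toy input, "长城" is paired with "The Great Wall" in two of its three occurrences, so the forward probability of that pair is 2/3, while its reverse probability is 1 because "The Great Wall" is only ever paired with "长城".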
The training set expansion stage comprises the following steps:
Step A1: according to Definitions 1 to 5, preprocess the original training set to obtain the preprocessed original training set Tf
where the specific preprocessing procedure depends on the source and target languages; its purpose is to normalize the training set and obtain the preprocessed original training set Tf
Step A2: based on the preprocessed original training set Tf obtained in step A1, learn word alignment information according to Definitions 7 and 8; this is usually done with an open-source word alignment toolkit, which takes the preprocessed original training set from step A1 as input and, after training, outputs the word alignment information α of the training set;
Step A3: according to Definitions 6 to 16, using the preprocessed original training set Tf obtained in step A1 and the word alignment information α obtained in step A2, extract translation phrase pairs and estimate their probabilities, obtaining the bidirectional phrase translation probability and the bidirectional lexicalized translation probability of each translation phrase pair; combine the translation phrase pairs with their translation probabilities to obtain the phrase table, each record of which consists of a translation phrase pair, word alignment information, the bidirectional phrase translation probability and the bidirectional lexicalized translation probability;
Step A4: according to Definitions 9, 12, 15, 16 and 17, filter the phrase table obtained in step A3 with the manually defined filtering rule, removing the translation phrase pairs with low probability, to obtain the filtered phrase table, denoted Pnew
Step A5: according to Definitions 5 and 16, concatenate the translation phrase pairs of the filtered phrase table Pnew obtained in step A4 with the preprocessed original training set Tf obtained in step A1 to obtain the new training set Tnew
Steps A1 to A5 complete the training set expansion stage of the method;
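Steps A4 and A5 above can be sketched as a filter followed by a concatenation. This is a minimal sketch under stated assumptions, not the patent's implementation: the phrase table is assumed to be a list of dictionaries, the filtering rule of Definition 17 is passed in as a predicate `keep`, and the field names (`source`, `target`, `score`) and the 0.1 threshold in the demo are hypothetical.

```python
def expand_training_set(original_pairs, phrase_table, keep):
    """Return the new training set: original sentence pairs plus kept phrase pairs."""
    # Step A4: apply the manually defined filtering rule to get P_new.
    p_new = [entry for entry in phrase_table if keep(entry)]
    # Step A5: take only the phrase-pair part of P_new and append it to T_f.
    phrase_pairs = [(entry["source"], entry["target"]) for entry in p_new]
    return list(original_pairs) + phrase_pairs


# Toy data: one sentence pair and two candidate phrase pairs (made up).
original = [("我 爱 长城", "I love the Great Wall")]
table = [
    {"source": "长城", "target": "the Great Wall", "score": 0.8},
    {"source": "长城", "target": "long city", "score": 0.01},
]
t_new = expand_training_set(original, table, keep=lambda e: e["score"] >= 0.1)
```

Passing the rule as a predicate keeps the concatenation logic independent of which probabilities the rule inspects, so a rule over all four scores can be swapped in without touching `expand_training_set`.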
The model training stage comprises the following steps:
Step B1: pre-train the model on the new training set Tnew obtained in step A5 to obtain model b1
Step B2: train the model b1 obtained in step B1 again on the preprocessed original training set Tf obtained in step A1 to obtain the newly trained model b2
Steps B1 and B2 complete the model training stage of the method;
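The two-step schedule of steps B1 and B2 can be illustrated with a deliberately tiny stand-in for the translation model. In the sketch below the "model" is a single weight `w` fitted by stochastic gradient descent, the "expanded training set" contains one noisy pair, and all names and numbers (`sgd_fit`, `two_stage_train`, the data points, the learning rate) are hypothetical. It only shows the order of training and why a second pass on the original data can reduce the effect of noise; it says nothing about real neural machine translation training settings.

```python
def sgd_fit(w, data, lr=0.1, epochs=50):
    """Toy stand-in for model training: fit y = w * x by stochastic gradient descent."""
    for _ in range(epochs):
        for x, y in data:
            w -= lr * 2 * (w * x - y) * x   # gradient step on the squared error
    return w


def two_stage_train(w0, t_new, t_f, train=sgd_fit):
    """Step B1: pre-train on the expanded set; step B2: retrain on the original set."""
    b1 = train(w0, t_new)   # pre-trained model b1
    b2 = train(b1, t_f)     # tuned model b2, initialised from b1
    return b1, b2


t_f = [(0.5, 1.0), (1.0, 2.0)]   # "original" data, consistent with w = 2
t_new = t_f + [(1.0, 3.0)]       # "expanded" data with one noisy pseudo pair
b1, b2 = two_stage_train(0.0, t_new, t_f)
```

In this toy setting `b1` is pulled away from 2 by the noisy pair, and `b2` moves back towards 2 after the second pass on the clean data, which mirrors the stated purpose of step B2.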
Steps A1 to A5 and steps B1 to B2 together complete the neural machine translation corpus expansion method based on a statistical phrase table.
Beneficial effects
Compared with existing ways of using machine translation training sets, the neural machine translation corpus expansion method based on a statistical phrase table of the present invention has the following beneficial effects:
1. The present invention provides a neural machine translation corpus expansion method based on a statistical phrase table. Without requiring any additional bilingual or monolingual data, the method can effectively expand the original training set and mitigate the adverse effect that the small training sets of low-resource languages have on the training of neural machine translation models.
2. With identical training set, development set and test set data, the present invention significantly improves the BLEU evaluation metric compared with neural machine translation model training that does not use the present invention.
Description of the drawings
Fig. 1 is a flow chart of the neural machine translation corpus expansion method based on a statistical phrase table of the present invention and of its embodiment.
Detailed description
The method of the invention is described in detail below with reference to the drawing and the embodiments. The description follows the two main stages of the invention: 1) the training set expansion stage and 2) the model training stage.
Embodiment 1
This embodiment describes the flow of the method of the invention and a specific implementation of it.
Fig. 1 is the flow chart of the neural machine translation corpus expansion method based on a statistical phrase table of the present invention as used in this embodiment.
Fig. 1 shows the operating flow of the two stages of the invention: 1) the training set expansion stage and 2) the model training stage.
Take translation from Uighur to Chinese as an example, where Uighur is the source language and Chinese is the target language.
1) Training set expansion stage:
Step 1: according to Definitions 1 to 5, preprocess the original training set. The specific preprocessing procedure depends on the source and target languages, and its purpose is to normalize the training set. Here, the source language (Uighur) data and the target language (Chinese) data are preprocessed by first performing word-piece segmentation and then tokenization, which yields the preprocessed original training set Tf
Step 2: according to Definitions 6 and 7, learn the word alignment. In this embodiment this is done with the open-source word alignment toolkit GIZA++, which takes the preprocessed original training set obtained in step 1 as input and, after training, outputs the word alignment information α of the training set;
Step 3: according to Definitions 6 to 16, using the preprocessed original training set Tf obtained in step 1 and the word alignment information α obtained in step 2, extract translation phrase pairs and estimate their probabilities. In this embodiment this is done with the train-model.perl script of the open-source Moses toolkit, which produces the phrase table P; each record of the phrase table consists of a translation phrase pair, word alignment information, the bidirectional phrase translation probability and the bidirectional lexicalized translation probability;
Step 4: according to Definitions 9, 12, 15, 16 and 17, filter the phrase table obtained in step 3 with a manually defined filtering rule. The rule is as follows:
A translation phrase pair is retained if and only if its forward phrase translation probability and its reverse phrase translation probability each meet their thresholds, and lex(e | f) >= 0.025, and lex(f | e) >= 0.025;
Translation phrase pairs with lower probabilities are filtered out, which yields the filtered new phrase table Pnew
Step 5: according to Definitions 5 and 16, concatenate the translation phrase pairs of the filtered new phrase table Pnew obtained in step 4 with the preprocessed original training set Tf obtained in step 1 to obtain the new training set Tnew
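The filtering in step 4 can be sketched against a Moses-style phrase table. The sketch assumes the usual Moses text layout, `source ||| target ||| scores ||| alignment ||| counts`, with the four scores in the default order: reverse phrase probability, reverse lexical weight, forward phrase probability, forward lexical weight. Check this against the phrase table actually produced, since the order depends on the Moses configuration. The 0.025 lexical threshold comes from the rule above; the thresholds on the two phrase translation probabilities are not recoverable from this text, so `min_phrase_prob` is a required, hypothetical parameter, and using a single value for both directions is also an assumption.

```python
def parse_moses_line(line):
    """Parse one phrase-table line into a dict of phrases and four scores."""
    fields = [part.strip() for part in line.split("|||")]
    source, target, scores = fields[0], fields[1], fields[2].split()
    # Assumed default Moses score order (see the note above this block).
    p_f_e, lex_f_e, p_e_f, lex_e_f = (float(s) for s in scores[:4])
    return {
        "source": source, "target": target,
        "p_e_f": p_e_f, "p_f_e": p_f_e,
        "lex_e_f": lex_e_f, "lex_f_e": lex_f_e,
    }


def keep_pair(entry, min_phrase_prob, min_lex_prob=0.025):
    """Retention rule: all four probabilities must reach their thresholds."""
    return (
        entry["p_e_f"] >= min_phrase_prob
        and entry["p_f_e"] >= min_phrase_prob
        and entry["lex_e_f"] >= min_lex_prob
        and entry["lex_f_e"] >= min_lex_prob
    )


# Two made-up phrase-table lines; the second fails the lexical threshold.
lines = [
    "长城 ||| the great wall ||| 0.5 0.03 0.6 0.04 ||| 0-0 0-1 0-2 ||| 2 3 2",
    "长城 ||| long city ||| 0.5 0.02 0.6 0.04 ||| 0-0 0-1 ||| 2 3 2",
]
p_new = [e for e in map(parse_moses_line, lines) if keep_pair(e, min_phrase_prob=0.1)]
```

Moses normally writes the phrase table as a gzip-compressed file, so in practice the lines would be read through `gzip.open(path, "rt", encoding="utf-8")` rather than from a list.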
2) The steps of the model training stage are as follows:
Step 6: pre-train the model. This embodiment uses the open-source neural machine translation toolkit tensor2tensor; the model is pre-trained on the new training set Tnew obtained in step 5 to obtain model b1
Step 7: train the model b1 obtained in step 6 again on the preprocessed original training set Tf obtained in step 1 to obtain the newly trained model b2
Steps 1 to 7 complete the neural machine translation corpus expansion method based on a statistical phrase table.
Embodiment 2
The training set of the Uighur-Chinese news translation task provided by CWMT2017 was randomly split into a training set, a development set and test set 1. In addition, the development set of the CWMT2017 Uighur-Chinese news translation evaluation task was used as test set 2. With the original training set, development set, test set data and neural machine translation model held identical, the present invention was compared with neural machine translation model training that does not use the present invention, with BLEU computed over Chinese characters as the evaluation metric.
Table 1. Comparison of BLEU values before and after applying the training set expansion method proposed by the present invention
The results in Table 1 show that, with identical training set, development set and test set data, the method of the present invention significantly improves the BLEU evaluation metric compared with neural machine translation model training that does not use the present invention.
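For readers unfamiliar with BLEU computed over Chinese characters, the following is a minimal sketch of corpus-level BLEU with one reference per hypothesis, where every non-space character is treated as a token. It follows the standard definition (clipped n-gram precision up to 4-grams, geometric mean, brevity penalty) without smoothing. It is an assumption that the evaluation behind Table 1 used this exact variant; the scorer actually used is not described here, and the function names are hypothetical.

```python
import math
from collections import Counter


def char_ngrams(sentence, n):
    """Count character n-grams, ignoring whitespace."""
    chars = [c for c in sentence if not c.isspace()]
    return Counter(tuple(chars[i:i + n]) for i in range(len(chars) - n + 1))


def char_bleu(hypotheses, references, max_n=4):
    """Corpus-level character BLEU (0 to 100), one reference per hypothesis."""
    matched = [0] * max_n
    total = [0] * max_n
    hyp_len = ref_len = 0
    for hyp, ref in zip(hypotheses, references):
        hyp_len += sum(char_ngrams(hyp, 1).values())
        ref_len += sum(char_ngrams(ref, 1).values())
        for n in range(1, max_n + 1):
            h, r = char_ngrams(hyp, n), char_ngrams(ref, n)
            matched[n - 1] += sum((h & r).values())   # clipped n-gram matches
            total[n - 1] += sum(h.values())
    if hyp_len == 0 or min(matched) == 0:
        return 0.0   # no smoothing: any empty n-gram order gives zero
    log_precision = sum(math.log(m / t) for m, t in zip(matched, total)) / max_n
    brevity = 1.0 if hyp_len >= ref_len else math.exp(1 - ref_len / hyp_len)
    return 100 * brevity * math.exp(log_precision)


score_same = char_bleu(["我爱北京天安门"], ["我爱北京天安门"])
score_short = char_bleu(["我爱北京"], ["我爱北京天安门"])
```

An identical hypothesis scores 100, while the truncated hypothesis keeps perfect n-gram precision but is penalised for brevity, since it has 4 characters against a 7-character reference.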
The above is a preferred embodiment of the present invention, and the present invention should not be limited to what is disclosed in the embodiment and the drawing. Any equivalent or modification made without departing from the spirit disclosed by the present invention falls within the scope of protection of the present invention.

Claims (4)

1. A neural machine translation corpus expansion method based on a statistical phrase table, characterized in that it comprises a training set expansion stage and a model training stage;
wherein A) the training set expansion stage operates as follows: a phrase table with probability scores is learned from the original training set by statistical machine learning methods; the learned phrase table is filtered according to rules; the filtered phrase table is extracted into a new data set of bilingual parallel phrase pairs; and the newly extracted data set is concatenated with the original training set to obtain new bilingual parallel pseudo-data, which realizes the expansion of the training set;
B) the model training stage has two steps: step 1 is pre-training, in which the model is pre-trained on the bilingual parallel pseudo-data obtained in stage A), which yields the pre-trained model b1; step 2 retrains model b1 on the original training set to obtain model b2, its purpose being to tune the model and mitigate the effect on the model of the noise introduced by the pseudo-data.
2. The neural machine translation corpus expansion method based on a statistical phrase table according to claim 1, characterized in that, to achieve the above purpose, the following technical solution is adopted:
The relevant definitions are given first:
Definition 1: source language: in machine translation, the language of the content to be translated; for example, in Chinese-to-English machine translation, Chinese is the source language;
Definition 2: source language data: data belonging to the source language; if a piece of source language data is a natural-language sentence, it is called a source language sentence; for example, in Chinese-to-English machine translation, the input Chinese sentence is source language data, also called a source language sentence;
A set of source language data is called a source language data set;
Definition 3: target language: in machine translation, the language of the content produced by translation; for example, in Chinese-to-English machine translation, English is the target language;
Definition 4: target language data: data belonging to the target language; if a piece of target language data is a natural-language sentence, it is called a target language sentence; for example, in Chinese-to-English machine translation, the output English sentence is target language data, also called a target language sentence;
A set of target language data is called a target language data set;
Definition 5: training set: specifically the training set of a statistical machine translation model, that is, the data set used to train the statistical machine translation model, denoted T;
Definition 6: original training set: the training set before expansion;
Definition 7: word alignment information, word alignment for short: the alignment relations between source language words and target language words in training set T, denoted α;
where, if the j-th source language word is aligned with the i-th target language word in training set T, the alignment is denoted (j, i);
Definition 8: phrase: a linguistic unit consisting of one or more words;
A phrase in the source language is called a source language phrase, denoted f; a phrase in the target language is called a target language phrase, denoted e;
Definition 9: translation phrase pair: a phrase pair consisting of a source language phrase and its aligned target language phrase, for example ('长城', 'The Great Wall');
Definition 10: forward phrase translation probability: the conditional probability of translating into target language phrase e given source language phrase f;
Definition 11: reverse phrase translation probability: the conditional probability of translating back into source language phrase f given target language phrase e;
Definition 12: bidirectional phrase translation probability: the forward and reverse phrase translation probabilities are collectively called the bidirectional phrase translation probability;
Definition 13: forward lexicalized phrase translation probability: the lexicalized translation probability of translating into target language phrase e given source language phrase f, denoted lex(e | f);
Definition 14: reverse lexicalized phrase translation probability: the lexicalized translation probability of translating back into source language phrase f given target language phrase e, denoted lex(f | e);
Definition 15: bidirectional lexicalized phrase translation probability: the forward and reverse lexicalized translation probabilities are collectively called the bidirectional lexicalized translation probability;
Definition 16: phrase table, also called phrase translation table: consists of multiple translation phrase pairs, with the bidirectional phrase translation probability and the bidirectional lexicalized translation probability attached to each pair;
Definition 17: filtering rule: a manually formulated rule for filtering the phrase table, based on the source language phrases, target language phrases, bidirectional phrase translation probabilities and bidirectional lexicalized phrase translation probabilities contained in the phrase table;
The training set expansion stage comprises the following steps:
Step A1: according to Definitions 1 to 5, preprocess the original training set to obtain the preprocessed original training set Tf
Step A2: based on the preprocessed original training set Tf obtained in step A1, learn word alignment information according to Definitions 7 and 8; this is usually done with an open-source word alignment toolkit, which takes the preprocessed original training set from step A1 as input and, after training, outputs the word alignment information α of the training set;
Step A3: according to Definitions 6 to 16, using the preprocessed original training set Tf obtained in step A1 and the word alignment information α obtained in step A2, extract translation phrase pairs and estimate their probabilities, obtaining the bidirectional phrase translation probability and the bidirectional lexicalized translation probability of each translation phrase pair; combine the translation phrase pairs with their translation probabilities to obtain the phrase table, each record of which consists of a translation phrase pair, word alignment information, the bidirectional phrase translation probability and the bidirectional lexicalized translation probability;
Step A4: according to Definitions 9, 12, 15, 16 and 17, filter the phrase table obtained in step A3 with the manually defined filtering rule, removing the translation phrase pairs with low probability, to obtain the filtered phrase table, denoted Pnew
Step A5: according to Definitions 5 and 16, concatenate the translation phrase pairs of the filtered phrase table Pnew obtained in step A4 with the preprocessed original training set Tf obtained in step A1 to obtain the new training set Tnew
3. The neural machine translation corpus expansion method based on a statistical phrase table according to claim 1, characterized in that the model training stage comprises the following steps:
Step B1: pre-train the model on the new training set Tnew obtained in step A5 to obtain model b1
Step B2: train the model b1 obtained in step B1 again on the preprocessed original training set Tf obtained in step A1 to obtain the newly trained model b2
4. The neural machine translation corpus expansion method based on a statistical phrase table according to claim 1, characterized in that, in step A1, the specific procedure for preprocessing the original training set depends on the source and target languages, its purpose being to normalize the training set and obtain the preprocessed original training set Tf
CN201810175915.4A 2018-03-02 2018-03-02 A kind of neural network machine translation corpus expansion method based on statistics phrase table Pending CN108363704A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201810175915.4A CN108363704A (en) 2018-03-02 2018-03-02 A kind of neural network machine translation corpus expansion method based on statistics phrase table

Publications (1)

Publication Number Publication Date
CN108363704A true CN108363704A (en) 2018-08-03

Family

ID=63003675

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201810175915.4A Pending CN108363704A (en) 2018-03-02 2018-03-02 A kind of neural network machine translation corpus expansion method based on statistics phrase table

Country Status (1)

Country Link
CN (1) CN108363704A (en)

Citations (9)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN102214166A (en) * 2010-04-06 2011-10-12 三星电子(中国)研发中心 Machine translation system and machine translation method based on syntactic analysis and hierarchical model
US20130144593A1 (en) * 2007-03-26 2013-06-06 Franz Josef Och Minimum error rate training with a large number of features for machine learning
CN104391842A (en) * 2014-12-18 2015-03-04 苏州大学 Translation model establishing method and system
CN105068997A (en) * 2015-07-15 2015-11-18 清华大学 Parallel corpus construction method and device
CN105190609A (en) * 2013-06-03 2015-12-23 国立研究开发法人情报通信研究机构 Translation device, learning device, translation method, and recording medium
CN106156013A (en) * 2016-06-30 2016-11-23 电子科技大学 The two-part machine translation method that a kind of regular collocation type phrase is preferential
CN106484682A (en) * 2015-08-25 2017-03-08 阿里巴巴集团控股有限公司 Based on the machine translation method of statistics, device and electronic equipment
CN107092594A (en) * 2017-04-19 2017-08-25 厦门大学 Bilingual recurrence self-encoding encoder based on figure
CN107329960A (en) * 2017-06-29 2017-11-07 哈尔滨工业大学 Unregistered word translating equipment and method in a kind of neural network machine translation of context-sensitive

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
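张金鹏 (Zhang Jinpeng) et al.: "Distributed representation of Chinese-Thai words based on cross-lingual corpora", 《计算机工程与科学》 (Computer Engineering & Science) *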

Cited By (20)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN109190768A (en) * 2018-08-09 2019-01-11 北京中关村科金技术有限公司 A kind of data enhancing corpus training method in neural network
CN111160046A (en) * 2018-11-07 2020-05-15 北京搜狗科技发展有限公司 Data processing method and device and data processing device
US10963757B2 (en) 2018-12-14 2021-03-30 Industrial Technology Research Institute Neural network model fusion method and electronic device using the same
CN110046332A (en) * 2019-04-04 2019-07-23 珠海远光移动互联科技有限公司 A kind of Similar Text data set generation method and device
CN110046332B (en) * 2019-04-04 2024-01-23 远光软件股份有限公司 Similar text data set generation method and device
CN110472252A (en) * 2019-08-15 2019-11-19 昆明理工大学 The method of the more neural machine translation of the Chinese based on transfer learning
CN110472252B (en) * 2019-08-15 2022-12-13 昆明理工大学 Method for translating Hanyue neural machine based on transfer learning
CN110543645A (en) * 2019-09-04 2019-12-06 网易有道信息技术(北京)有限公司 Machine learning model training method, medium, device and computing equipment
CN110543645B (en) * 2019-09-04 2023-04-07 网易有道信息技术(北京)有限公司 Machine learning model training method, medium, device and computing equipment
CN110717341A (en) * 2019-09-11 2020-01-21 昆明理工大学 Method and device for constructing old-Chinese bilingual corpus with Thai as pivot
CN110717341B (en) * 2019-09-11 2022-06-14 昆明理工大学 Method and device for constructing old-Chinese bilingual corpus with Thai as pivot
CN110852117B (en) * 2019-11-08 2023-02-24 沈阳雅译网络技术有限公司 Effective data enhancement method for improving translation effect of neural machine
CN110852117A (en) * 2019-11-08 2020-02-28 沈阳雅译网络技术有限公司 Effective data enhancement method for improving translation effect of neural machine
CN111368035A (en) * 2020-03-03 2020-07-03 新疆大学 Neural network-based Chinese dimension-dimension Chinese organization name dictionary mining system
CN112507734A (en) * 2020-11-19 2021-03-16 南京大学 Roman Uygur language-based neural machine translation system
CN112507734B (en) * 2020-11-19 2024-03-19 南京大学 Neural machine translation system based on romanized Uygur language
CN113111667A (en) * 2021-04-13 2021-07-13 沈阳雅译网络技术有限公司 Method for generating pseudo data by low-resource language based on multi-language model
CN113111667B (en) * 2021-04-13 2023-08-22 沈阳雅译网络技术有限公司 Method for generating pseudo data in low-resource language based on multi-language model
CN117540755A (en) * 2023-11-13 2024-02-09 北京云上曲率科技有限公司 Method and system for enhancing data by neural machine translation model
CN118095302A (en) * 2024-04-26 2024-05-28 四川交通运输职业学校 Auxiliary translation method and system based on computer

Similar Documents

Publication Publication Date Title
CN108363704A (en) A kind of neural network machine translation corpus expansion method based on statistics phrase table
CN110852117B (en) Effective data enhancement method for improving translation effect of neural machine
CN107690634B (en) Automatic query pattern generation method and system
CN105138507A (en) Pattern self-learning based Chinese open relationship extraction method
CN109359304A (en) Limited neural network machine interpretation method and storage medium
CN104391885A (en) Method for extracting chapter-level parallel phrase pair of comparable corpus based on parallel corpus training
CN105573994B (en) Statistical machine translation system based on syntax skeleton
CN101404036A (en) Keyword abstraction method for PowerPoint electronic demonstration draft
CN109101518A (en) Phonetic transcription text quality appraisal procedure, device, terminal and readable storage medium storing program for executing
CN106156013B (en) A kind of two-part machine translation method that regular collocation type phrase is preferential
CN112446213A (en) Text corpus expansion method
CN112101047A (en) Machine translation method for matching language-oriented precise terms
CN108491399A (en) Chinese to English machine translation method based on context iterative analysis
CN101763403A (en) Query translation method facing multi-lingual information retrieval system
JP2016164707A (en) Automatic translation device and translation model learning device
CN112765977B (en) Word segmentation method and device based on cross-language data enhancement
Li et al. Cultural concept adaptation on multimodal reasoning
CN106156007A (en) A kind of English-Chinese statistical machine translation method of word original shape
CN112836525A (en) Human-computer interaction based machine translation system and automatic optimization method thereof
Millour et al. Unsupervised data augmentation for less-resourced languages with no standardized spelling
CN117251524A (en) Short text classification method based on multi-strategy fusion
Gad-Elrab et al. Named entity disambiguation for resource-poor languages
Baisa et al. Automating dictionary production: a Tagalog-English-Korean dictionary from scratch
CN111597824B (en) Training method and device for language translation model
CN114492469A (en) Translation method, translation device and computer readable storage medium

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
WD01 Invention patent application deemed withdrawn after publication

Application publication date: 20180803
