CN107329960A - Context-sensitive out-of-vocabulary word translation device and method for neural network machine translation - Google Patents


Info

Publication number
CN107329960A
CN107329960A (application CN201710514935.5A)
Authority
CN
China
Prior art keywords
word
translation
phrase
unregistered
unregistered word
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN201710514935.5A
Other languages
Chinese (zh)
Other versions
CN107329960B (en)
Inventor
杨沐昀 (Yang Muyun)
朱聪慧 (Zhu Conghui)
赵铁军 (Zhao Tiejun)
张红阳 (Zhang Hongyang)
徐冰 (Xu Bing)
曹海龙 (Cao Hailong)
郑德权 (Zheng Dequan)
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Heilongjiang Industrial Technology Research Institute Asset Management Co ltd
Original Assignee
Harbin Institute of Technology
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Harbin Institute of Technology
Priority to CN201710514935.5A
Publication of CN107329960A
Application granted
Publication of CN107329960B
Legal status: Active
Anticipated expiration


Classifications

    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06F: ELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00: Handling natural language data
    • G06F40/40: Processing or translation of natural language
    • G06F40/42: Data-driven translation
    • G06F40/44: Statistical methods, e.g. probability models
    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06F: ELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00: Handling natural language data
    • G06F40/30: Semantic analysis
    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06F: ELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00: Handling natural language data
    • G06F40/40: Processing or translation of natural language
    • G06F40/58: Use of machine translation, e.g. for multi-lingual retrieval, for server-side translation for client devices or for real-time translation

Abstract

The present invention proposes a context-sensitive out-of-vocabulary (OOV) word translation device and method for neural network machine translation, belonging to the technical field of lexical translation devices and methods. The proposed device realizes neural network translation through a lookup module, a candidate translation provision module, a feature extraction module, a ranking module and a replacement module. It solves the low translation accuracy of existing translation devices and methods, effectively improves the accuracy with which OOV words are translated in neural machine translation, and is applicable to a variety of neural network translation domains.

Description

Context-sensitive out-of-vocabulary word translation device and method for neural network machine translation
Technical field
The present invention relates to a lexical translation device and method, and belongs to the technical field of lexical translation devices and methods.
Background technology
Neural machine translation (NMT) is a new machine translation approach whose core is a simple, end-to-end trainable, easily scalable deep neural network. The network uses an encoder-decoder structure: the encoder maps the source sentence into a fixed-length semantic vector representation, and the decoder, a recurrent neural network (RNN), generates the target sentence word by word using the source representation and the target-side history. Since its introduction, this architecture has achieved the best results to date on machine translation tasks between many language pairs, such as English-French, German-English and English-Czech.
In practical implementations, limits on computation and GPU memory force an NMT model to fix in advance a rather small vocabulary of common words, generally 30,000 to 80,000 entries; every word outside this vocabulary, i.e. every out-of-vocabulary (OOV) word, is marked with the special symbol <unk> (unknown). Because translation is an open-vocabulary problem, collapsing a large number of semantically rich OOV words into a single meaningless <unk> symbol greatly increases the ambiguity of the source sentence. Moreover, once a <unk> appears in the generated translation, the OOV word information has already been discarded during decoding, so the model itself cannot resolve these <unk> symbols; they can only be post-processed in the translation result after decoding is complete.
At present the most widely used solution is a greedy post-processing method: word alignment information is recorded inside the NMT model, usually via the attention mechanism; for each <unk>, the alignment is used to find the source word with the highest alignment probability, a pre-built translation dictionary then supplies the translation candidates of that source word, and the candidate with the highest translation probability replaces the <unk> in the translation result. This method also serves as the baseline in the experiments of the present embodiment.
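For concreteness, the greedy baseline just described can be sketched in a few lines of Python. The function name, the attention-matrix layout and the toy dictionary format below are illustrative assumptions rather than details taken from any cited system.

```python
def greedy_replace_unk(target_words, source_words, attention, dictionary):
    """Replace each <unk> with the highest-probability dictionary translation
    of the source word it is most strongly aligned to (attention argmax)."""
    out = []
    for t_idx, word in enumerate(target_words):
        if word != "<unk>":
            out.append(word)
            continue
        # attention[t][s]: alignment weight of target position t to source position s
        row = attention[t_idx]
        s_idx = max(range(len(source_words)), key=lambda j: row[j])
        src = source_words[s_idx]
        cands = dictionary.get(src)  # [(candidate, translation_prob), ...]
        if cands:
            out.append(max(cands, key=lambda c: c[1])[0])
        else:
            out.append(src)  # no dictionary entry: copy the source word through
    return out
```

Note that the candidate is chosen purely by dictionary probability, with no look at the words around the <unk>; that context-blindness is exactly the weakness the invention addresses.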
Many experiments have shown that the substitutes this method supplies for <unk> do improve the quality of NMT output, but because the replacement ignores the context surrounding the <unk> in the translation, many problems remain. Translation involves a large number of one-to-many, many-to-one and many-to-many mappings, and even in the one-to-one case the same source word may need to be translated into different target words in different contexts. Faced with such complex translation phenomena, the greedy post-processing method causes numerous replacement errors, repeated replacements, and disfluent sentences after replacement.
Summary of the invention
To solve the problem that translations produced by existing neural machine translation devices cannot respect the surrounding context or semantics, the present invention proposes a context-sensitive out-of-vocabulary word translation device and method for neural network machine translation.
A context-sensitive out-of-vocabulary (OOV) word translation device for neural network machine translation adopts the following technical scheme:
The OOV word translation device comprises:
a lookup module that looks up every source-side word in a translation dictionary;
a candidate translation provision module that, from the lookup results of the lookup module, supplies the possible OOV candidate translations for each <unk> symbol;
a feature extraction module that extracts contextual features for the candidate translations;
a ranking module that, from the contextual features, computes an evaluation score for each OOV candidate translation with a trained SVM ranking model and sorts the candidates by score from highest to lowest;
a replacement module that substitutes the top-ranked OOV candidate translation for the <unk> symbol in the sentence translation, yielding a complete translated sentence that fits the surrounding context.
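The cooperation of the five modules can be sketched as a single post-processing pass over the output sentence. The data formats and the `extract_features`/`score` callables below are illustrative assumptions standing in for the feature extraction and ranking modules, not the patented implementation.

```python
def translate_oov(target_words, source_words, attention, dictionary,
                  extract_features, score):
    """For each <unk>: look up candidates, extract contextual features,
    score them with a trained ranker, and substitute the top candidate."""
    out = list(target_words)
    for t_idx, word in enumerate(target_words):
        if word != "<unk>":
            continue
        row = attention[t_idx]
        s_idx = max(range(len(source_words)), key=lambda j: row[j])
        src = source_words[s_idx]
        candidates = dictionary.get(src, [])     # lookup + candidate modules
        if not candidates:
            out[t_idx] = src                     # fall back to copying the source word
            continue
        scored = [(score(extract_features(src, c, out, t_idx)), c)
                  for c in candidates]           # feature + ranking modules
        out[t_idx] = max(scored)[1]              # replacement module
    return out
```

Unlike the greedy baseline, the choice among candidates here is delegated to a learned scoring function over contextual features rather than to a single dictionary probability.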
Further, the feature extraction module comprises:
a word alignment feature extraction module that extracts word alignment features from the NMT attention alignment model;
a word-granularity feature extraction module that extracts word-granularity features of the source word and the OOV candidate translation;
a phrase-granularity feature extraction module that extracts phrase-granularity features of the source word and the OOV candidate translation;
a language model feature extraction module that extracts language model features around the OOV candidate translation when it is placed at the <unk> position.
Further, the word-granularity feature extraction module comprises:
a forward translation probability module for the probability that the source word translates into the OOV candidate translation;
a reverse translation probability module for the probability that the OOV candidate translation translates into the source word;
a source word count extraction module for the number of times the source word occurs in the NMT training parallel corpus;
an OOV candidate count extraction module for the number of times the OOV candidate translation occurs in the NMT training parallel corpus;
a co-occurrence count extraction module for the number of times the source word and the OOV candidate translation co-occur within parallel sentence pairs of the corpus;
a vocabulary position extraction module for the position at which the source word appears in the vocabulary;
a judgement module that determines whether the source word is itself an OOV word.
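A minimal sketch of how these word-granularity counts and probabilities could be computed from a toy parallel corpus, assuming relative-frequency definitions of the forward and reverse translation probabilities; the data structures and feature names are illustrative, not the patented implementation.

```python
def word_granularity_features(src, cand, corpus, cooc, vocab):
    """corpus: list of (source_words, target_words) sentence pairs;
    cooc[(src, cand)]: how often src and cand co-occur in a parallel pair."""
    src_count = sum(s.count(src) for s, _ in corpus)
    cand_count = sum(t.count(cand) for _, t in corpus)
    both = cooc.get((src, cand), 0)
    return {
        "p_forward": both / src_count if src_count else 0.0,    # P(cand | src)
        "p_reverse": both / cand_count if cand_count else 0.0,  # P(src | cand)
        "src_count": src_count,
        "cand_count": cand_count,
        "cooc_count": both,
        "vocab_pos": vocab.index(src) if src in vocab else -1,  # position in vocabulary
        "src_is_oov": src not in vocab,                         # OOV judgement
    }
```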
Further, the phrase-granularity feature extraction comprises:
a source-word phrase-table count extraction module for the number of times the source word occurs in the phrase table;
a first OOV candidate phrase-table count extraction module for the number of times the OOV candidate translation occurs in the phrase table;
a phrase-table co-occurrence count extraction module for the number of times the source word and the OOV candidate translation co-occur within individual phrase pairs of the phrase table;
a phrase count extraction module for the number of times the phrases formed by the source word with its neighbouring words occur in the phrase table;
a second OOV candidate phrase-table count extraction module for the number of times the OOV candidate translation appears in the target phrase corresponding to a phrase formed by the source word with its neighbouring words;
an OOV candidate phrase length extraction module for the maximum length of the OOV candidate translation phrase when the phrases formed by the source word and by the OOV candidate translation with their neighbouring words appear as a pair in the phrase table;
a source word phrase length extraction module for the length of the source word phrase when the OOV candidate translation phrase attains that maximum length.
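These phrase-table counts can be illustrated against a toy phrase table, represented here simply as (source phrase, target phrase) pairs of word lists. A real Moses-style phrase table with scores would need extra parsing, so this is a simplified sketch under those assumptions.

```python
def phrase_granularity_features(src, cand, phrase_table):
    """phrase_table: list of (source_phrase, target_phrase) word-list pairs."""
    src_count = sum(p.count(src) for p, _ in phrase_table)
    cand_count = sum(q.count(cand) for _, q in phrase_table)
    cooc = sum(1 for p, q in phrase_table if src in p and cand in q)
    # phrases (length > 1) that the source word forms with neighbouring words
    src_phrase_count = sum(1 for p, _ in phrase_table if src in p and len(p) > 1)
    cand_in_target = sum(1 for p, q in phrase_table
                         if src in p and len(p) > 1 and cand in q)
    # phrase pairs containing both words: track the longest candidate-side phrase
    pair_lengths = [(len(q), len(p)) for p, q in phrase_table
                    if src in p and cand in q]
    max_cand_len, src_len_at_max = max(pair_lengths, default=(0, 0))
    return {
        "src_in_table": src_count,
        "cand_in_table": cand_count,
        "cooc_in_table": cooc,
        "src_phrase_count": src_phrase_count,
        "cand_in_target_phrase": cand_in_target,
        "max_cand_phrase_len": max_cand_len,
        "src_phrase_len_at_max": src_len_at_max,
    }
```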
Further, the language model feature extraction module comprises:
a forward n-gram language model probability extraction module for the forward n-gram probability of the OOV candidate translation when it is placed at the <unk> position in the continuous translated word sequence;
a reverse n-gram language model probability extraction module for the reverse n-gram probability of the OOV candidate translation when it is placed at the <unk> position in the continuous translated word sequence;
a word-string count extraction module for the number of word strings of a given n-gram order that contain the OOV candidate translation when it is placed at the <unk> position.
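A tiny add-one-smoothed bigram model can stand in for the n-gram language models assumed here. Real systems would use a trained LM toolkit, and the reversed-sequence score below is only a proxy for a separately trained reverse language model; every name and smoothing choice in this sketch is an assumption.

```python
import math
from collections import Counter

def train_bigram(sentences):
    """Count unigrams and bigrams over boundary-padded token lists."""
    uni, bi = Counter(), Counter()
    for s in sentences:
        toks = ["<s>"] + s + ["</s>"]
        uni.update(toks)
        bi.update(zip(toks, toks[1:]))
    return uni, bi

def bigram_logprob(seq, uni, bi, vocab_size):
    """Add-one-smoothed bigram log-probability of a word sequence."""
    toks = ["<s>"] + seq + ["</s>"]
    return sum(math.log((bi[(a, b)] + 1) / (uni[a] + vocab_size))
               for a, b in zip(toks, toks[1:]))

def lm_features(candidate, left_ctx, right_ctx, uni, bi, vocab_size):
    """Forward LM score of the candidate in its <unk>-position context, plus
    a reversed-sequence score as a stand-in for a reverse language model."""
    seq = left_ctx + [candidate] + right_ctx
    fwd = bigram_logprob(seq, uni, bi, vocab_size)
    rev = bigram_logprob(list(reversed(seq)), uni, bi, vocab_size)
    return fwd, rev
```

A candidate that fits the surrounding words receives a higher forward score than one the model has never seen in that context, which is the signal these features contribute to the ranker.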
A context-sensitive out-of-vocabulary word translation method for neural network machine translation adopts the following technical scheme:
The OOV word translation method comprises:
a lookup step that looks up every source-side word in a translation dictionary;
a candidate translation provision step that, from the lookup results of the lookup step, supplies the possible OOV candidate translations for each <unk> symbol;
a feature extraction step that extracts contextual features for the candidate translations;
a ranking step that, from the contextual features, computes an evaluation score for each OOV candidate translation with a trained SVM ranking model and sorts the candidates by score from highest to lowest;
a replacement step that substitutes the top-ranked OOV candidate translation for the <unk> symbol in the sentence translation, yielding a complete translated sentence that fits the surrounding context.
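The ranking step relies on a trained SVM ranking model only through its linear scoring function. As a hedged stand-in, the sketch below learns such a function from pairwise preferences with a perceptron-style update; it mimics the score-and-sort interface of SVMrank, not the actual SVM optimisation, and the training-pair format is an assumption.

```python
def train_pairwise_ranker(pairs, n_features, epochs=20):
    """pairs: list of (better_features, worse_features) vectors.
    Learns a weight vector w so that w · better > w · worse."""
    w = [0.0] * n_features
    for _ in range(epochs):
        for better, worse in pairs:
            diff = [b - ws for b, ws in zip(better, worse)]
            if sum(wi * di for wi, di in zip(w, diff)) <= 0:  # preference violated
                w = [wi + di for wi, di in zip(w, diff)]
    return w

def rank_candidates(cands_with_features, w):
    """Sort (candidate, feature_vector) pairs by linear score, best first."""
    score = lambda f: sum(wi * fi for wi, fi in zip(w, f))
    return sorted(cands_with_features, key=lambda cf: score(cf[1]), reverse=True)
```

After ranking, the replacement step simply takes the first element of the sorted list for each <unk>.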
Further, the feature extraction step comprises:
a word alignment feature extraction step that extracts word alignment features from the NMT attention alignment model;
a word-granularity feature extraction step that extracts word-granularity features of the source word and the OOV candidate translation;
a phrase-granularity feature extraction step that extracts phrase-granularity features of the source word and the OOV candidate translation;
a language model feature extraction step that extracts language model features around the OOV candidate translation when it is placed at the <unk> position.
Further, the word-granularity feature extraction step comprises:
a forward translation probability step for the probability that the source word translates into the OOV candidate translation;
a reverse translation probability step for the probability that the OOV candidate translation translates into the source word;
a source word count extraction step for the number of times the source word occurs in the NMT training parallel corpus;
an OOV candidate count extraction step for the number of times the OOV candidate translation occurs in the NMT training parallel corpus;
a co-occurrence count extraction step for the number of times the source word and the OOV candidate translation co-occur within parallel sentence pairs of the corpus;
a vocabulary position extraction step for the position at which the source word appears in the vocabulary;
a judgement step that determines whether the source word is itself an OOV word.
Further, the phrase-granularity feature extraction comprises:
a source-word phrase-table count extraction step for the number of times the source word occurs in the phrase table;
a first OOV candidate phrase-table count extraction step for the number of times the OOV candidate translation occurs in the phrase table;
a phrase-table co-occurrence count extraction step for the number of times the source word and the OOV candidate translation co-occur within individual phrase pairs of the phrase table;
a phrase count extraction step for the number of times the phrases formed by the source word with its neighbouring words occur in the phrase table;
a second OOV candidate phrase-table count extraction step for the number of times the OOV candidate translation appears in the target phrase corresponding to a phrase formed by the source word with its neighbouring words;
an OOV candidate phrase length extraction step for the maximum length of the OOV candidate translation phrase when the phrases formed by the source word and by the OOV candidate translation with their neighbouring words appear as a pair in the phrase table;
a source word phrase length extraction step for the length of the source word phrase when the OOV candidate translation phrase attains that maximum length.
Further, the language model feature extraction step comprises:
a forward n-gram language model probability extraction step for the forward n-gram probability of the OOV candidate translation when it is placed at the <unk> position in the continuous translated word sequence;
a reverse n-gram language model probability extraction step for the reverse n-gram probability of the OOV candidate translation when it is placed at the <unk> position in the continuous translated word sequence;
a word-string count extraction step for the number of word strings of a given n-gram order that contain the OOV candidate translation when it is placed at the <unk> position.
Beneficial effects of the present invention:
The context-sensitive OOV word translation device and method of the present invention translate OOV words with reference to the surrounding context and semantics, improving both the BLEU score of the output and the OOV recall. On the NIST data sets of the Chinese-to-English translation task, BLEU and OOV recall reach 33.405 and 6.53% respectively, which is 0.012 and 0.37% higher than the 33.393 and 6.16% of the prior-art greedy post-processing method, clearly improving the translation quality of OOV words in NMT output.
Brief description of the drawings
Fig. 1 is a structural schematic diagram of the OOV word translation device in context-sensitive neural network machine translation according to the present invention.
Fig. 2 is a structural schematic diagram of the word-granularity feature extraction module of the present invention.
Fig. 3 is a structural schematic diagram of the phrase-granularity feature extraction module of the present invention.
Fig. 4 is a structural schematic diagram of the language model feature extraction module of the present invention.
Fig. 5 is a schematic case illustration of the OOV word translation device in context-sensitive neural network machine translation according to the present invention.
Detailed description of the embodiments
The present invention is further described below with reference to specific embodiments, but the invention is not limited by these examples.
Embodiment 1:
As shown in Figs. 1 to 4, a context-sensitive out-of-vocabulary word translation device for neural network machine translation adopts the following technical scheme:
The OOV word translation device comprises:
a lookup module that looks up every source-side word in a translation dictionary;
a candidate translation provision module that, from the lookup results of the lookup module, supplies the possible OOV candidate translations for each <unk> symbol;
a feature extraction module that extracts contextual features for the candidate translations;
a ranking module that, from the contextual features, computes an evaluation score for each OOV candidate translation with a trained SVM ranking model and sorts the candidates by score from highest to lowest;
a replacement module that substitutes the top-ranked OOV candidate translation for the <unk> symbol in the sentence translation, yielding a complete translated sentence that fits the surrounding context.
Wherein, the feature extraction module comprises:
a word alignment feature extraction module that extracts word alignment features from the NMT attention alignment model;
a word-granularity feature extraction module that extracts word-granularity features of the source word and the OOV candidate translation;
a phrase-granularity feature extraction module that extracts phrase-granularity features of the source word and the OOV candidate translation;
a language model feature extraction module that extracts language model features around the OOV candidate translation when it is placed at the <unk> position.
The word-granularity feature extraction module comprises:
a forward translation probability module for the probability that the source word translates into the OOV candidate translation;
a reverse translation probability module for the probability that the OOV candidate translation translates into the source word;
a source word count extraction module for the number of times the source word occurs in the NMT training parallel corpus;
an OOV candidate count extraction module for the number of times the OOV candidate translation occurs in the NMT training parallel corpus;
a co-occurrence count extraction module for the number of times the source word and the OOV candidate translation co-occur within parallel sentence pairs of the corpus;
a vocabulary position extraction module for the position at which the source word appears in the vocabulary;
a judgement module that determines whether the source word is itself an OOV word.
The phrase-granularity feature extraction comprises:
a source-word phrase-table count extraction module for the number of times the source word occurs in the phrase table;
a first OOV candidate phrase-table count extraction module for the number of times the OOV candidate translation occurs in the phrase table;
a phrase-table co-occurrence count extraction module for the number of times the source word and the OOV candidate translation co-occur within individual phrase pairs of the phrase table;
a phrase count extraction module for the number of times the phrases formed by the source word with its neighbouring words occur in the phrase table;
a second OOV candidate phrase-table count extraction module for the number of times the OOV candidate translation appears in the target phrase corresponding to a phrase formed by the source word with its neighbouring words;
an OOV candidate phrase length extraction module for the maximum length of the OOV candidate translation phrase when the phrases formed by the source word and by the OOV candidate translation with their neighbouring words appear as a pair in the phrase table;
a source word phrase length extraction module for the length of the source word phrase when the OOV candidate translation phrase attains that maximum length.
Wherein, the language model feature extraction module comprises:
a forward n-gram language model probability extraction module for the forward n-gram probability of the OOV candidate translation when it is placed at the <unk> position in the continuous translated word sequence;
a reverse n-gram language model probability extraction module for the reverse n-gram probability of the OOV candidate translation when it is placed at the <unk> position in the continuous translated word sequence;
a word-string count extraction module for the number of word strings of a given n-gram order that contain the OOV candidate translation when it is placed at the <unk> position.
A context-sensitive out-of-vocabulary word translation method for neural network machine translation adopts the following technical scheme:
The OOV word translation method comprises:
a lookup step that looks up every source-side word in a translation dictionary;
a candidate translation provision step that, from the lookup results of the lookup step, supplies the possible OOV candidate translations for each <unk> symbol;
a feature extraction step that extracts contextual features for the candidate translations;
a ranking step that, from the contextual features, computes an evaluation score for each OOV candidate translation with a trained SVM ranking model and sorts the candidates by score from highest to lowest;
a replacement step that substitutes the top-ranked OOV candidate translation for the <unk> symbol in the sentence translation, yielding a complete translated sentence that fits the surrounding context.
Wherein, the feature extraction step comprises:
a word alignment feature extraction step that extracts word alignment features from the NMT attention alignment model;
a word-granularity feature extraction step that extracts word-granularity features of the source word and the OOV candidate translation;
a phrase-granularity feature extraction step that extracts phrase-granularity features of the source word and the OOV candidate translation;
a language model feature extraction step that extracts language model features around the OOV candidate translation when it is placed at the <unk> position.
The word-granularity feature extraction step comprises:
a forward translation probability step for the probability that the source word translates into the OOV candidate translation;
a reverse translation probability step for the probability that the OOV candidate translation translates into the source word;
a source word count extraction step for the number of times the source word occurs in the NMT training parallel corpus;
an OOV candidate count extraction step for the number of times the OOV candidate translation occurs in the NMT training parallel corpus;
a co-occurrence count extraction step for the number of times the source word and the OOV candidate translation co-occur within parallel sentence pairs of the corpus;
a vocabulary position extraction step for the position at which the source word appears in the vocabulary;
a judgement step that determines whether the source word is itself an OOV word.
The phrase-granularity feature extraction comprises:
a source-word phrase-table count extraction step for the number of times the source word occurs in the phrase table;
a first OOV candidate phrase-table count extraction step for the number of times the OOV candidate translation occurs in the phrase table;
a phrase-table co-occurrence count extraction step for the number of times the source word and the OOV candidate translation co-occur within individual phrase pairs of the phrase table;
a phrase count extraction step for the number of times the phrases formed by the source word with its neighbouring words occur in the phrase table;
a second OOV candidate phrase-table count extraction step for the number of times the OOV candidate translation appears in the target phrase corresponding to a phrase formed by the source word with its neighbouring words;
an OOV candidate phrase length extraction step for the maximum length of the OOV candidate translation phrase when the phrases formed by the source word and by the OOV candidate translation with their neighbouring words appear as a pair in the phrase table;
a source word phrase length extraction step for the length of the source word phrase when the OOV candidate translation phrase attains that maximum length.
The language model feature extraction step comprises:
a forward n-gram language model probability extraction step for the forward n-gram probability of the OOV candidate translation when it is placed at the <unk> position in the continuous translated word sequence;
a reverse n-gram language model probability extraction step for the reverse n-gram probability of the OOV candidate translation when it is placed at the <unk> position in the continuous translated word sequence;
a word-string count extraction step for the number of word strings of a given n-gram order that contain the OOV candidate translation when it is placed at the <unk> position.
Experimental results for the context-sensitive unregistered-word translation device and method described in this embodiment are shown in Table 1, where ① denotes the NMT word-alignment feature, ② the word-granularity features, ③ the phrase-granularity features, and ④ the language-model features.
As Table 1 shows, the model trained with all features reaches the highest accuracy, 45.12%, which is 8.23 percentage points above the 36.89% of the greedy post-processing approach.
Table 1 Unregistered-word post-processing performance of the models on the constructed data
Table 2 shows experimental results on actual NMT translations in an open setting. Here we compare against the greedy post-processing method and against simply deleting the <unk> marks during post-processing; the BLEU and Recall (OOV) figures for both methods in Table 2 are averages over all test sets. Table 2 shows that our model surpasses the greedy unregistered-word processing method on both Recall (OOV) and BLEU, which demonstrates that the context-sensitive unregistered-word translation device and method of the present invention constitute a significant technical advance over the existing greedy post-processing approach.
Table 2 Effect of the models with an extended word-selection scope on actual NMT translation results
The context-sensitive unregistered-word translation device and method of the present invention make the word chosen for each translation fit the surrounding context and semantics, improving both the BLEU score and the unregistered-word recall of the output. On the NIST data sets for the Chinese-to-English translation task, BLEU and unregistered-word recall reach 33.405 and 6.53% respectively, improvements of 0.012 and 0.37 percentage points over the 33.393 and 6.16% of the prior-art greedy post-processing approach; the translation quality of unregistered words in NMT output is thus markedly improved.
Embodiment 2
Embodiment 2 further refines the context-sensitive neural-network machine-translation unregistered-word translation method of Embodiment 1: it uses contextual information to find the most suitable word to replace each <unk> mark (used in NMT to represent an unregistered word) in the NMT output. The unregistered-word translation device extracts unregistered-word candidate translations from a pre-built translation dictionary according to the source words, recording for each candidate the source word that produced it. For each candidate translation and its source word, four classes of contextual features are extracted from different perspectives over the source sentence and the translation result: the NMT word-alignment feature, word-granularity features, phrase-granularity features, and language-model features. Finally an svm-rank model combines all four feature classes to rank the candidates and obtain the optimal substitute. The detailed procedure by which this context-sensitive unregistered-word translation method translates a <unk> mark is as follows:
Given a translation sentence containing <unk> marks and its corresponding source sentence, the method proceeds as follows:
Step 1: Look up all source words in the translation dictionary to supply possible unregistered-word candidate translations for each <unk>.
Step 2: Extract contextual features for each unregistered-word candidate translation of each <unk>.
Step 3: Rank all unregistered-word candidate translations with the trained SVM-rank model according to the contextual features.
Step 4: Replace each <unk> mark in the translation sentence with the top-ranked word.
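The four steps above can be sketched as follows. This is a minimal illustration, not the patented implementation: `dictionary` maps a source word to its candidate translations, and the trained SVM-rank model is stubbed by an arbitrary `score` function.

```python
def replace_unks(translation, source_words, dictionary, score):
    """Replace each <unk> in the token list `translation` with the
    highest-scoring candidate gathered from all source words.
    `score(source_word, candidate)` stands in for the SVM-rank model."""
    # Step 1: collect candidate translations from every source word.
    candidates = []
    for s in source_words:
        candidates += [(s, t) for t in dictionary.get(s, [])]
    out = []
    for token in translation:
        if token == "<unk>" and candidates:
            # Steps 2-3: feature extraction and ranking, stubbed by `score`.
            best_src, best_cand = max(candidates, key=lambda p: score(*p))
            out.append(best_cand)  # Step 4: substitute the top candidate.
        else:
            out.append(token)
    return out
```

In practice the ranking in step 3 would score each candidate from the full feature vector described below rather than from a lookup.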
Here the SVM-rank model belongs to the pairwise class of learning-to-rank methods: it learns to order a candidate list rather than to solve a binary classification task. The basic assumption of Rank SVM is that there exists a linear function f(x) = w^T·x + b such that, for any two candidates i and j in the list, f(x_i) > f(x_j) whenever candidate i should be ranked above candidate j. SVM-rank thus in essence also fits a score; this score is not guaranteed to coincide with the true evaluation metric, it is only guaranteed to order the candidates consistently with it. The present invention adds slack variables to the SVM-rank model to handle noise in the input and increase generalization; with the slack variables added, the mathematical form of the model is:

min over w, ξ of (1/2)·||w||² + C·Σ_(i,j) ξ_(i,j)
subject to: w^T·x_i ≥ w^T·x_j + 1 − ξ_(i,j) for every pair (i, j) with y_i > y_j, and ξ_(i,j) ≥ 0,

where x_i and y_i are the features and evaluation metric of candidate i, x_j and y_j are the features and evaluation metric of candidate j, and ξ_(i,j) is the slack variable.
Once the SVM-rank model is chosen, whether the input features are discriminative is the key determinant of model performance.
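In pairwise learning-to-rank, the constraint w^T·x_i ≥ w^T·x_j + 1 − ξ over every pair with y_i > y_j is equivalent to training a linear classifier on feature differences x_i − x_j. A minimal sketch of that transformation (illustrative only; actual training would use the svm-rank tool on the features described below):

```python
def pairwise_examples(candidates):
    """Turn one candidate list [(features, relevance), ...] into
    difference vectors for a binary classifier: for every pair whose
    relevance differs, emit (x_i - x_j, +1) and (x_j - x_i, -1)."""
    pairs = []
    for i, (xi, yi) in enumerate(candidates):
        for xj, yj in candidates[i + 1:]:
            if yi == yj:
                continue  # equal relevance gives no ordering constraint
            diff = [a - b for a, b in zip(xi, xj)]
            sign = 1 if yi > yj else -1
            pairs.append(([sign * d for d in diff], 1))
            pairs.append(([-sign * d for d in diff], -1))
    return pairs
```

Any linear binary classifier trained on these difference vectors yields a weight vector w that scores and orders candidates.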
The model is trained as follows:
1) SVM-rank model training data set
This embodiment extracts 2.1 million Chinese-English parallel sentence pairs as NMT training data from seven LDC corpora (LDC2002E18, LDC2003E07, LDC2003E14, LDC2004T07, LDC2004T08, LDC2005T06 and LDC2005T10), containing 54 million Chinese words and 60 million English words respectively. From the NMT training corpus, 250,000 parallel sentences containing unregistered words are filtered out, and 320,000 unregistered-word post-processing training examples are constructed from them. In each training example, every word in the source sentence supplies unregistered-word candidate translations for the <unk> mark; the candidates are restricted to the unregistered words among the 100 translations with the highest translation probability in the translation dictionary. On average, each training example ends up with 65 unregistered-word candidate translations.
Table 3 Sample of the ranking-model training data
Table 3 shows a sample of the ranking-model training data; columns 1, 2 and 3 are the index, the candidate translation and the corresponding source word, and columns 5 to 32 are the alignment feature, the word-granularity features, the phrase-granularity features and the language-model features. Each candidate translation is obtained by looking the source word up in the translation dictionary. This embodiment obtains the attention alignment feature of the training data by NMT forced decoding, computes the word-granularity and language-model features by counting over the 2.1 million parallel sentence pairs, and extracts the phrase-granularity features from the phrase table built with Moses.
Using the GIZA++ tool with the standard "grow-diag-final" heuristic, this embodiment obtains a bidirectional word-alignment matrix on the 2.1 million parallel sentence pairs. Based on this word-alignment result, this embodiment uses maximum-likelihood estimation to compute the forward translation probability from each source word to each target word and the reverse translation probability from each target word to each source word; each word keeps at most 200 candidate translations in the dictionary. This embodiment thus obtains two translation dictionaries, source-to-target and target-to-source, which are used to supply unregistered-word candidates and to extract the forward and reverse translation-probability features.
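The dictionary construction from the word-alignment result can be sketched as follows. This is a simplified stand-in for the GIZA++ pipeline: counts of aligned word pairs are turned into maximum-likelihood probabilities, keeping at most `top_k` candidates per word.

```python
from collections import Counter, defaultdict

def build_dictionary(aligned_pairs, top_k=200):
    """aligned_pairs: iterable of (source_word, target_word) links
    taken from the word-alignment matrix.  Returns the forward
    dictionary p(t|s) as {s: [(t, prob), ...]} sorted by probability
    and truncated to top_k.  The reverse dictionary p(s|t) is built
    the same way with the pair order swapped."""
    cooc = Counter(aligned_pairs)
    src_total = Counter(s for s, _ in aligned_pairs)
    forward = defaultdict(list)
    for (s, t), n in cooc.items():
        forward[s].append((t, n / src_total[s]))  # maximum-likelihood estimate
    for s in forward:
        forward[s] = sorted(forward[s], key=lambda x: -x[1])[:top_k]
    return dict(forward)
```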
In addition, as shown in Fig. 5, this embodiment extracts four classes of contextual features from different perspectives: ① the word-alignment feature extracted from the NMT attention alignment model; ② word-granularity features of the source word and the unregistered-word candidate translation; ③ phrase-granularity features of the source word and the unregistered-word candidate translation; ④ language-model features near the unregistered-word candidate translation when it appears at the <unk> position.
Taking the source sentence in Fig. 2 as an example:
① NMT word-alignment feature
Each candidate translation comes with the source word that produced it. We first extract one NMT word-alignment feature: the attention score (attention scores) that NMT produces when it generates <unk>, which represents the probability that the <unk> in the translation result aligns to that source word. This score is produced inside the NMT model and is an important piece of information linking the <unk> to the source word.
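Extracting this feature amounts to reading, from the decoder's attention matrix, the weight placed on the producing source position at the step where <unk> was generated. A sketch under assumed shapes (the attention matrix is taken to be a nested list indexed as [target_step][source_position], as a typical attention NMT decoder would expose):

```python
def attention_alignment_feature(attention, unk_step, src_pos):
    """Return the attention score linking the <unk> emitted at decoding
    step `unk_step` to the source word at position `src_pos`.
    `attention[t][s]` is the normalized attention weight of target
    step t over source position s, as produced by the NMT model."""
    return attention[unk_step][src_pos]

def most_attended_source(attention, unk_step):
    """Source position the <unk> most likely aligns to: the argmax of
    the attention row at that decoding step."""
    row = attention[unk_step]
    return max(range(len(row)), key=lambda s: row[s])
```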
② Word-granularity features
For each source word and its corresponding candidate translation, we first extract the co-occurrence relation of the two words in the corpus, together with their individual statistics in the corpus. This embodiment extracts 7 word-granularity contextual features:
● p(t|s): forward probability of the source word translating into the candidate translation.
● p(s|t): reverse probability of the candidate translation translating into the source word.
● number_in_corpus(s): number of occurrences of the source word in the NMT training parallel corpus.
● number_in_corpus(t): number of occurrences of the candidate translation in the NMT training parallel corpus.
● number_cooc_in_corpus(s,t): number of times the source word and the candidate translation co-occur in a parallel sentence pair of the corpus.
● freq_in_vocab(s): position of the source word in the vocabulary when the vocabulary is sorted by word frequency in the parallel corpus from high to low.
● 1 if s is OOV else 0: whether the source word is itself an unregistered word; the feature value is 1 if so, otherwise 0.
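The corpus-count features in this list can be gathered in one pass over a parallel corpus. A minimal sketch (illustrative only; the probabilities p(t|s) and p(s|t) would come from the translation dictionaries described earlier):

```python
def word_granularity_counts(parallel_corpus, s, t):
    """parallel_corpus: list of (source_tokens, target_tokens) pairs.
    Returns the three corpus-count features for source word s and
    candidate translation t: number_in_corpus(s), number_in_corpus(t)
    and number_cooc_in_corpus(s, t)."""
    n_s = n_t = n_cooc = 0
    for src, tgt in parallel_corpus:
        cs, ct = src.count(s), tgt.count(t)
        n_s += cs
        n_t += ct
        if cs and ct:
            n_cooc += 1  # co-occurrence is counted per parallel sentence pair
    return n_s, n_t, n_cooc
```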
③ Phrase-granularity features
We further capture the co-occurrence relation and statistics of the phrases formed by the source word and the candidate translation with their neighboring words; this group of features is counted and extracted from the phrase translation table generated by the statistical machine translation tool Moses. This embodiment extracts 7 phrase-granularity contextual features:
● number_in_phrase_table(s): number of times the source word occurs in the phrase table.
● number_in_phrase_table(t): number of times the candidate translation occurs in the phrase table.
● number_cooc_in_phrase_table(s,t): number of times the source word and the candidate translation co-occur in a phrase pair of the phrase table.
● number_in_phrase_table(phrase(s)): number of times the phrase formed by the source word and its neighboring words occurs in the phrase table.
● number_in_phrase_table(phrase(s)) if t in phrase table: number of times the candidate translation appears in the corresponding target phrase when the source word forms a phrase with its neighboring words.
● max_length(t) if cooc(phrase(s), phrase(t)): maximum length of the candidate-translation phrase when the phrases formed by the source word and the candidate translation with their neighboring words appear as a pair in the phrase table.
● length(s) if max_length(t) and cooc(phrase(s), phrase(t)): length of the source-word phrase when those phrase pairs appear in the phrase table and the candidate-translation phrase attains its maximum length.
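Given a Moses-style phrase table loaded as a list of (source_phrase, target_phrase) token pairs, the three simplest of these counts can be sketched as follows (illustrative only; a real phrase table also carries scores and alignments that are ignored here):

```python
def phrase_table_counts(phrase_table, s, t):
    """phrase_table: list of (src_phrase_tokens, tgt_phrase_tokens).
    Returns number_in_phrase_table(s), number_in_phrase_table(t) and
    number_cooc_in_phrase_table(s, t) as defined above."""
    n_s = sum(src.count(s) for src, _ in phrase_table)
    n_t = sum(tgt.count(t) for _, tgt in phrase_table)
    n_cooc = sum(1 for src, tgt in phrase_table if s in src and t in tgt)
    return n_s, n_t, n_cooc
```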
④ Language-model features
The language model is an important indicator of how fluently a word fits its context. Centered on the candidate translation, this embodiment extracts 15 language-model features from the words around the <unk>. For the 5-word window of continuous translation words, A B OOV C D:
● p(OOV|B), p(C|OOV): forward bigram language-model features containing the OOV.
● p(B|OOV), p(OOV|C): reverse bigram language-model features containing the OOV.
● p(OOV|B,A), p(C|OOV,B), p(D|C,OOV): forward trigram language-model features containing the OOV.
● p(A|B,OOV), p(B|OOV,C), p(OOV|C,D): reverse trigram language-model features containing the OOV.
● count(B OOV), count(OOV C): counts of bigram strings containing the OOV.
● count(A B OOV), count(B OOV C), count(OOV C D): counts of trigram strings containing the OOV.
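For the window A B OOV C D, the forward bigram features can be estimated from raw n-gram counts by maximum likelihood. A toy sketch (a real system would query a trained n-gram language model such as KenLM instead of unsmoothed counts):

```python
from collections import Counter

def bigram_lm_features(corpus_sentences, window):
    """window: the 5 tokens [A, B, OOV, C, D] with the candidate
    translation substituted in the middle.  Returns the forward bigram
    probabilities p(OOV|B) and p(C|OOV), estimated from raw counts
    over `corpus_sentences` (lists of tokens)."""
    unigrams = Counter(tok for sent in corpus_sentences for tok in sent)
    bigrams = Counter((sent[i], sent[i + 1])
                      for sent in corpus_sentences
                      for i in range(len(sent) - 1))
    a, b, oov, c, d = window

    def p(next_word, prev_word):
        # unsmoothed maximum-likelihood p(next|prev); 0 if prev unseen
        total = unigrams[prev_word]
        return bigrams[(prev_word, next_word)] / total if total else 0.0

    return p(oov, b), p(c, oov)
```

The trigram features and the reverse-direction features follow the same pattern with longer histories or a reversed corpus.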
Although the present invention is disclosed above with preferred embodiments, they do not limit the present invention. Anyone skilled in the art may make various changes and modifications without departing from the spirit and scope of the present invention; the protection scope of the present invention shall therefore be defined by the claims.

Claims (10)

1. A context-sensitive unregistered-word translation device for neural network machine translation, characterized in that the unregistered-word translation device includes:
a lookup module that looks words up in a translation dictionary according to all source words;
a word-candidate-translation supply module that, according to the lookup results obtained by the lookup module, supplies possible unregistered-word candidate translations for each <unk> mark;
a feature extraction module for extracting contextual features for said word candidate translations;
a ranking module that, from the contextual features, obtains an evaluation score for each unregistered-word candidate translation with a trained SVM-rank model and sorts the unregistered-word candidate translations by evaluation score from high to low;
a replacement module that replaces each <unk> mark in the translated sentence with the unregistered-word candidate translation with the highest evaluation score, obtaining a complete translated sentence that fits the context.
2. The unregistered-word translation device according to claim 1, characterized in that the feature extraction module includes:
a word-alignment feature extraction module for extracting a word-alignment feature from the NMT attention alignment model;
a word-granularity feature extraction module for extracting word-granularity features of the source word and the unregistered-word candidate translation;
a phrase-granularity feature extraction module for extracting phrase-granularity features of the source word and the unregistered-word candidate translation;
a language-model feature extraction module for extracting language-model features near the unregistered-word candidate translation when it appears at the <unk> position.
3. The unregistered-word translation device according to claim 2, characterized in that the word-granularity feature extraction module includes:
a forward translation probability module for the probability of the source word translating into the unregistered-word candidate translation;
a reverse translation probability module for the probability of the unregistered-word candidate translation translating into the source word;
a source-word count extraction module for extracting the number of occurrences of the source word in the NMT training parallel corpus;
a candidate-translation count extraction module for extracting the number of occurrences of the unregistered-word candidate translation in the NMT training parallel corpus;
a co-occurrence count extraction module for extracting the number of times the source word and the unregistered-word candidate translation co-occur in a parallel sentence pair of the corpus;
a vocabulary-position extraction module for extracting the position of the source word in the vocabulary;
a judgment module for judging whether the source word is itself an unregistered word.
4. The unregistered-word translation device according to claim 2, characterized in that the phrase-granularity feature extraction module includes:
a source-word phrase-table count extraction module for extracting the number of times the source word occurs in the phrase table;
a first candidate-translation phrase-table count extraction module for extracting the number of times the unregistered-word candidate translation occurs in the phrase table;
a phrase-table co-occurrence count extraction module for extracting the number of times the source word and the unregistered-word candidate translation co-occur in a phrase pair of the phrase table;
a phrase count extraction module for extracting the number of times the phrase formed by the source word and its neighboring words occurs in the phrase table;
a second candidate-translation phrase-table count extraction module for extracting the number of times the unregistered-word candidate translation appears in the corresponding target phrase when the source word forms a phrase with its neighboring words;
an unregistered-word candidate-translation phrase-length extraction module for extracting the maximum length of the candidate-translation phrase when the phrases formed by the source word and the candidate translation with their neighboring words appear as a pair in the phrase table;
a source-word phrase-length extraction module for extracting the length of the source-word phrase when those phrase pairs appear in the phrase table and the candidate-translation phrase attains its maximum length.
5. The unregistered-word translation device according to claim 2, characterized in that the language-model feature extraction module includes:
a forward n-gram language-model probability extraction module for extracting the forward n-gram language-model probability of the unregistered-word candidate translation when it appears at the <unk> position in a continuous sequence of translation words;
a reverse n-gram language-model probability extraction module for extracting the reverse n-gram language-model probability of the unregistered-word candidate translation when it appears at the <unk> position in a continuous sequence of translation words;
a string-count extraction module for extracting the number of n-gram strings of a given order that contain the unregistered-word candidate translation when it appears at the <unk> position.
6. A context-sensitive unregistered-word translation method for neural network machine translation, characterized in that the unregistered-word translation method includes:
a lookup step of looking words up in a translation dictionary according to all source words;
a candidate-translation supply step of supplying, according to the lookup results obtained by the lookup step, possible unregistered-word candidate translations for each <unk> mark;
a feature extraction step of extracting contextual features for said candidate translations;
a ranking step of obtaining, from the contextual features, an evaluation score for each unregistered-word candidate translation with a trained SVM-rank model and sorting the unregistered-word candidate translations by evaluation score from high to low;
a replacement step of replacing each <unk> mark in the translated sentence with the unregistered-word candidate translation with the highest evaluation score, obtaining a complete translated sentence that fits the context.
7. The unregistered-word translation method according to claim 6, characterized in that the feature extraction step includes:
a word-alignment feature extraction step of extracting a word-alignment feature from the NMT attention alignment model;
a word-granularity feature extraction step of extracting word-granularity features of the source word and the unregistered-word candidate translation;
a phrase-granularity feature extraction step of extracting phrase-granularity features of the source word and the unregistered-word candidate translation;
a language-model feature extraction step of extracting language-model features near the unregistered-word candidate translation when it appears at the <unk> position.
8. The unregistered-word translation method according to claim 7, characterized in that the word-granularity feature extraction step includes:
a forward translation probability step for the probability of the source word translating into the unregistered-word candidate translation;
a reverse translation probability step for the probability of the unregistered-word candidate translation translating into the source word;
a source-word count extraction step of extracting the number of occurrences of the source word in the NMT training parallel corpus;
a candidate-translation count extraction step of extracting the number of occurrences of the unregistered-word candidate translation in the NMT training parallel corpus;
a co-occurrence count extraction step of extracting the number of times the source word and the unregistered-word candidate translation co-occur in a parallel sentence pair of the corpus;
a vocabulary-position extraction step of extracting the position of the source word in the vocabulary;
a judgment step of judging whether the source word is itself an unregistered word.
9. The unregistered-word translation method according to claim 7, characterized in that the phrase-granularity feature extraction step includes:
a source-word phrase-table count extraction step of extracting the number of times the source word occurs in the phrase table;
a first candidate-translation phrase-table count extraction step of extracting the number of times the unregistered-word candidate translation occurs in the phrase table;
a phrase-table co-occurrence count extraction step of extracting the number of times the source word and the unregistered-word candidate translation co-occur in a phrase pair of the phrase table;
a phrase count extraction step of extracting the number of times the phrase formed by the source word and its neighboring words occurs in the phrase table;
a second candidate-translation phrase-table count extraction step of extracting the number of times the unregistered-word candidate translation appears in the corresponding target phrase when the source word forms a phrase with its neighboring words;
an unregistered-word candidate-translation phrase-length extraction step of extracting the maximum length of the candidate-translation phrase when the phrases formed by the source word and the candidate translation with their neighboring words appear as a pair in the phrase table;
a source-word phrase-length extraction step of extracting the length of the source-word phrase when those phrase pairs appear in the phrase table and the candidate-translation phrase attains its maximum length.
10. The unregistered-word translation method according to claim 7, characterized in that the language-model feature extraction step includes:
a forward n-gram language-model probability extraction step of extracting the forward n-gram language-model probability of the unregistered-word candidate translation when it appears at the <unk> position in a continuous sequence of translation words;
a reverse n-gram language-model probability extraction step of extracting the reverse n-gram language-model probability of the unregistered-word candidate translation when it appears at the <unk> position in a continuous sequence of translation words;
a string-count extraction step of extracting the number of n-gram strings of a given order that contain the unregistered-word candidate translation when it appears at the <unk> position.
CN201710514935.5A 2017-06-29 2017-06-29 Unregistered word translating equipment and method in a kind of neural network machine translation of context-sensitive Active CN107329960B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201710514935.5A CN107329960B (en) 2017-06-29 2017-06-29 Unregistered word translating equipment and method in a kind of neural network machine translation of context-sensitive

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201710514935.5A CN107329960B (en) 2017-06-29 2017-06-29 Unregistered word translating equipment and method in a kind of neural network machine translation of context-sensitive

Publications (2)

Publication Number Publication Date
CN107329960A true CN107329960A (en) 2017-11-07
CN107329960B CN107329960B (en) 2019-01-01

Family

ID=60199050

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201710514935.5A Active CN107329960B (en) 2017-06-29 2017-06-29 Unregistered word translating equipment and method in a kind of neural network machine translation of context-sensitive

Country Status (1)

Country Link
CN (1) CN107329960B (en)

Cited By (8)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN107967263A (en) * 2017-12-11 2018-04-27 中译语通科技股份有限公司 A kind of digital extensive method and system of machine translation, computer, computer program
CN108345590A (en) * 2017-12-28 2018-07-31 北京搜狗科技发展有限公司 A kind of interpretation method, device, electronic equipment and storage medium
CN108363704A (en) * 2018-03-02 2018-08-03 北京理工大学 A kind of neural network machine translation corpus expansion method based on statistics phrase table
CN108717434A (en) * 2018-05-15 2018-10-30 南京大学 A kind of text sort method of the point-by-point tactful and pairs of strategy of mixing
CN109543151A (en) * 2018-10-31 2019-03-29 昆明理工大学 A method of improving Laotian part-of-speech tagging accuracy rate
CN110188353A (en) * 2019-05-28 2019-08-30 百度在线网络技术(北京)有限公司 Text error correction method and device
CN111274826A (en) * 2020-01-19 2020-06-12 南京新一代人工智能研究院有限公司 Semantic information fusion-based low-frequency word translation method
CN113412515A (en) * 2019-05-02 2021-09-17 谷歌有限责任公司 Adapting automated assistant for use in multiple languages

Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN102193913A (en) * 2010-03-12 2011-09-21 夏普株式会社 Translation apparatus and translation method
CN102662936A (en) * 2012-04-09 2012-09-12 复旦大学 Chinese-English unknown words translating method blending Web excavation, multi-feature and supervised learning
CN105573989A (en) * 2014-11-04 2016-05-11 富士通株式会社 Translation device and translation method

Patent Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN102193913A (en) * 2010-03-12 2011-09-21 夏普株式会社 Translation apparatus and translation method
CN102662936A (en) * 2012-04-09 2012-09-12 复旦大学 Chinese-English unknown words translating method blending Web excavation, multi-feature and supervised learning
CN105573989A (en) * 2014-11-04 2016-05-11 富士通株式会社 Translation device and translation method

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
YUEJIE ZHANG ET AL.: "Fusion of Multiple Features and Ranking SVM for Web-based English-Chinese OOV Term Translation", 《COLING 2010: POSTER VOLUME》 *

Cited By (13)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US10929619B2 (en) 2017-12-11 2021-02-23 Glabal Tone Communication Technology Co., Ltd. Numerical generalization method for machine translation and system, computer and computer program thereof
WO2019113783A1 (en) * 2017-12-11 2019-06-20 中译语通科技股份有限公司 Number generalization method and system for machine translation, computer, and computer program
CN107967263A (en) * 2017-12-11 2018-04-27 中译语通科技股份有限公司 A kind of digital extensive method and system of machine translation, computer, computer program
CN108345590A (en) * 2017-12-28 2018-07-31 北京搜狗科技发展有限公司 A kind of interpretation method, device, electronic equipment and storage medium
CN108363704A (en) * 2018-03-02 2018-08-03 北京理工大学 A kind of neural network machine translation corpus expansion method based on statistics phrase table
CN108717434A (en) * 2018-05-15 2018-10-30 南京大学 A kind of text sort method of the point-by-point tactful and pairs of strategy of mixing
CN108717434B (en) * 2018-05-15 2020-07-31 南京大学 Text ordering method for mixed point-by-point strategy and paired strategy
CN109543151A (en) * 2018-10-31 2019-03-29 昆明理工大学 A method of improving Laotian part-of-speech tagging accuracy rate
CN109543151B (en) * 2018-10-31 2021-05-25 昆明理工大学 Method for improving wording accuracy of Laos language
CN113412515A (en) * 2019-05-02 2021-09-17 谷歌有限责任公司 Adapting automated assistant for use in multiple languages
CN110188353A (en) * 2019-05-28 2019-08-30 百度在线网络技术(北京)有限公司 Text error correction method and device
CN111274826B (en) * 2020-01-19 2021-02-05 南京新一代人工智能研究院有限公司 Semantic information fusion-based low-frequency word translation method
CN111274826A (en) * 2020-01-19 2020-06-12 南京新一代人工智能研究院有限公司 Semantic information fusion-based low-frequency word translation method

Also Published As

Publication number Publication date
CN107329960B (en) 2019-01-01

Similar Documents

Publication Publication Date Title
CN107329960A (en) Unregistered word translating equipment and method in a kind of neural network machine translation of context-sensitive
CN108846017A (en) The end-to-end classification method of extensive newsletter archive based on Bi-GRU and word vector
CN110489760A (en) Based on deep neural network text auto-collation and device
CN106651696B (en) Approximate question pushing method and system
Bustamante et al. No data to crawl? monolingual corpus creation from PDF files of truly low-resource languages in Peru
CN101520802A (en) Question-answer pair quality evaluation method and system
US7962507B2 (en) Web content mining of pair-based data
CN111488466B (en) Chinese language marking error corpus generating method, computing device and storage medium
CN105224520B (en) A kind of Chinese patent document term automatic identifying method
CN106569993A (en) Method and device for mining hypernym-hyponym relation between domain-specific terms
CN111680488A (en) Cross-language entity alignment method based on knowledge graph multi-view information
CN109460552A (en) Rule-based and corpus Chinese faulty wording automatic testing method and equipment
CN110276069A (en) A kind of Chinese braille mistake automatic testing method, system and storage medium
CN107092675A (en) A kind of Uighur semanteme string abstracting method based on statistics and shallow-layer language analysis
CN110134934A (en) Text emotion analysis method and device
CN108038099A (en) Low frequency keyword recognition method based on term clustering
Hamdi et al. In-depth analysis of the impact of OCR errors on named entity recognition and linking
CN110502759 (B) Method for handling out-of-vocabulary words in Chinese-Vietnamese neural machine translation by fusing a classification dictionary
Bao et al. Contextualized rewriting for text summarization
Nugraha et al. Typographic-based data augmentation to improve a question retrieval in short dialogue system
Pinter et al. Will it Unblend?
Errami et al. Sentiment Analysis on Moroccan Dialect based on ML and Social Media Content Detection
Chklovski et al. The Senseval-3 multilingual English-Hindi lexical sample task
KR102109858B1 (en) System and Method for Korean POS Tagging Using the Concatenation of Jamo and Syllable Embedding
CN107122465A (en) The construction method and system of a kind of Tibetan language sentiment dictionary based on Tibetan language language feature

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant
TR01 Transfer of patent right

Effective date of registration: 20210121

Address after: Building 9, accelerator, 14955 Zhongyuan Avenue, Songbei District, Harbin City, Heilongjiang Province

Patentee after: INDUSTRIAL TECHNOLOGY Research Institute OF HEILONGJIANG PROVINCE

Address before: 150001 No. 92 West Dazhi Street, Nangang District, Harbin, Heilongjiang

Patentee before: HARBIN INSTITUTE OF TECHNOLOGY

TR01 Transfer of patent right

Effective date of registration: 20230407

Address after: 150027 Room 412, Unit 1, No. 14955, Zhongyuan Avenue, Building 9, Innovation and Entrepreneurship Plaza, Science and Technology Innovation City, Harbin Hi-tech Industrial Development Zone, Heilongjiang Province

Patentee after: Heilongjiang Industrial Technology Research Institute Asset Management Co.,Ltd.

Address before: Building 9, accelerator, 14955 Zhongyuan Avenue, Songbei District, Harbin City, Heilongjiang Province

Patentee before: INDUSTRIAL TECHNOLOGY Research Institute OF HEILONGJIANG PROVINCE