CN110287496A - A neural-network-based English-to-Chinese word sense disambiguation method - Google Patents

Info

Publication number
CN110287496A
Authority
CN
China
Prior art keywords
word
english
chinese
meaning
sentence
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN201910591682.0A
Other languages
Chinese (zh)
Inventor
吕海港
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Individual
Original Assignee
Individual
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
2019-07-02
Filing date
2019-07-02
Publication date
2019-09-27

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00 Handling natural language data
    • G06F40/30 Semantic analysis
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00 Handling natural language data
    • G06F40/40 Processing or translation of natural language
    • G06F40/58 Use of machine translation, e.g. for multi-lingual retrieval, for server-side translation for client devices or for real-time translation
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/04 Architecture, e.g. interconnection topology
    • G06N3/044 Recurrent networks, e.g. Hopfield networks
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/04 Architecture, e.g. interconnection topology
    • G06N3/045 Combinations of networks
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/08 Learning methods

Abstract

To determine the accurate Chinese sense of each word in an English sentence, the invention proposes a neural-network-based English-to-Chinese word sense disambiguation method. First, a Chinese-English mixed sequence corresponding to each English sentence is generated from an English-Chinese dictionary and the sentence pairs of a Chinese-English corpus. Next, with the English sentences and the Chinese-English mixed sequences as a parallel corpus, a translation model is trained with a neural network method. Finally, the translation model decodes the English sentence to be translated word by word, with decoding restricted to the Chinese senses of each English word, to find the accurate sense of each word in the sentence, thereby effectively solving the word sense disambiguation problem in English-to-Chinese translation.

Description

A neural-network-based English-to-Chinese word sense disambiguation method
Technical field
The present invention relates to the field of machine translation, and in particular to an English-to-Chinese word sense disambiguation method.
Background
Word sense disambiguation is both a key point and a difficulty in natural language processing. In English-Chinese translation, an English word can have one or more Chinese senses, and finding the accurate sense of a word in the current sentence is a problem that has not yet been fully solved.
At present, neural machine translation is relatively mature and can translate English sentences into Chinese sentences fairly accurately. Its attention mechanism can then align the English and Chinese sentences at the word level and thereby find the approximate sense of each word. However, this attention mechanism is a soft alignment, and the attention has no clear boundary between senses. The senses found this way often differ by one or two characters from the standard senses in an English-Chinese dictionary, and sometimes the sense found is even an antonym, so the results are far from satisfactory. A more accurate way of finding the accurate sense of a word in a sentence is therefore needed.
The accurate senses of English words determined by a word sense disambiguation method can be used for word annotation, helping people read English material quickly. They can also serve as a pre-translation step for machine translation, providing a richer corpus and richer conditions for accurate translation.
Summary of the invention
The technical problem to be solved by the invention is to provide a neural-network-based English-to-Chinese word sense disambiguation method that determines the accurate Chinese sense of each word in an English sentence.
To solve this technical problem, the invention adopts the following technical solution: a neural-network-based English-to-Chinese word sense disambiguation method. First, a Chinese-English mixed sequence corresponding to each English sentence is generated from an English-Chinese dictionary and the sentence pairs of a Chinese-English corpus. Next, with the English sentences and the Chinese-English mixed sequences as a parallel corpus, a translation model is trained with a neural network method. Finally, the translation model decodes the English sentence to be translated word by word, with decoding restricted to the Chinese senses of each English word, to generate a Chinese-English mixed sequence. The Chinese word corresponding to each word in this sequence is the accurate sense of that word in the English sentence, which solves the English-to-Chinese word sense disambiguation problem.
The English-Chinese dictionary of the method includes English words and all of their possible Chinese senses. The English words include base forms, noun plurals, verb third-person singular forms, verb past tenses, verb present participles, verb past participles, and the comparative and superlative forms of adjectives and adverbs.
In the Chinese-English mixed sequence of the method, when the Chinese sentence corresponding to an English sentence contains one of the senses of a word, that word in the English sentence is replaced with this Chinese sense. Once the replacements have been made word by word, the result is a Chinese-English mixed sequence made up of Chinese senses, English words, and punctuation marks.
The neural network trained in the method can be a recurrent neural network (RNN), a convolutional neural network (CNN), or a Transformer.
Restricted decoding in the method means that, at each step of the decoding process, beam search selects only the 1 to 10 most probable Chinese senses from among all Chinese senses of the current English word.
The invention has three beneficial effects: (1) the selected Chinese senses all come exactly from the English-Chinese dictionary, so extra or missing characters do not occur; (2) neural networks are the most advanced machine translation technology and can effectively solve the English-to-Chinese word sense disambiguation problem, which is closely tied to translation; (3) the accurate senses of all words in a whole sentence can be found in a single pass and used for word annotation, improving the user's English reading efficiency.
Brief description of the drawings
Fig. 1 is a flowchart of the training and decoding of the neural-network-based English-to-Chinese word sense disambiguation method of the invention.
Detailed description of the embodiments
The invention is further described below with reference to the drawing.
Embodiment 1
This embodiment uses the open-source neural machine translation package OpenNMT (http://opennmt.net/). The 1 million Chinese-English sentence pairs used for training come from the open-source Niutrans package (http://www.niutrans.com), and the English-Chinese dictionary comes from the ECDict project (https://github.com/skywind3000/ECDICT).
This embodiment consists of two main parts (Fig. 1): training the translation model, and decoding for word sense disambiguation.
The translation model training stage is divided into four steps.
The first step is to extract English words and their inflected forms from the ECDict English-Chinese dictionary file, find all of their corresponding Chinese senses, and generate an English-Chinese dictionary with one word per line. For example, the entries for the word work and its inflected forms have the following format (the Chinese senses appear here as English glosses):
work ||| work, works, labour, function work, prove effective, and run, operating, occupation, effectively
works ||| work, works, labour, function work, prove effective, and run, operating, occupation, effectively
worked ||| work, works, proves effective, and runs, operating, effectively
working ||| work, works, proves effective, and runs, operating, effectively
Here, the part before "|||" is an English word or one of its inflected forms, and the part after it lists all Chinese senses of that word, separated by commas.
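The dictionary format described above can be read with a few lines of code. The following is a minimal sketch, not the patent's implementation: the function name `parse_dictionary` is hypothetical, the sample entries are toy data chosen for illustration rather than the actual ECDict content, and accepting both ASCII and full-width commas is an assumption.

```python
def parse_dictionary(lines):
    """Parse 'word ||| sense1,sense2,...' lines into {word: [senses]}."""
    dictionary = {}
    for line in lines:
        line = line.strip()
        if not line or "|||" not in line:
            continue  # skip blank or malformed lines
        word, _, senses = line.partition("|||")
        # Assumption: senses may be separated by ASCII or full-width commas.
        parts = senses.replace("，", ",").split(",")
        dictionary[word.strip()] = [s.strip() for s in parts if s.strip()]
    return dictionary


# Toy entries for illustration only (not taken from the patent or ECDict).
sample_lines = [
    "work ||| 工作,作品,劳动",
    "works ||| 工作,作品,劳动",
    "worked ||| 工作，运转",
]
dictionary = parse_dictionary(sample_lines)
print(dictionary["works"])
```

Keeping one entry per inflected form, as the text describes, means a lookup needs no lemmatization at decoding time.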
The second step is tokenization preprocessing of the English corpus, which separates punctuation marks from words to produce a new English sequence. For example, the English sentence (The GNU General Public License is a free, copyleft license for software and other kinds of works.) becomes, after tokenization, (The GNU General Public License is a free , copyleft license for software and other kinds of works .), in which punctuation marks and words are all separated by spaces.
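The tokenization step can be approximated with a single regular expression. This is a simplified sketch, assuming that every non-space, non-word character is a punctuation token; the function name `tokenize` is hypothetical, and the actual tokenizer used in the embodiment is not specified in the text. Note that this simplification also splits contractions and hyphenated words.

```python
import re


def tokenize(sentence):
    """Split a sentence into word tokens and single punctuation tokens."""
    # \w+ matches a run of letters, digits, or underscores;
    # [^\w\s] matches any single character that is neither of those nor whitespace.
    return re.findall(r"\w+|[^\w\s]", sentence)


sentence = ("The GNU General Public License is a free, copyleft license "
            "for software and other kinds of works.")
tokenized = " ".join(tokenize(sentence))
print(tokenized)
```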
The third step is to search the Chinese sentence, using the English-Chinese dictionary, for a Chinese sense of each word in the English sentence. If a Chinese sense is found, the corresponding word in the English sentence is replaced with it, producing a Chinese-English mixed sequence made up of Chinese words, English words, and punctuation marks. For example, the Chinese translation of the English sentence above reads, glossed in English, (The GNU General Public License Agreement is a free public-copyright agreement for software and other types of works.), and the generated Chinese-English mixed sequence, again with each Chinese sense glossed in English, is (The GNU general public agreement is a free , public-copyright agreement for software and other types of works .). Here the English words have been replaced by their corresponding Chinese senses, for example works by its sense "works", while "The", "GNU", the comma, and the full stop remain unchanged. The tokenized English sentences and their corresponding Chinese-English mixed sequences then serve as the source-language and target-language sides, respectively, of the parallel corpus for neural machine translation training.
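The replacement rule in this step can be sketched as a substring search over the Chinese sentence. The code below is an illustration under stated assumptions, not the patent's implementation: `build_mixed_sequence` is a hypothetical name, the dictionary and sentence are toy data, and picking the longest matching sense when several senses occur in the Chinese sentence is an assumption, since the text does not say how such ties are resolved.

```python
def build_mixed_sequence(tokens, chinese_sentence, dictionary):
    """Replace each English token by a Chinese sense found in the Chinese sentence."""
    mixed = []
    for token in tokens:
        # Look the token up as written, then lowercased.
        senses = dictionary.get(token) or dictionary.get(token.lower(), [])
        found = [s for s in senses if s in chinese_sentence]
        if found:
            # Assumption: prefer the longest sense that occurs in the sentence.
            mixed.append(max(found, key=len))
        else:
            # No sense found: keep the English word or punctuation mark unchanged.
            mixed.append(token)
    return mixed


# Toy data for illustration only.
toy_dictionary = {
    "other": ["其他", "另外"],
    "kinds": ["种类", "类型"],
    "of": ["的"],
    "works": ["工作", "作品", "工厂"],
}
tokens = ["other", "kinds", "of", "works", "."]
mixed = build_mixed_sequence(tokens, "其他类型的作品。", toy_dictionary)
print(" ".join(mixed))
```

Because the output has exactly one target token per source token, the resulting pairs are aligned position by position, which is what later allows decoding to be restricted per source word.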
The fourth step is to train a neural network on the 1 million parallel sentence pairs using OpenNMT. This embodiment trains a two-layer recurrent neural network (RNN) with 500 hidden units and a global attention mechanism. The source and target languages each use a vocabulary of 100,000 words, each layer uses a 512-dimensional word vector space, training runs for 100,000 steps, and the resulting translation model is about 800 MB.
In the word sense disambiguation (decoding) stage, the translation model decodes the words of the English sentence to be translated one by one under restriction, which is implemented mainly by modifying the decoding part of OpenNMT. There are two main modifications: (1) immediately after the translation model is read in, the English-Chinese dictionary is read in as well, with the source id of each English word as the "key" and the target ids of all of its Chinese senses as the "value", stored in "key-value" form for later use; (2) at each step of beam search, the log-probabilities of the ids of all Chinese senses of the current word (for work, for example, the target ids of each of its senses "work, works, labour, function work, prove effective, and run, operating, occupation, effectively") are left unchanged, while the log-probabilities of all other ids are uniformly set to a very small value, -10E20. Beam search is thereby confined to the limited set of senses of the word. Decoding step by step with this restricted beam search yields a Chinese-English mixed sequence of Chinese senses, English words, and punctuation marks, in which each Chinese sense is the accurate sense of the corresponding word in this sentence.
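The two modifications can be illustrated outside OpenNMT with plain Python lists. This is a minimal sketch under stated assumptions and does not use or reproduce OpenNMT's API: the function names, vocabularies, dictionary entries, and log-probability values are all made up for illustration. Only the key-value layout and the value -10E20 come from the text.

```python
VERY_SMALL = -10E20  # the very small log-probability named in the text


def build_sense_table(dictionary, src_vocab, tgt_vocab):
    """Modification (1): map each source word id to the target ids of its senses."""
    table = {}
    for word, senses in dictionary.items():
        if word in src_vocab:
            ids = [tgt_vocab[s] for s in senses if s in tgt_vocab]
            if ids:
                table[src_vocab[word]] = ids
    return table


def restrict_log_probs(log_probs, allowed_ids):
    """Modification (2): keep allowed ids unchanged, push all others to VERY_SMALL."""
    allowed = set(allowed_ids)
    return [lp if i in allowed else VERY_SMALL for i, lp in enumerate(log_probs)]


# Toy vocabularies and dictionary (hypothetical).
src_vocab = {"work": 0, "bank": 1}
tgt_vocab = {"工作": 0, "作品": 1, "银行": 2, "河岸": 3}
toy_dictionary = {"work": ["工作", "作品"], "bank": ["银行", "河岸"]}
table = build_sense_table(toy_dictionary, src_vocab, tgt_vocab)

# Made-up model scores for one decoding step while the current source word is "work".
log_probs = [-2.3, -1.9, -0.4, -3.0]
restricted = restrict_log_probs(log_probs, table[src_vocab["work"]])
best_id = max(range(len(restricted)), key=restricted.__getitem__)
print(best_id)
```

In this toy step the unrestricted best candidate would be target id 2, which is not a sense of "work"; after masking, only ids 0 and 1 can survive in the beam.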
This embodiment was evaluated on 1000 English sentences. When the most common sense in the English-Chinese dictionary is used, the accuracy of word senses in sentences is 66.8%. In this embodiment, the sense accuracy is 76.8%, 76.6%, and 74.3% for beam sizes of 10, 5, and 1 respectively. The neural-network-based English-to-Chinese word sense disambiguation method thus clearly improves the determination of word senses in sentences, and beam size has only a small effect on accuracy.
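The comparison above rests on a per-word accuracy and a most-common-sense baseline. The text does not spell out how either is computed, so the sketch below makes two assumptions: accuracy is the fraction of words whose predicted sense equals a reference sense, and the first sense listed for a word is its most common one. All names and data are hypothetical.

```python
def sense_accuracy(predicted, gold):
    """Fraction of positions where the predicted sense equals the reference sense."""
    if len(predicted) != len(gold):
        raise ValueError("predicted and gold must have the same length")
    if not gold:
        return 0.0
    return sum(p == g for p, g in zip(predicted, gold)) / len(gold)


def most_common_sense_baseline(words, dictionary):
    """Assumption: the first sense listed for a word is its most common sense."""
    return [dictionary[w][0] for w in words]


# Toy data for illustration only.
toy_dictionary = {
    "work": ["工作", "作品"],
    "works": ["工作", "作品"],
    "bank": ["银行", "河岸"],
}
words = ["work", "works", "bank"]
gold = ["工作", "作品", "银行"]
baseline = most_common_sense_baseline(words, toy_dictionary)
accuracy = sense_accuracy(baseline, gold)
print(round(accuracy, 3))
```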
Embodiment 2
The English-Chinese dictionary, corpus processing, and decoding process are the same as in Embodiment 1. The translation model is trained as a two-layer convolutional neural network (CNN) with 500 hidden units and a global attention mechanism. The source and target languages each use a vocabulary of 100,000 words, each layer uses a 512-dimensional word vector space, and the resulting translation model is about 900 MB. Restricted decoding with this translation model gives a sense accuracy in sentences of 79.4%.
Embodiment 3
The English-Chinese dictionary, corpus processing, and decoding process are the same as in Embodiment 1. The translation model is trained as a 6-layer Transformer with 512 hidden units and an 8-head self-attention mechanism. The source and target languages each use a vocabulary of 100,000 words, each layer uses a 512-dimensional word vector space, and the resulting translation model is about 3000 MB. Restricted decoding with this translation model gives a sense accuracy in sentences of 83.2%.
The three embodiments show that this neural-network-based English-to-Chinese word sense disambiguation method not only disambiguates word senses well, but can also be trained with different neural network methods, making effective use of the latest techniques in neural machine translation. Therefore, training the translation model with any other neural-network-based English-to-Chinese machine translation method and performing English-to-Chinese word sense disambiguation by restricted decoding also falls within the scope of protection of the invention.

Claims (5)

  1. A neural-network-based English-to-Chinese word sense disambiguation method, characterized in that: first, a Chinese-English mixed sequence corresponding to each English sentence is generated from an English-Chinese dictionary and the sentence pairs of a Chinese-English corpus; next, with the English sentences and the Chinese-English mixed sequences as a parallel corpus, a translation model is trained with a neural network method; finally, the translation model decodes the English sentence to be translated word by word, with decoding restricted to the senses of each English word, to generate a Chinese-English mixed sequence, in which the Chinese word corresponding to each word is the accurate sense of that word in the English sentence, thereby solving the English-to-Chinese word sense disambiguation problem.
  2. The English-Chinese dictionary of the English-to-Chinese word sense disambiguation method according to claim 1, characterized in that the English-Chinese dictionary includes English words and all of their possible Chinese senses, the English words including base forms, noun plurals, verb third-person singular forms, verb past tenses, verb present participles, verb past participles, and the comparative and superlative forms of adjectives and adverbs.
  3. The Chinese-English mixed sequence of the English-to-Chinese word sense disambiguation method according to claim 1, characterized in that, when the Chinese sentence corresponding to an English sentence contains one of the senses of a word, that word in the English sentence is replaced with this Chinese sense, and once the replacements have been made word by word, a Chinese-English mixed sequence made up of Chinese senses, English words, and punctuation marks is formed.
  4. The neural network training of the English-to-Chinese word sense disambiguation method according to claim 1, characterized in that the neural network trained can be a recurrent neural network (RNN), a convolutional neural network (CNN), or a Transformer.
  5. The restricted decoding of the English-to-Chinese word sense disambiguation method according to claim 1, characterized in that, at each step of the decoding process, beam search selects only the 1 to 10 most probable Chinese senses from among all Chinese senses of the current English word.
CN201910591682.0A 2019-07-02 2019-07-02 A neural-network-based English-to-Chinese word sense disambiguation method Pending CN110287496A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201910591682.0A CN110287496A (en) 2019-07-02 2019-07-02 A neural-network-based English-to-Chinese word sense disambiguation method

Publications (1)

Publication Number Publication Date
CN110287496A (en) 2019-09-27

Family

ID=68021752

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201910591682.0A Pending CN110287496A (en) 2019-07-02 2019-07-02 A neural-network-based English-to-Chinese word sense disambiguation method

Country Status (1)

Country Link
CN (1) CN110287496A (en)

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN116070643A (en) * 2023-04-03 2023-05-05 武昌理工学院 Fixed style translation method and system from ancient text to English
CN116070643B (en) * 2023-04-03 2023-08-15 武昌理工学院 Fixed style translation method and system from ancient text to English

Legal Events

Date Code Title Description
PB01 Publication
WD01 Invention patent application deemed withdrawn after publication

Application publication date: 20190927