CN103902528A - Uygur language word alignment method - Google Patents

Info

Publication number
CN103902528A
Authority
CN
China
Prior art keywords
word
uighur
alignment
uygur language
uygur
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN201210579979.3A
Other languages
Chinese (zh)
Inventor
尼加提·纳吉米
买合木提·买买提
帕肉克·司地克
马斌
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Xinjiang Electric Power Information Communication Co Ltd
Original Assignee
Xinjiang Electric Power Information Communication Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
2012-12-28
Filing date
2012-12-28
Publication date
2014-07-02
Application filed by Xinjiang Electric Power Information Communication Co Ltd
Priority to CN201210579979.3A
Publication of CN103902528A

Landscapes

  • Document Processing Apparatus (AREA)
  • Machine Translation (AREA)

Abstract

The invention discloses a Uygur language word alignment method. The method automatically aligns Uygur words with Chinese words under five alignment relationships: one-to-one, one-to-many, many-to-one, many-to-many and one-to-none. Words that the automatic alignment gets wrong are aligned manually, which improves the accuracy with which the system processes Uygur. Uygur words are also split and merged according to the characteristics of the language. The method thereby assists Chinese-Uygur machine translation and the building of electronic Uygur dictionaries, and lays a solid foundation for developing electronic dictionaries and machine-aided translation systems for Uzbek, Kazak, Kyrgyz and Turkish.

Description

Uighur word alignment method
Technical field
The present invention relates to language information processing technology, and in particular to a Uighur word alignment method.
Background art
With the informatization of the national economy and of society, people demand faster and better information acquisition, lookup and translation across many languages. Electronic dictionary products and machine translation systems of all kinds have been developed in response and are welcomed by users. In machine translation, the quality of the corpus directly affects the quality of the translation, and a Uighur word alignment system is an aid to both machine translation and corpus construction.
As machine translation systems and natural language processing systems move into practical use, machine dictionaries and machine translation systems have become a focus of development, and the speed and quality of corpus construction are particularly important. Word alignment finds word-level translation correspondences in mutually translated texts. Natural language processing tasks that rely on bilingual corpora all require alignment at the word level. There are currently four main kinds of word alignment method: statistics-based methods, character-based methods, methods based on linguistic knowledge, and hybrid methods. Statistics-based methods train on a large-scale bilingual corpus to obtain the co-occurrence probabilities of translation word pairs and use these as the basis for alignment. Character-based methods align words through cognates, that is, words that resemble each other in the two languages. Methods based on linguistic knowledge use resources such as bilingual dictionaries and synonym dictionaries as the basis for alignment. Hybrid methods combine several of the three methods above.
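The statistics-based approach described above can be illustrated with a minimal sketch. It assumes a sentence-aligned toy corpus with placeholder tokens (not real Uighur or Chinese data) and uses the Dice coefficient as a simple stand-in for a co-occurrence probability; the function names `dice_scores` and `best_translation` are hypothetical, and the patent does not say which statistic its system uses.

```python
from collections import Counter
from itertools import product


def dice_scores(sentence_pairs):
    """Score word pairs by how often they co-occur in aligned sentence pairs."""
    src_count, tgt_count, pair_count = Counter(), Counter(), Counter()
    for src_words, tgt_words in sentence_pairs:
        src_set, tgt_set = set(src_words), set(tgt_words)
        src_count.update(src_set)
        tgt_count.update(tgt_set)
        pair_count.update(product(src_set, tgt_set))
    # Dice coefficient: 2 * joint count / (source count + target count).
    return {
        (s, t): 2 * c / (src_count[s] + tgt_count[t])
        for (s, t), c in pair_count.items()
    }


def best_translation(word, scores):
    """Return the target word with the highest score for a source word."""
    candidates = {t: v for (s, t), v in scores.items() if s == word}
    return max(candidates, key=candidates.get) if candidates else None


# Toy corpus with placeholder tokens (assumption, for illustration only).
corpus = [
    (["u_book", "u_read"], ["c_read", "c_book"]),
    (["u_book", "u_buy"], ["c_buy", "c_book"]),
    (["u_read"], ["c_read"]),
]
scores = dice_scores(corpus)
print(best_translation("u_book", scores))
```

Word pairs that appear together in most of the sentences where either word occurs receive scores close to 1, which is the intuition behind using co-occurrence as the basis for alignment.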
In recent years, as informatization for ethnic minorities has advanced, corpus construction for the minority languages of Xinjiang has also made progress. Most of this work focuses on Uighur, however, and it has shortcomings both in its support for other minority languages and in its technical level.
Summary of the invention
The object of the present invention is to provide a Uighur word alignment method that realizes the automatic alignment of Uighur words, assists the construction of Uighur electronic dictionaries and Uighur corpora, provides a basis for research on Chinese-Uighur machine translation systems, and lays a solid foundation for developing electronic dictionaries and machine-aided translation systems for Uzbek, Kazak, Kyrgyz and Turkish.
The object of the present invention is achieved by a Uighur word alignment method in which: (1) Uighur words are aligned automatically, the alignment relation between Uighur words and Chinese words being divided into five kinds, namely one-to-one, one-to-many, many-to-one, many-to-many and one-to-none; (2) words that the automatic alignment gets wrong are aligned manually, which improves the accuracy with which the system processes Uighur; and (3) Uighur words are split and merged according to the features of Uighur.
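The five alignment relations can be made concrete with a small sketch. It assumes that an alignment is stored as a set of (Uighur index, Chinese index) links and that the first term of each relation name refers to the Uighur side; both are assumptions, since the patent does not describe a data format. The function name `classify_alignment` is hypothetical.

```python
from collections import defaultdict


def classify_alignment(n_uighur, links):
    """Group links into connected components and label each component
    with one of the five relations named in the text."""
    parent = {}

    def find(node):
        # Union-find lookup with path halving.
        parent.setdefault(node, node)
        while parent[node] != node:
            parent[node] = parent[parent[node]]
            node = parent[node]
        return node

    for u, c in links:
        parent[find(("u", u))] = find(("c", c))

    groups = defaultdict(lambda: (set(), set()))
    for u in range(n_uighur):
        groups[find(("u", u))][0].add(u)
    for _, c in links:
        groups[find(("c", c))][1].add(c)

    result = []
    for u_set, c_set in groups.values():
        if not c_set:
            label = "one-to-none"
        elif len(u_set) == 1 and len(c_set) == 1:
            label = "one-to-one"
        elif len(u_set) == 1:
            label = "one-to-many"
        elif len(c_set) == 1:
            label = "many-to-one"
        else:
            label = "many-to-many"
        result.append((sorted(u_set), sorted(c_set), label))
    return sorted(result)


# Seven Uighur words; word 6 has no Chinese counterpart.
links = {(0, 0), (1, 1), (1, 2), (2, 3), (3, 3), (4, 4), (4, 5), (5, 5)}
relations = classify_alignment(7, links)
for uighur_ids, chinese_ids, label in relations:
    print(uighur_ids, chinese_ids, label)
```

Treating linked words as connected components means that a many-to-many relation is detected even when no single word is linked to every word on the other side.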
The present invention relates to the alignment of Uighur words and realizes both the automatic alignment of Uighur words and their splitting and merging. Word alignment is one of the basic problems of corpus construction and has long been a subject of research. A system that can align Uighur words in this way is still the first of its kind on the market. The invention automatically aligns the Uighur words submitted to it; it is a useful aid for building Uighur electronic dictionaries and Chinese-Uighur machine translation systems; it also supports the future construction of corpora for Chinese-Uighur machine translation; and it lays a solid foundation for developing electronic dictionaries and machine-aided translation systems for Uzbek, Kazak, Kyrgyz and Turkish. The invention is a Uighur word alignment system based on computational linguistics, linguistics, sociology and computer information processing science. It is characterized in that Uighur words are aligned automatically according to the morphological features of Uighur, words that were not aligned automatically can be aligned manually, and Uighur words are split and merged according to the features of Uighur.
The beneficial effects of the invention are that the system realizes the automatic alignment of Uighur words, assists the construction of Uighur electronic dictionaries and Uighur corpora, provides a basis for research on Chinese-Uighur machine translation systems, and lays a solid foundation for developing electronic dictionaries and machine-aided translation systems for Uzbek, Kazak, Kyrgyz and Turkish.
Brief description of the drawing
The invention is further described below with reference to the accompanying drawing.
Fig. 1 is a flow chart of the present invention.
Embodiment
A Uighur word alignment method in which: (1) Uighur words are aligned automatically, the alignment relation between Uighur words and Chinese words being divided into five kinds, namely one-to-one, one-to-many, many-to-one, many-to-many and one-to-none; (2) words that the automatic alignment gets wrong are aligned manually, which improves the accuracy with which the system processes Uighur; and (3) Uighur words are split and merged according to the features of Uighur.
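Splitting and merging can be sketched as two operations on a token list. The patent does not say where words are split or which words are merged, so the example below assumes a split at a stem and suffix boundary chosen by the caller; the function names `split_word` and `merge_words` and the Latin-script tokens are illustrative only.

```python
def split_word(tokens, index, position):
    """Split tokens[index] into two tokens at character offset `position`."""
    word = tokens[index]
    if not 0 < position < len(word):
        raise ValueError("split position must fall inside the word")
    return tokens[:index] + [word[:position], word[position:]] + tokens[index + 1:]


def merge_words(tokens, start, end, joiner=""):
    """Merge tokens[start:end] (at least two tokens) into a single token."""
    if not (0 <= start and end <= len(tokens) and end - start >= 2):
        raise ValueError("merge range must cover at least two existing tokens")
    return tokens[:start] + [joiner.join(tokens[start:end])] + tokens[end:]


# Illustrative tokens; the split point is an assumed stem+suffix boundary.
tokens = ["men", "kitablar", "oqudum"]
split_tokens = split_word(tokens, 1, 5)
merged_tokens = merge_words(split_tokens, 1, 3)
print(split_tokens)
print(merged_tokens)
```

Both functions return new lists rather than modifying their input, so a caller can keep the original segmentation alongside the edited one when a split or merge needs to be undone.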
As shown in Fig. 1, the user's role is determined first, and the sentences to be reviewed are then obtained. Words are split and merged according to the features of Uighur words, and words that the automatic alignment got wrong are aligned manually. The alignment result is then saved, and the sentences that contained errors are recorded at the same time.
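The review part of this flow (role check, manual correction, saving, and error recording) can be sketched as follows. This is a minimal illustration under stated assumptions, not the patented system: the role name `"reviewer"`, the classes `SentencePair` and `ReviewStore`, and the function `review_sentences` are all hypothetical, and splitting and merging are left out to keep the sketch short.

```python
from dataclasses import dataclass, field

# Hypothetical role name; the text only says that the user's role is checked.
REVIEW_ROLES = {"reviewer"}


@dataclass
class SentencePair:
    uighur: list
    chinese: list
    links: set  # automatic alignment as (uighur_index, chinese_index) pairs


@dataclass
class ReviewStore:
    saved: list = field(default_factory=list)      # final alignment results
    error_log: list = field(default_factory=list)  # positions of sentences with errors


def review_sentences(role, pairs, corrections, store):
    """Apply manual corrections, save every result, and log corrected sentences.

    `corrections` maps a sentence position to the link set chosen by the reviewer.
    """
    if role not in REVIEW_ROLES:
        raise PermissionError(f"role {role!r} may not review alignments")
    for position, pair in enumerate(pairs):
        manual = corrections.get(position)
        if manual is not None and set(manual) != pair.links:
            store.error_log.append(position)  # automatic alignment was wrong here
            pair.links = set(manual)          # manual alignment replaces it
        store.saved.append(pair)
    return store


# Placeholder sentences; the second automatic alignment is corrected by hand.
pairs = [
    SentencePair(["u0", "u1"], ["c0", "c1"], {(0, 0), (1, 1)}),
    SentencePair(["u0", "u1"], ["c0", "c1"], {(0, 0), (1, 0)}),
]
store = review_sentences("reviewer", pairs, {1: {(0, 1), (1, 0)}}, ReviewStore())
print(store.error_log)
```

Keeping the error log separate from the saved results mirrors the flow's two outputs: the corrected alignments and a record of which sentences the automatic step got wrong.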

Claims (1)

1. A Uighur word alignment method, characterized in that: (1) Uighur words are aligned automatically, the alignment relation between Uighur words and Chinese words being divided into five kinds, namely one-to-one, one-to-many, many-to-one, many-to-many and one-to-none; (2) words that the automatic alignment gets wrong are aligned manually, which improves the accuracy with which the system processes Uighur; and (3) Uighur words are split and merged according to the features of Uighur.
CN201210579979.3A 2012-12-28 2012-12-28 Uygur language word alignment method Pending CN103902528A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201210579979.3A CN103902528A (en) 2012-12-28 2012-12-28 Uygur language word alignment method

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201210579979.3A CN103902528A (en) 2012-12-28 2012-12-28 Uygur language word alignment method

Publications (1)

Publication Number Publication Date
CN103902528A true CN103902528A (en) 2014-07-02

Family

ID=50993858

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201210579979.3A Pending CN103902528A (en) 2012-12-28 2012-12-28 Uygur language word alignment method

Country Status (1)

Country Link
CN (1) CN103902528A (en)

Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20110246177A1 (en) * 2010-04-06 2011-10-06 Samsung Electronics Co. Ltd. Syntactic analysis and hierarchical phrase model based machine translation system and method
CN102662932A (en) * 2012-03-15 2012-09-12 中国科学院自动化研究所 Method for establishing tree structure and tree-structure-based machine translation system
CN102708098A (en) * 2012-05-30 2012-10-03 中国科学院自动化研究所 Dependency coherence constraint-based automatic alignment method for bilingual words

Non-Patent Citations (3)

* Cited by examiner, † Cited by third party
Title
张亚军 等: "汉语维吾尔语的一对一词对齐研究", 《昌吉学院院报》, no. 6, 15 December 2012 (2012-12-15), pages 80 - 83 *
李英 等: "一种基于词典和长度相结合的汉维句子对齐算法", 《新乡学院院报自然科学版》, vol. 29, no. 1, 15 May 2012 (2012-05-15), pages 66 - 68 *
麦热哈巴·艾力 等: "一种提高维吾尔语汉语词语对齐的方法研究", 《小型微型计算机系统》, vol. 33, no. 11, 15 November 2012 (2012-11-15), pages 2551 - 2555 *

Cited By (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112507734A (en) * 2020-11-19 2021-03-16 南京大学 Roman Uygur language-based neural machine translation system
CN112507734B (en) * 2020-11-19 2024-03-19 南京大学 Neural machine translation system based on romanized Uygur language
CN113158693A (en) * 2021-03-13 2021-07-23 中国科学院新疆理化技术研究所 Uygur language keyword generation method and device based on Chinese keywords, electronic equipment and storage medium
CN113536747A (en) * 2021-09-14 2021-10-22 潍坊北大青鸟华光照排有限公司 Uyghur language last-syllable-splitting processing method on mobile equipment
CN113536747B (en) * 2021-09-14 2022-03-29 潍坊北大青鸟华光照排有限公司 Uyghur language last-syllable-splitting processing method on mobile equipment

Similar Documents

Publication Publication Date Title
Li et al. Comparison of Google translation with human translation
Al-Jumaily et al. A real time Named Entity Recognition system for Arabic text mining
CN103412855A (en) Method and system for automatic identification of relative words in complex sentence of modern Chinese language
Liu et al. Chinese-Portuguese machine translation: a study on building parallel corpora from comparable texts
CN103902528A (en) Uygur language word alignment method
Russo et al. Improving machine translation of null subjects in Italian and Spanish
Du et al. Using babelnet to improve OOV coverage in SMT
Bouamor et al. Automatic construction of a multiword expressions bilingual lexicon: A statistical machine translation evaluation perspective
Meng et al. Lost in translations? building sentiment lexicons using context based machine translation
Tahir et al. Knowledge based machine translation
Shamsfard Developing FarsNet: A lexical ontology for Persian
Suryakanthi et al. Discourse translation from English to Telugu
Hong Chinese near-synonym study based on the chinese gigaword corpus and the chinese learner corpus
Tsai A learner corpus study of attributive clauses and passive voice in student translations
Wu et al. Applying Chinese word sketch engine to distinguish commonly confused words
Perdek Lexicographic potential of corpus equivalents: The case of English phrasal verbs and their Polish equivalents
Chen A Linguistic Evaluation of the Output Quality of 'Google Translate' and 'Bing Translator' in Chinese-English Translation
Hutchins Machine translation: problems and issues
Altenbek et al. Kazakh noun phrase extraction based on n-gram and rules
Ji et al. Phonetic name matching for cross-lingual spoken sentence retrieval
程倩倩 A Corpus based Study on Gender Differences in Language——Take Friends as the Example
Qingjun et al. The application of internet technology in translation
Malik et al. Finite-state scriptural translation
Chengping The Research and construction of Yi corpus for information processing
Ghaffar et al. English to arabic statistical machine translation system improvements using preprocessing and arabic morphology analysis

Legal Events

Date Code Title Description
C06 Publication
PB01 Publication
C10 Entry into substantive examination
SE01 Entry into force of request for substantive examination
C02 Deemed withdrawal of patent application after publication (patent law 2001)
WD01 Invention patent application deemed withdrawn after publication

Application publication date: 20140702