CN103902528A - Uygur language word alignment method - Google Patents
- Publication number
- CN103902528A
- Authority
- CN
- China
- Prior art keywords
- word
- uighur
- alignment
- uygur language
- uygur
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Landscapes
- Document Processing Apparatus (AREA)
- Machine Translation (AREA)
Abstract
The invention discloses a Uygur language word alignment method. The method automatically aligns Uygur words with Chinese words under five alignment relationships: one-to-one, one-to-many, many-to-one, many-to-many, and one-to-null. Words that the automatic step aligns incorrectly are aligned manually, which improves the accuracy with which the system processes Uygur. Uygur words are split and merged according to the characteristics of the language. The method assists Chinese-Uygur machine translation and the construction of electronic Uygur dictionaries, and lays a solid foundation for developing electronic dictionaries and machine-aided translation systems for Uzbek, Kazakh, Kyrgyz, and Turkish.
Description
Technical field
The present invention relates to language information processing technology, and in particular to a Uygur word alignment method.
Background art
In today's era of economic and social informatization, people demand faster and better ways to acquire, query, and translate information in all kinds of languages. As a result, a variety of electronic dictionary products and machine translation systems have been developed and are welcomed by users. In machine translation, the quality of the corpus directly affects the quality of the translation, and a Uygur word alignment system is an auxiliary tool for machine translation and corpus construction.
As machine translation systems and natural language processing systems are put into practical use, machine dictionaries and machine translation systems have become a focus of development, and the speed and quality of corpus construction are particularly important. Word alignment finds translation correspondences, with the word as the unit, in texts that are translations of each other. Natural language processing tasks based on bilingual corpora all require word-level alignment. There are currently four main word alignment methods: statistical methods, character-based methods, methods based on linguistic knowledge, and hybrid methods. Statistical methods train on a large-scale bilingual corpus to obtain the co-occurrence probabilities of translation word pairs, which serve as the basis for alignment. Character-based methods align words using cognates, that is, words with a shared form in the two languages. Methods based on linguistic knowledge use resources such as bilingual dictionaries and synonym dictionaries as the basis for alignment. Hybrid methods combine several of the three preceding methods.
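The statistical approach described above can be illustrated with a small sketch. The patent does not say which statistic is used, so the Dice coefficient here is an assumption chosen for brevity, and the function names (`dice_scores`, `best_translation`) and the placeholder tokens (`u1`, `c1`, ...) are hypothetical.

```python
from collections import Counter
from itertools import product


def dice_scores(sentence_pairs):
    """Return a Dice co-occurrence score for every (source, target) word pair.

    Each element of sentence_pairs is (source_tokens, target_tokens).
    Dice(s, t) = 2 * count(s and t together) / (count(s) + count(t)),
    where counts are numbers of sentence pairs.
    """
    src_count, tgt_count, pair_count = Counter(), Counter(), Counter()
    for src_tokens, tgt_tokens in sentence_pairs:
        src_set, tgt_set = set(src_tokens), set(tgt_tokens)
        src_count.update(src_set)
        tgt_count.update(tgt_set)
        pair_count.update(product(src_set, tgt_set))
    return {
        (s, t): 2 * c / (src_count[s] + tgt_count[t])
        for (s, t), c in pair_count.items()
    }


def best_translation(word, scores):
    """Return the target word with the highest score for `word`, or None."""
    candidates = {t: v for (s, t), v in scores.items() if s == word}
    return max(candidates, key=candidates.get) if candidates else None


# Toy corpus with placeholder tokens (illustrative only, not data from the patent).
corpus = [
    (["u1", "u2"], ["c1", "c2"]),
    (["u1", "u3"], ["c1", "c3"]),
    (["u2", "u3"], ["c2", "c3"]),
]
scores = dice_scores(corpus)
print(best_translation("u1", scores))
```

A production aligner would more likely estimate translation probabilities iteratively over a much larger corpus; the sketch only shows how co-occurrence counts can rank candidate translations.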
In recent years, with the development of informatization among ethnic minorities, corpus construction for the minority languages of Xinjiang has also made new progress. Most of this work, however, focuses on Uygur, and both the support for other minority languages and the technical level remain deficient.
Summary of the invention
The object of the present invention is to provide a Uygur word alignment method that automatically aligns Uygur words, thereby assisting the construction of Uygur electronic dictionaries and Uygur corpora. It also provides a basis for research on Chinese-Uygur machine translation systems and lays a solid foundation for developing electronic dictionaries and machine-aided translation systems for Uzbek, Kazakh, Kyrgyz, and Turkish.
The object of the present invention is achieved as follows. A Uygur word alignment method in which: (1) Uygur words are aligned automatically, and the alignment relation between Uygur words and Chinese words is divided into five types, namely one-to-one, one-to-many, many-to-one, many-to-many, and one-to-null; (2) words that the automatic alignment gets wrong are aligned manually, which improves the accuracy with which the system processes Uygur; (3) Uygur words are split and merged according to the characteristics of Uygur.
The present invention relates to the alignment of Uygur words and implements both automatic alignment and the splitting and merging of Uygur words. Word alignment is one of the basic problems of corpus construction and has long been a subject of research. This is currently the first system on the market that can align Uygur words. The invention automatically aligns the Uygur words submitted to it; it is a useful auxiliary tool for building Uygur electronic dictionaries and Chinese-Uygur machine translation systems; it assists the future construction of corpora for Chinese-Uygur machine translation; and it lays a solid foundation for developing electronic dictionaries and machine-aided translation systems for Uzbek, Kazakh, Kyrgyz, and Turkish. The invention is a Uygur word alignment system based on computational linguistics, linguistics, sociology, and computer information processing. It is characterized in that Uygur words are aligned automatically according to the morphological features of Uygur; words that were not aligned automatically can be aligned manually; and the system splits and merges Uygur words according to the characteristics of Uygur.
The beneficial effect of the invention is that the system automatically aligns Uygur words, thereby assisting the construction of Uygur electronic dictionaries and Uygur corpora. It also provides a basis for research on Chinese-Uygur machine translation systems and lays a solid foundation for developing electronic dictionaries and machine-aided translation systems for Uzbek, Kazakh, Kyrgyz, and Turkish.
Brief description of the drawings
The invention is further described below with reference to the accompanying drawing.
Fig. 1 is a flowchart of the present invention.
Embodiment
A Uygur word alignment method in which: (1) Uygur words are aligned automatically, and the alignment relation between Uygur words and Chinese words is divided into five types, namely one-to-one, one-to-many, many-to-one, many-to-many, and one-to-null; (2) words that the automatic alignment gets wrong are aligned manually, which improves the accuracy with which the system processes Uygur; (3) Uygur words are split and merged according to the characteristics of Uygur.
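The five alignment relations can be made concrete with a short sketch. The patent does not specify a data representation, so the one used here is an assumption: an alignment group is a pair of index collections, one for the Uygur tokens and one for the Chinese tokens it links. The names `Relation` and `classify` are hypothetical, and rejecting groups that fall outside the five listed types is likewise a design choice of this sketch.

```python
from enum import Enum


class Relation(Enum):
    ONE_TO_ONE = "one-to-one"
    ONE_TO_MANY = "one-to-many"
    MANY_TO_ONE = "many-to-one"
    MANY_TO_MANY = "many-to-many"
    ONE_TO_NULL = "one-to-null"


def classify(uygur_idx, chinese_idx):
    """Classify one alignment group by how many tokens it covers on each side."""
    n_u, n_c = len(set(uygur_idx)), len(set(chinese_idx))
    if n_u == 1 and n_c == 0:
        # A single Uygur word with no Chinese counterpart.
        return Relation.ONE_TO_NULL
    if n_u == 0 or n_c == 0:
        # Assumption: shapes outside the five listed types are rejected.
        raise ValueError("group is not one of the five alignment relations")
    if n_u == 1:
        return Relation.ONE_TO_ONE if n_c == 1 else Relation.ONE_TO_MANY
    return Relation.MANY_TO_ONE if n_c == 1 else Relation.MANY_TO_MANY


# Example: Uygur token 3 is linked to Chinese tokens 1 and 2.
print(classify([3], [1, 2]).value)
```

Keeping the relation type explicit makes it straightforward to count how often each type occurs in a corpus or to route the harder types to manual review.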
As shown in Fig. 1, the user's role is determined first, and the sentences that have passed review are then retrieved. Words are split and merged according to the characteristics of Uygur words, words that the automatic step aligned incorrectly are aligned manually, and the alignment result is then saved; sentences that contain errors are recorded at the same time.
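The flow of Fig. 1 can be sketched roughly as follows. The patent gives no implementation details, so everything here is a hypothetical illustration: the `SentencePair` structure, the role name `"aligner"`, the `auto_align` callback, and the `corrections` mapping that stands in for manual alignment are all assumptions, as is merging tokens by plain concatenation.

```python
from dataclasses import dataclass, field


@dataclass
class SentencePair:
    uygur: list                     # Uygur tokens
    chinese: list                   # Chinese tokens
    reviewed: bool = False          # True once the sentence has passed review
    links: set = field(default_factory=set)  # (uygur_index, chinese_index) pairs


def split_token(tokens, i, pos):
    """Split token i at character offset pos into two tokens."""
    return tokens[:i] + [tokens[i][:pos], tokens[i][pos:]] + tokens[i + 1:]


def merge_tokens(tokens, i):
    """Merge token i with its right neighbour (assumed: plain concatenation)."""
    return tokens[:i] + [tokens[i] + tokens[i + 1]] + tokens[i + 2:]


def process(pairs, role, auto_align, corrections, allowed_roles=("aligner",)):
    """Check the role, align reviewed sentences, apply manual fixes, log errors.

    corrections maps a sentence index to the manually corrected link set.
    Returns (saved_pairs, indices_of_sentences_with_errors).
    """
    if role not in allowed_roles:
        raise PermissionError(f"role {role!r} may not edit alignments")
    saved, error_log = [], []
    for n, pair in enumerate(pairs):
        if not pair.reviewed:
            continue                       # only reviewed sentences are processed
        pair.links = auto_align(pair)      # automatic alignment
        if n in corrections:
            pair.links = corrections[n]    # manual alignment overrides it
            error_log.append(n)            # record the sentence that had errors
        saved.append(pair)
    return saved, error_log
```

In this sketch, splitting and merging would be applied to the token lists before alignment, since changing the tokens afterwards would invalidate the stored index pairs.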
Claims (1)
1. A Uygur word alignment method, characterized in that: (1) Uygur words are aligned automatically, and the alignment relation between Uygur words and Chinese words is divided into five types, namely one-to-one, one-to-many, many-to-one, many-to-many, and one-to-null; (2) words that the automatic alignment gets wrong are aligned manually, which improves the accuracy with which the system processes Uygur; (3) Uygur words are split and merged according to the characteristics of Uygur.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201210579979.3A CN103902528A (en) | 2012-12-28 | 2012-12-28 | Uygur language word alignment method |
Publications (1)
Publication Number | Publication Date |
---|---|
CN103902528A (en) | 2014-07-02 |
Family
ID=50993858
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201210579979.3A Pending CN103902528A (en) | 2012-12-28 | 2012-12-28 | Uygur language word alignment method |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN103902528A (en) |
Patent Citations (3)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US20110246177A1 (en) * | 2010-04-06 | 2011-10-06 | Samsung Electronics Co. Ltd. | Syntactic analysis and hierarchical phrase model based machine translation system and method |
CN102662932A (en) * | 2012-03-15 | 2012-09-12 | 中国科学院自动化研究所 | Method for establishing tree structure and tree-structure-based machine translation system |
CN102708098A (en) * | 2012-05-30 | 2012-10-03 | 中国科学院自动化研究所 | Dependency coherence constraint-based automatic alignment method for bilingual words |
Non-Patent Citations (3)
Title |
---|
张亚军 等: "汉语维吾尔语的一对一词对齐研究", 《昌吉学院院报》, no. 6, 15 December 2012 (2012-12-15), pages 80 - 83 * |
李英 等: "一种基于词典和长度相结合的汉维句子对齐算法", 《新乡学院院报自然科学版》, vol. 29, no. 1, 15 May 2012 (2012-05-15), pages 66 - 68 * |
麦热哈巴·艾力 等: "一种提高维吾尔语汉语词语对齐的方法研究", 《小型微型计算机系统》, vol. 33, no. 11, 15 November 2012 (2012-11-15), pages 2551 - 2555 * |
Cited By (5)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN112507734A (en) * | 2020-11-19 | 2021-03-16 | 南京大学 | Roman Uygur language-based neural machine translation system |
CN112507734B (en) * | 2020-11-19 | 2024-03-19 | 南京大学 | Neural machine translation system based on romanized Uygur language |
CN113158693A (en) * | 2021-03-13 | 2021-07-23 | 中国科学院新疆理化技术研究所 | Uygur language keyword generation method and device based on Chinese keywords, electronic equipment and storage medium |
CN113536747A (en) * | 2021-09-14 | 2021-10-22 | 潍坊北大青鸟华光照排有限公司 | Uyghur language last-syllable-splitting processing method on mobile equipment |
CN113536747B (en) * | 2021-09-14 | 2022-03-29 | 潍坊北大青鸟华光照排有限公司 | Uyghur language last-syllable-splitting processing method on mobile equipment |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
C06 | Publication | ||
PB01 | Publication | ||
C10 | Entry into substantive examination | ||
SE01 | Entry into force of request for substantive examination | ||
C02 | Deemed withdrawal of patent application after publication (patent law 2001) | ||
WD01 | Invention patent application deemed withdrawn after publication | Application publication date: 20140702 |