CN104298656A - Kazakh word continuous writing judgment and storage method based on embedded system - Google Patents

Publication number
CN104298656A
Authority
CN
China
Legal status
Pending
Application number
CN201310740856.8A
Other languages
Chinese (zh)
Inventor
柴雨峰
李满树
杨志杰
汪振东
倪凯峰
塔拉甫·加盘
Current Assignee
XINJIANG INFORMATION INDUSTRY Co Ltd
Original Assignee
XINJIANG INFORMATION INDUSTRY Co Ltd
Application filed by XINJIANG INFORMATION INDUSTRY Co Ltd
Priority to CN201310740856.8A
Publication of CN104298656A

Landscapes

  • Document Processing Apparatus (AREA)

Abstract

The invention discloses a Kazakh continuous-writing (letter joining) judgment and storage method based on an embedded system. The method comprises: (1) according to the features of the Kazakh language and based on the Kazakh Unicode extended codes, the word-initial, word-medial, and word-final letters of Kazakh each form a character set; because a letter's glyph changes with its position at the beginning, middle, or end of a word, each character is deformed according to whether it is joined to the preceding character (first), joined to the following character (last), joined on both sides (middle), or stands alone; (2) a font library extraction method is designed: using the standard 8×16 font library as a reference, the Kazakh characters actually displayed on the intelligent terminal interface are extracted and used to generate a new, simplified Kazakh font library.

Description

Kazakh continuous-writing judgment and storage method based on an embedded system
Technical Field
The present invention relates to language processing technology, and in particular to a Kazakh continuous-writing judgment and storage method based on an embedded system.
Background
In recent years, with the development of informatization and automation in ethnic-minority regions, smart devices based on embedded systems have come into wider use among the ethnic minorities of Xinjiang. However, educational levels differ greatly between regions and between ethnic groups, which makes it difficult for ethnic-minority users to make full use of intelligent terminals.
Summary of the Invention
The object of the present invention is to provide a Kazakh continuous-writing judgment and storage method based on an embedded system that addresses the slow input, slow storage, and large storage requirements of current Kazakh text handling, and that provides spelling and storage methods suited to the features of the Kazakh language.
This object is achieved as follows. In a Kazakh continuous-writing judgment and storage method based on an embedded system: (1) according to the features of the Kazakh language and based on the Kazakh Unicode extended codes, the word-initial, word-medial, and word-final letters of Kazakh each form a character set; because a letter's glyph differs at the beginning, middle, and end of a word, each character is judged to be joined to the preceding character, joined to the following character, joined on both sides (middle), or standing alone, and is deformed accordingly; (2) a font library extraction method is designed: using the standard 8×16 font library as a reference, the Kazakh characters actually displayed on the intelligent terminal interface are extracted and used to generate a new, simplified Kazakh font library.
The present invention can thereby address the slow input, slow storage, and large storage requirements of current Kazakh text handling, and provides spelling and storage methods suited to the features of the Kazakh language.
Brief Description of the Drawings
The invention is further described below with reference to the accompanying drawing.
Fig. 1 is a workflow diagram of the present invention.
Detailed Description
In a Kazakh continuous-writing judgment and storage method based on an embedded system, as shown in Fig. 1: (1) according to the features of the Kazakh language and based on the Kazakh Unicode extended codes, the word-initial, word-medial, and word-final letters of Kazakh each form a character set, and each character is judged to be joined to the preceding character, joined to the following character, joined on both sides (middle), or standing alone, and is deformed accordingly; (2) using the standard 8×16 font library as a reference, the Kazakh characters actually displayed on the intelligent terminal interface are extracted and used to generate a new, simplified Kazakh font library.
The present invention still follows the Kazakh transformation rules, which are as follows. The Kazakh script belongs to the Arabic family of scripts, which spread very widely under the influence of Islam. Persian, Urdu, and scripts used in Xinjiang, China, such as Kazakh and Kirgiz, all adopt the Arabic alphabet. Kazakh letters have no upper case or lower case, but they do distinguish printed and handwritten styles. Apart from five letters, the other 29 letters can be joined to the letter that follows, and a letter's glyph changes depending on whether it stands at the beginning, in the middle, or at the end of a word. The writing direction of Kazakh differs from that of Chinese: it is written horizontally from right to left, so Kazakh books and notebooks all open from the right.
When breaking a line, we generally need to determine where a whole word ends and wrap at the word boundary; a word must not be split into two parts. Numerals inside Kazakh text are still displayed from left to right. The forms of a Kazakh character can therefore be divided into first, last, middle, and alone. For each character we judge whether it is joined in front (the preceding character is in set 1), joined behind (the following character is in set 2), joined on both sides (middle: the preceding character is in set 1 and the following character is in set 2), or stands alone, and then apply the corresponding deformation.
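The whole-word line-break rule described above can be sketched in C as follows. This is a minimal illustration, not the patent's implementation: the function name `line_end`, the use of U+0020 as the only word separator, and the assumption that every code unit occupies one display cell are all hypothetical choices made for the example.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: returns the index one past the last code unit that
 * should be placed on the line starting at `start`.
 * Assumptions (not from the source): words are separated by U+0020, every
 * code unit takes one cell, max_chars > 0, and start <= len. */
size_t line_end(const uint16_t *text, size_t len, size_t start, size_t max_chars)
{
    if (len - start <= max_chars)
        return len;                     /* the rest of the text fits */

    size_t limit = start + max_chars;   /* first index that does not fit */
    size_t brk = limit;

    /* Walk back to the nearest space so that no word is split. */
    while (brk > start && text[brk] != 0x0020)
        brk--;

    if (brk == start)
        return limit;                   /* one word longer than the line: forced split */

    return brk;                         /* break at the space; the caller skips it */
}
```

A caller would repeatedly call `line_end`, draw the returned range, skip the separating space, and continue from the next index.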
Based on the above analysis, the deformation array is given below; its four columns correspond to the four cases above. For any character not in the array, the deformed form is the character itself. The specific judgment method is shown in the following example:
// Columns: first (joined in front), last (joined behind), middle (joined on both sides), alone
const WORD Arbic_Position[][4] =
{
{0xfe90, 0xfe91, 0xfe92, 0xfe8f}, // 0x628
{0xfe94, 0xfe93, 0xfe93, 0xfe93},
{0xfe96, 0xfe97, 0xfe98, 0xfe95}, // 0x62A
{0xfe9a, 0xfe9b, 0xfe9c, 0xfe99},
{0xfe9e, 0xfe9f, 0xfea0, 0xfe9d},
{0xfea2, 0xfea3, 0xfea4, 0xfea1},
{0xfea6, 0xfea7, 0xfea8, 0xfea5},
{0xfeaa, 0xfea9, 0xfeaa, 0xfea9},
/* …… */
};
To judge whether a character is joined in front, examine the character before it: if the previous character is in set 1 (theSet1), the current character is joined in front. Set 1 is as follows:
static U16 theSet1[23] = {
0x62c, 0x62d, 0x62e, 0x647, 0x639, 0x63a, 0x641, 0x642,
0x62b, 0x635, 0x636, 0x637, 0x643, 0x645, 0x646, 0x62a,
0x644, 0x628, 0x64a, 0x633, 0x634, 0x638, 0x626};
To judge whether a character is joined behind, examine the character after it: if the following character is in set 2 (theSet2), the current character is joined behind. Set 2 is as follows:
static U16 theSet2[35] = {
0x62c, 0x62d, 0x62e, 0x647, 0x639, 0x63a, 0x641, 0x642,
0x62b, 0x635, 0x636, 0x637, 0x643, 0x645, 0x646, 0x62a,
0x644, 0x628, 0x64a, 0x633, 0x634, 0x638, 0x626,
0x627, 0x623, 0x625, 0x622, 0x62f, 0x630, 0x631, 0x632,
0x648, 0x624, 0x629, 0x649};
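The two membership tests and the table lookup can be combined roughly as in the sketch below. It reproduces only the eight table rows printed in the source and assumes, based on the `// 0x628` and `// 0x62A` comments, that row i corresponds to code 0x628 + i. The names `position_table`, `TABLE_BASE`, `in_set`, `select_form`, and `shape_char`, and the convention that 0 means "no neighbouring character", are hypothetical and introduced only for this illustration.

```c
#include <stddef.h>
#include <stdint.h>

/* Column order as in the source comment: first, last, middle, alone. */
enum { FORM_FIRST = 0, FORM_LAST = 1, FORM_MIDDLE = 2, FORM_ALONE = 3 };

/* Assumption: row i holds the forms of code TABLE_BASE + i. */
#define TABLE_BASE 0x628u
static const uint16_t position_table[][4] = {
    {0xfe90, 0xfe91, 0xfe92, 0xfe8f}, {0xfe94, 0xfe93, 0xfe93, 0xfe93},
    {0xfe96, 0xfe97, 0xfe98, 0xfe95}, {0xfe9a, 0xfe9b, 0xfe9c, 0xfe99},
    {0xfe9e, 0xfe9f, 0xfea0, 0xfe9d}, {0xfea2, 0xfea3, 0xfea4, 0xfea1},
    {0xfea6, 0xfea7, 0xfea8, 0xfea5}, {0xfeaa, 0xfea9, 0xfeaa, 0xfea9},
};

static const uint16_t set1[23] = {
    0x62c, 0x62d, 0x62e, 0x647, 0x639, 0x63a, 0x641, 0x642,
    0x62b, 0x635, 0x636, 0x637, 0x643, 0x645, 0x646, 0x62a,
    0x644, 0x628, 0x64a, 0x633, 0x634, 0x638, 0x626};

static const uint16_t set2[35] = {
    0x62c, 0x62d, 0x62e, 0x647, 0x639, 0x63a, 0x641, 0x642,
    0x62b, 0x635, 0x636, 0x637, 0x643, 0x645, 0x646, 0x62a,
    0x644, 0x628, 0x64a, 0x633, 0x634, 0x638, 0x626,
    0x627, 0x623, 0x625, 0x622, 0x62f, 0x630, 0x631, 0x632,
    0x648, 0x624, 0x629, 0x649};

/* Linear membership test; the sets are small enough for this. */
static int in_set(const uint16_t *set, size_t n, uint16_t ch)
{
    for (size_t i = 0; i < n; i++)
        if (set[i] == ch)
            return 1;
    return 0;
}

/* prev/next are the neighbouring codes, or 0 when there is no neighbour. */
int select_form(uint16_t prev, uint16_t next)
{
    int joined_front = in_set(set1, 23, prev);   /* previous char in set 1 */
    int joined_behind = in_set(set2, 35, next);  /* following char in set 2 */

    if (joined_front && joined_behind) return FORM_MIDDLE;
    if (joined_front) return FORM_FIRST;
    if (joined_behind) return FORM_LAST;
    return FORM_ALONE;
}

/* Characters outside the table are returned unchanged, as the text states. */
uint16_t shape_char(uint16_t prev, uint16_t ch, uint16_t next)
{
    size_t rows = sizeof position_table / sizeof position_table[0];
    if (ch < TABLE_BASE || (size_t)(ch - TABLE_BASE) >= rows)
        return ch;
    return position_table[ch - TABLE_BASE][select_form(prev, next)];
}
```

For instance, under these assumptions `shape_char(0, 0x628, 0x62A)` selects the "last" column of the first row, because nothing precedes 0x628 and the following 0x62A is in set 2.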
Ligatures: a ligature is a sequence that begins with 0x644 and is followed by 0x622, 0x623, 0x625, or 0x627. Element 0 or element 1 of the matching row in the array below is taken depending on context: if the character before 0x644 is in set 1 (the same set 1 as above), element 1 is taken; otherwise element 0 is taken.
The array is as follows:
static U16 arabic_specs[][2] =
{
{0xFEF5, 0xFEF6}, // 0x644 + 0x622
{0xFEF7, 0xFEF8}, // 0x644 + 0x623
{0xFEF9, 0xFEFA}, // 0x644 + 0x625
{0xFEFB, 0xFEFC}, // 0x644 + 0x627
};
For example, take the sequence 0x064A, 0x0644, 0x0622, ...
By rule 1, the character after 0x064A, namely 0x0644, is in set 2, so 0x064A is joined behind (last) and is converted to 0xFEF3. Because 0x064A is in set 1, the two codes 0x0644 0x0622 are replaced with 0xFEF6.
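The ligature rule and the worked example can be checked with a short C sketch. The function name `lam_ligature`, the helper `in_set1`, the array `lam_followers`, and the convention of returning 0 for "no ligature" are hypothetical; the tables are copied from the text, and the row order is assumed to follow the order in which the text lists the trailing characters.

```c
#include <stddef.h>
#include <stdint.h>

static const uint16_t set1[23] = {
    0x62c, 0x62d, 0x62e, 0x647, 0x639, 0x63a, 0x641, 0x642,
    0x62b, 0x635, 0x636, 0x637, 0x643, 0x645, 0x646, 0x62a,
    0x644, 0x628, 0x64a, 0x633, 0x634, 0x638, 0x626};

/* Second characters of a ligature, in the order listed in the text. */
static const uint16_t lam_followers[4] = {0x622, 0x623, 0x625, 0x627};

/* Element 0: not joined in front; element 1: joined in front. */
static const uint16_t arabic_specs[][2] = {
    {0xFEF5, 0xFEF6},
    {0xFEF7, 0xFEF8},
    {0xFEF9, 0xFEFA},
    {0xFEFB, 0xFEFC},
};

static int in_set1(uint16_t ch)
{
    for (size_t i = 0; i < sizeof set1 / sizeof set1[0]; i++)
        if (set1[i] == ch)
            return 1;
    return 0;
}

/* Returns the single ligature code that replaces the pair (ch, next),
 * or 0 if the pair is not a ligature. prev is 0 when nothing precedes ch. */
uint16_t lam_ligature(uint16_t prev, uint16_t ch, uint16_t next)
{
    if (ch != 0x644)
        return 0;
    for (size_t i = 0; i < 4; i++)
        if (lam_followers[i] == next)
            return arabic_specs[i][in_set1(prev) ? 1 : 0];
    return 0;
}
```

With the example sequence, `lam_ligature(0x064A, 0x0644, 0x0622)` yields 0xFEF6, matching the substitution described above.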
Storage processing method: to save storage space, a font library extraction program is designed. Using the standard 8×16 font library as a reference, it extracts the Kazakh characters actually displayed on the intelligent terminal interface and generates a new, simplified Kazakh font library. For example, to display the Kazakh word for "voltage", whose extended codes are "FE97, FEEE, FED9, FE91, FBE7, FEB4, FEE4, FEF0", the dot-matrix glyph of each displayed character is looked up in the font library generated from the Kazakh extended codes. The address of a dot-matrix glyph is computed as Uaddr = zkfile + uiger_length × 2 + x × 16, where zkfile is the start address of the Kazakh font file in memory, uiger_length is the total number of Kazakh characters in the font library, and x is the position of the character in the font library.
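The address formula can be illustrated with the C sketch below. The source gives only the formula; the file layout assumed here (a table of `uiger_length` two-byte extended codes, stored little-endian, followed by 16-byte glyphs with one byte per 8-pixel row) is an interpretation of the `uiger_length × 2` and `x × 16` terms, and the names `glyph_offset`, `find_glyph`, and `GLYPH_BYTES` are hypothetical.

```c
#include <stddef.h>
#include <stdint.h>

/* An 8x16 dot-matrix glyph: 16 rows, one byte (8 pixels) per row. */
#define GLYPH_BYTES 16u

/* Offset part of Uaddr = zkfile + uiger_length * 2 + x * 16. */
size_t glyph_offset(size_t uiger_length, size_t x)
{
    return uiger_length * 2 + x * GLYPH_BYTES;
}

/* Assumed layout (not stated in the source): the file begins with
 * uiger_length little-endian 16-bit codes, and the glyph of the code at
 * position x starts at glyph_offset(uiger_length, x).
 * Returns a pointer to the 16 glyph bytes, or NULL if the code is absent. */
const uint8_t *find_glyph(const uint8_t *zkfile, size_t uiger_length, uint16_t code)
{
    for (size_t x = 0; x < uiger_length; x++) {
        uint16_t entry = (uint16_t)(zkfile[2 * x] | (zkfile[2 * x + 1] << 8));
        if (entry == code)
            return zkfile + glyph_offset(uiger_length, x);
    }
    return NULL;
}
```

Under this layout a library holding only the eight characters of the "voltage" example would occupy 8 × 2 + 8 × 16 = 144 bytes, which shows where the storage saving over a full font comes from.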

Claims (1)

1. A Kazakh continuous-writing judgment and storage method based on an embedded system, characterized in that: (1) according to the features of the Kazakh language and based on the Kazakh Unicode extended codes, the word-initial, word-medial, and word-final letters of Kazakh each form a character set, and each character is judged to be joined to the preceding character, joined to the following character, joined on both sides (middle), or standing alone, and is deformed accordingly; (2) using the standard 8×16 font library as a reference, the Kazakh characters actually displayed on the intelligent terminal interface are extracted and used to generate a new, simplified Kazakh font library.
CN201310740856.8A 2013-12-29 2013-12-29 Kazakh word continuous writing judgment and storage method based on embedded system Pending CN104298656A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201310740856.8A CN104298656A (en) 2013-12-29 2013-12-29 Kazakh word continuous writing judgment and storage method based on embedded system

Publications (1)

Publication Number Publication Date
CN104298656A true CN104298656A (en) 2015-01-21

Family

ID=52318385

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201310740856.8A Pending CN104298656A (en) 2013-12-29 2013-12-29 Kazakh word continuous writing judgment and storage method based on embedded system

Country Status (1)

Country Link
CN (1) CN104298656A (en)

Citations (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN102446275A (en) * 2010-09-30 2012-05-09 汉王科技股份有限公司 Identification method and device for Arabic character
CN103870515A (en) * 2012-12-18 2014-06-18 北大方正集团有限公司 Mongolian word stock constructing method, Mongolian displaying method and Mongolian displaying device

Non-Patent Citations (3)

* Cited by examiner, † Cited by third party
Title
冯丽娟等: "信息处理用哈萨克斯坦哈文字符及输入法研究", 《电脑知识与技术》 *
袁保社等: "OpenType字形技术研究与哈萨克文字库设计", 《计算工程与设计》 *
阿布力米提·阿不都热依木: "维吾尔文信息处理平台OpenType字体制作技术", 《计算机工程与设计》 *

Legal Events

Date Code Title Description
C06 Publication
PB01 Publication
C10 Entry into substantive examination
SE01 Entry into force of request for substantive examination
RJ01 Rejection of invention patent application after publication

Application publication date: 20150121