CN104298656A

CN104298656A - Kazakh word continuous writing judgment and storage method based on embedded system

Info

Publication number: CN104298656A
Application number: CN201310740856.8A
Authority: CN
Inventors: 柴雨峰; 李满树; 杨志杰; 汪振东; 倪凯峰; 塔拉甫·加盘
Original assignee: XINJIANG INFORMATION INDUSTRY Co Ltd
Current assignee: XINJIANG INFORMATION INDUSTRY Co Ltd
Priority date: 2013-12-29
Filing date: 2013-12-29
Publication date: 2015-01-21

Abstract

The invention discloses a method for judging and storing Kazakh joined (continuous) writing on an embedded system. (1) Based on the features of the Kazakh language and on the Kazakh Unicode extended codes, the characters that can appear at the beginning, in the middle and at the end of a word each form a character set. Because the glyph of a character changes with its position in the word, the method judges whether the character is joined to the preceding character, joined to the following character, joined on both sides (middle), or standing alone, and substitutes the corresponding form. (2) A font-library extraction method is designed: with a standard 8×16 font library as the reference, only the Kazakh characters actually displayed on the intelligent terminal interface are extracted to generate a new, reduced Kazakh font library.

Description

Method for judging and storing Kazakh joined writing based on an embedded system

Technical field

The present invention relates to language and script processing technology, and in particular to a method for judging and storing Kazakh joined writing based on an embedded system.

Background technology

In recent years, with the development of informatization and automation in ethnic minority regions, smart devices based on embedded systems have come into increasingly wide use among the ethnic groups of Xinjiang. However, education levels differ greatly between regions and between ethnic groups, which makes it difficult for ethnic minority users to make full use of intelligent terminals.

Summary of the invention

The object of the present invention is to provide a method for judging and storing Kazakh joined writing based on an embedded system. It addresses the low input efficiency, slow storage speed and large storage requirement of current Kazakh text handling, and provides spelling and storage methods suited to the features of the Kazakh language.

The object of the present invention is achieved as follows. A method for judging and storing Kazakh joined writing based on an embedded system comprises: (1) according to the features of the Kazakh language and based on the Kazakh Unicode extended codes, the characters at the beginning, in the middle and at the end of a Kazakh word each form a character set; because the glyph of a character changes with its position at the beginning, in the middle or at the end of a word, the method judges whether the character is joined to the preceding character, joined to the following character, in the middle (joined on both sides), or standing alone, and deforms it accordingly; (2) a font-library extraction method is designed: with a standard 8×16 font library as the reference, the Kazakh characters actually used on the intelligent terminal interface are extracted to generate a new, reduced Kazakh font library.

The present invention thereby improves input efficiency and storage speed, reduces the storage space required, and provides spelling and storage methods suited to the features of the Kazakh language.

Accompanying drawing explanation

The invention is further described below with reference to the accompanying drawing.

Fig. 1 is a workflow diagram of the present invention.

Embodiment

A method for judging and storing Kazakh joined writing based on an embedded system, as shown in Fig. 1, comprises: (1) according to the features of the Kazakh language and based on the Kazakh Unicode extended codes, the characters at the beginning, in the middle and at the end of a Kazakh word each form a character set; the method judges whether a character is joined to the preceding character, joined to the following character, in the middle (joined on both sides), or standing alone, and deforms it accordingly; (2) with a standard 8×16 font library as the reference, the Kazakh characters actually used on the intelligent terminal interface are extracted to generate a new, reduced Kazakh font library.

The present invention still follows the Kazakh transformation rules, which are as follows. The Kazakh script belongs to the Arabic family of scripts. Arabic writing spread very widely under the influence of Islam; Persian, Urdu, and the Kazakh, Kirgiz and other scripts of Xinjiang, China all use the Arabic alphabet. Kazakh letters have no upper and lower case, but they do have distinct print and handwritten forms. Apart from five letters, the other 29 letters can be joined to the letter that follows, and the glyph of a letter changes depending on whether it stands at the beginning, in the middle or at the end of a word. The writing direction of Kazakh also differs from that of Chinese: it is written horizontally from right to left, so Kazakh books and periodicals all use a right-to-left layout.

When breaking a line, it is generally necessary to judge whether a whole word fits, so that the line is broken at a word boundary and no word is split into two parts. Numerals inside Kazakh text are still displayed from left to right. Accordingly, the forms of a Kazakh character can be divided into first, last, middle and alone. The method judges whether the character is joined to the preceding character (the preceding character is in set 1), joined to the following character (the following character is in set 2), in the middle (joined on both sides: the preceding character is in set 1 and the following character is in set 2), or standing alone, and applies the corresponding deformation.

Based on the above analysis, the deformation array is given below; its columns correspond to the four cases above. A character that is not in the array is deformed to itself. The specific judgment method is shown in the following example:

/* Columns: first (joined to the preceding character), last (joined to the
   following character), middle (joined on both sides), alone. */
const WORD Arbic_Position[][4] =
{
    {0xfe90, 0xfe91, 0xfe92, 0xfe8f}, // 0x628
    {0xfe94, 0xfe93, 0xfe93, 0xfe93},
    {0xfe96, 0xfe97, 0xfe98, 0xfe95}, // 0x62A
    {0xfe9a, 0xfe9b, 0xfe9c, 0xfe99},
    {0xfe9e, 0xfe9f, 0xfea0, 0xfe9d},
    {0xfea2, 0xfea3, 0xfea4, 0xfea1},
    {0xfea6, 0xfea7, 0xfea8, 0xfea5},
    {0xfeaa, 0xfea9, 0xfeaa, 0xfea9},
    ……
};

To judge whether a character is joined to the preceding character, the preceding character is examined: if it belongs to set 1, the current character is joined to it. Set 1 is as follows:

static U16 theSet1[23] = {
    0x62c, 0x62d, 0x62e, 0x647, 0x639, 0x63a, 0x641, 0x642,
    0x62b, 0x635, 0x636, 0x637, 0x643, 0x645, 0x646, 0x62a,
    0x644, 0x628, 0x64a, 0x633, 0x634, 0x638, 0x626};

To judge whether a character is joined to the following character, the following character is examined: if it belongs to set 2, the current character is joined to it. Set 2 is as follows:

static U16 theSet2[35] = {
    0x62c, 0x62d, 0x62e, 0x647, 0x639, 0x63a, 0x641, 0x642,
    0x62b, 0x635, 0x636, 0x637, 0x643, 0x645, 0x646, 0x62a,
    0x644, 0x628, 0x64a, 0x633, 0x634, 0x638, 0x626,
    0x627, 0x623, 0x625, 0x622, 0x62f, 0x630, 0x631, 0x632,
    0x648, 0x624, 0x629, 0x649};
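The two membership tests can be combined into a single form selection. The following is a minimal sketch, not the patent's actual implementation. It assumes that a neighbour value of 0 marks a text boundary, that the excerpted deformation rows start at code 0x628 (as the row comments suggest), and the names `select_form`, `shape`, `forms`, `set1` and `set2` are introduced here for illustration only.

```c
#include <stddef.h>
#include <stdint.h>

typedef uint16_t U16;

/* Column order taken from the source comment: first, last, middle, alone. */
enum { FORM_FIRST, FORM_LAST, FORM_MIDDLE, FORM_ALONE };

/* Only the rows shown in the excerpt (0x628..0x62F); the rest is elided
   in the source and therefore not reproduced here. */
#define TABLE_BASE 0x0628u /* assumption: code of the first excerpted row */
static const U16 forms[][4] = {
    {0xfe90, 0xfe91, 0xfe92, 0xfe8f}, /* 0x628 */
    {0xfe94, 0xfe93, 0xfe93, 0xfe93}, /* 0x629 */
    {0xfe96, 0xfe97, 0xfe98, 0xfe95}, /* 0x62A */
    {0xfe9a, 0xfe9b, 0xfe9c, 0xfe99}, /* 0x62B */
    {0xfe9e, 0xfe9f, 0xfea0, 0xfe9d}, /* 0x62C */
    {0xfea2, 0xfea3, 0xfea4, 0xfea1}, /* 0x62D */
    {0xfea6, 0xfea7, 0xfea8, 0xfea5}, /* 0x62E */
    {0xfeaa, 0xfea9, 0xfeaa, 0xfea9}, /* 0x62F */
};
#define TABLE_ROWS (sizeof forms / sizeof forms[0])

static const U16 set1[23] = {
    0x62c, 0x62d, 0x62e, 0x647, 0x639, 0x63a, 0x641, 0x642,
    0x62b, 0x635, 0x636, 0x637, 0x643, 0x645, 0x646, 0x62a,
    0x644, 0x628, 0x64a, 0x633, 0x634, 0x638, 0x626};

static const U16 set2[35] = {
    0x62c, 0x62d, 0x62e, 0x647, 0x639, 0x63a, 0x641, 0x642,
    0x62b, 0x635, 0x636, 0x637, 0x643, 0x645, 0x646, 0x62a,
    0x644, 0x628, 0x64a, 0x633, 0x634, 0x638, 0x626,
    0x627, 0x623, 0x625, 0x622, 0x62f, 0x630, 0x631, 0x632,
    0x648, 0x624, 0x629, 0x649};

/* Linear search; the sets are small enough for an embedded target. */
static int in_set(const U16 *set, size_t n, U16 c)
{
    for (size_t i = 0; i < n; i++)
        if (set[i] == c)
            return 1;
    return 0;
}

/* Chooses the column from the neighbours of the current character.
   prev/next are 0 at a text boundary (assumption). */
static int select_form(U16 prev, U16 next)
{
    int joined_before = in_set(set1, 23, prev);
    int joined_after = in_set(set2, 35, next);
    if (joined_before && joined_after)
        return FORM_MIDDLE;
    if (joined_before)
        return FORM_FIRST;
    if (joined_after)
        return FORM_LAST;
    return FORM_ALONE;
}

/* Returns the deformed code; characters outside the table map to themselves. */
static U16 shape(U16 prev, U16 c, U16 next)
{
    if (c < TABLE_BASE || c >= TABLE_BASE + TABLE_ROWS)
        return c;
    return forms[c - TABLE_BASE][select_form(prev, next)];
}
```

For instance, `shape(0, 0x0628, 0x062A)` selects the "last" column because 0x062A is in set 2 and nothing precedes the character. In Unicode terms the "first" column holds the final presentation form and the "last" column the initial presentation form, which is why the column names can look reversed at first glance.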

A ligature is a character sequence that begins with 0x644 and is followed by 0x622, 0x623, 0x625 or 0x627. Depending on the context, element 0 or element 1 of the corresponding row of the array below is taken: if the character preceding 0x644 is in set 1 (the same set 1 as above), element 1 is taken; otherwise element 0 is taken.

The array is as follows:

static U16 arabic_specs[][2] =
{
    {0xFEF5, 0xFEF6},
    {0xFEF7, 0xFEF8},
    {0xFEF9, 0xFEFA},
    {0xFEFB, 0xFEFC},
};

For example: 0x064A, 0x0644, 0x0622 ...

By the rules above, the character following 0x064A is 0x0644, which is in set 2, so 0x064A is joined to the following character (last) and is converted to 0xFEF3. Because 0x064A, the character preceding 0x0644, is in set 1, the two codes 0x0644 0x0622 are replaced by the single code 0xFEF6.
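The ligature rule can be sketched as a small lookup. This is an illustrative sketch under stated assumptions rather than the patent's code: the row order (0x622, 0x623, 0x625, 0x627) is inferred from the order in which the text lists the second characters, and the helper names `in_set1` and `lam_alef` are hypothetical.

```c
#include <stddef.h>
#include <stdint.h>

typedef uint16_t U16;

/* Set 1 as listed in the text: characters that join to what follows them. */
static const U16 set1[23] = {
    0x62c, 0x62d, 0x62e, 0x647, 0x639, 0x63a, 0x641, 0x642,
    0x62b, 0x635, 0x636, 0x637, 0x643, 0x645, 0x646, 0x62a,
    0x644, 0x628, 0x64a, 0x633, 0x634, 0x638, 0x626};

/* Rows (assumed order): second character 0x622, 0x623, 0x625, 0x627.
   Column 0: not joined to the preceding character; column 1: joined. */
static const U16 arabic_specs[][2] = {
    {0xFEF5, 0xFEF6},
    {0xFEF7, 0xFEF8},
    {0xFEF9, 0xFEFA},
    {0xFEFB, 0xFEFC},
};

static int in_set1(U16 c)
{
    for (size_t i = 0; i < sizeof set1 / sizeof set1[0]; i++)
        if (set1[i] == c)
            return 1;
    return 0;
}

/* Returns the single code that replaces the pair (0x644, second), or 0
   when `second` does not form a ligature with 0x644.
   prev is the character before 0x644, 0 at a text boundary (assumption). */
static U16 lam_alef(U16 prev, U16 second)
{
    int row;
    switch (second) {
    case 0x0622: row = 0; break;
    case 0x0623: row = 1; break;
    case 0x0625: row = 2; break;
    case 0x0627: row = 3; break;
    default: return 0;
    }
    return arabic_specs[row][in_set1(prev) ? 1 : 0];
}
```

With the sequence from the example, `lam_alef(0x064A, 0x0622)` takes row 0, column 1, which is the 0xFEF6 given in the text.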

Storage processing method: to save storage space, a font-library extraction program is designed. With a standard 8×16 font library as the reference, the Kazakh characters actually used on the intelligent terminal interface are extracted to generate a new, reduced Kazakh font library. For example, to display the Kazakh word for "voltage", the corresponding extended codes are FE97, FEEE, FED9, FE91, FBE7, FEB4, FEE4, FEF0. The bitmap glyph of each displayed character is then looked up in the font library generated from the Kazakh extended codes. The address of a bitmap glyph is computed as Uaddr = zkfile + uiger_length × 2 + x × 16, where zkfile is the start address of the Kazakh font file in memory, uiger_length is the total number of Kazakh characters in the font library, and x is the position of the character in the font library.
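The address formula can be written out directly. The sketch below assumes one plausible reading of the two terms, which the source does not state explicitly: the file begins with a table of uiger_length two-byte codes, followed by the glyphs, each 8×16 glyph occupying 16 bytes (one byte per row). The function names are hypothetical.

```c
#include <stddef.h>
#include <stdint.h>

#define INDEX_ENTRY_BYTES 2 /* the "x 2" term: assumed 2-byte code per entry */
#define GLYPH_BYTES 16      /* the "x 16" term: 8x16 bitmap, one byte per row */

/* Byte offset of glyph x from the start of the font file:
   uiger_length * 2 + x * 16, as in the formula above. */
static size_t glyph_offset(size_t uiger_length, size_t x)
{
    return uiger_length * INDEX_ENTRY_BYTES + x * GLYPH_BYTES;
}

/* Uaddr = zkfile + uiger_length * 2 + x * 16 */
static const uint8_t *glyph_addr(const uint8_t *zkfile,
                                 size_t uiger_length, size_t x)
{
    return zkfile + glyph_offset(uiger_length, x);
}
```

For the eight-character "voltage" example, a library holding only those eight glyphs would place glyph 3 at offset 8 × 2 + 3 × 16 = 64 bytes from zkfile.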

Claims

1. A method for judging and storing Kazakh joined writing based on an embedded system, characterized in that: (1) according to the features of the Kazakh language and based on the Kazakh Unicode extended codes, the characters at the beginning, in the middle and at the end of a Kazakh word each form a character set; the method judges whether a character is joined to the preceding character, joined to the following character, in the middle (joined on both sides), or standing alone, and deforms it accordingly; (2) with a standard 8×16 font library as the reference, the Kazakh characters actually used on the intelligent terminal interface are extracted to generate a new, reduced Kazakh font library.