CN104298656A - Kazakh word continuous writing judgment and storage method based on embedded system - Google Patents
- Publication number
- CN104298656A (application number CN201310740856.8A)
- Authority
- CN
- China
- Prior art keywords
- word
- kazakh
- embedded system
- character
- method based
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Landscapes
- Document Processing Apparatus (AREA)
Abstract
The invention discloses a method, based on an embedded system, for judging and storing the connected (joined) writing of Kazakh text. The method has two parts. (1) Following the features of the Kazakh language and using the Kazakh Unicode extended codes as a basis, the word-initial, word-medial and word-final letters each form a character set. Because a letter's glyph changes with its position at the beginning, middle or end of a word, each letter is judged to be connected to the preceding letter (first), connected to the following letter (last), connected on both sides (middle) or standalone (alone), and is transformed accordingly. (2) A font library extraction method is designed: taking a standard 8×16 font library as the reference, the Kazakh characters actually displayed on the intelligent terminal interface are extracted to generate a new, simplified Kazakh font library.
Description
Technical field
The present invention relates to language processing technology, and in particular to a method for judging and storing Kazakh connected writing on an embedded system.
Background technology
In recent years, with the development of informatization and automation among ethnic minorities, smart devices based on embedded systems have become more widely used among the ethnic minorities of Xinjiang. However, education levels differ greatly between regions and between ethnic groups, which makes it difficult for ethnic minority users to make full use of intelligent terminals.
Summary of the invention
The object of the present invention is to provide a Kazakh connected-writing judgment and storage method based on an embedded system. The method addresses the low input efficiency, slow storage speed and large storage requirement of current Kazakh text handling, and provides a spelling and storage method suited to the features of the Kazakh language.
The object of the present invention is achieved as follows. (1) Following the features of the Kazakh language and using the Kazakh Unicode extended codes as a basis, the word-initial, word-medial and word-final letters each form a character set. Because a letter's glyph differs at the beginning, middle and end of a word, each letter is judged to be connected in front, connected behind, connected on both sides (middle) or standalone, and is transformed accordingly. (2) A font library extraction method is designed: taking a standard 8×16 font library as the reference, the Kazakh characters actually displayed on the intelligent terminal interface are extracted to generate a new, simplified Kazakh font library.
The present invention thereby reduces the input time, storage time and storage space required for Kazakh text, using a spelling and storage method suited to the features of the Kazakh language.
Description of the accompanying drawing
The invention is further described below with reference to the accompanying drawing.
Fig. 1 is a workflow diagram of the present invention.
Embodiment
A Kazakh connected-writing judgment and storage method based on an embedded system, as shown in Fig. 1: (1) following the features of the Kazakh language and using the Kazakh Unicode extended codes as a basis, the word-initial, word-medial and word-final letters each form a character set, and each letter is judged to be connected in front, connected behind, connected on both sides (middle) or standalone, and is transformed accordingly; (2) taking a standard 8×16 font library as the reference, the Kazakh characters actually displayed on the intelligent terminal interface are extracted to generate a new, simplified Kazakh font library.
The present invention still follows the Kazakh transformation rules, which are as follows. The Kazakh script belongs to the Arabic family of scripts; Arabic writing spread very widely under the influence of Islam. Persian, Urdu, and languages of Xinjiang, China, such as Kazakh and Kyrgyz, all use the Arabic alphabet. Kazakh letters have no upper-case and lower-case distinction, but they do have distinct printed and handwritten forms. Apart from five letters (not reproduced in this text), the other 29 letters can connect to the letter that follows, and a letter's glyph changes according to whether it stands at the beginning, in the middle or at the end of a word. The writing direction of Kazakh differs from that of Chinese: it is written horizontally from right to left, so Kazakh books and periodicals open from the right.
At a line feed, it is generally necessary to judge whether a whole word fits, so that the line is broken at a whole word and a word is never split into two parts. Numerals inside Kazakh text are still displayed from left to right. The connection state of a Kazakh letter can therefore be divided into four cases, first, last, middle and alone: connected in front (the preceding character is in set 1), connected behind (the following character is in set 2), middle (connected on both sides: the preceding character is in set 1 and the following character is in set 2), or a standalone letter. The letter is transformed according to the case that applies.
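The whole-word line feed rule described above can be illustrated with a minimal sketch. It assumes that words are separated by the space character 0x0020 and that every character occupies one fixed-width cell; the function name `line_break` and both assumptions are hypothetical and do not come from the patent text.

```c
#include <stddef.h>
#include <stdint.h>

/*
 * Returns how many characters of text[0..len) should be placed on the
 * current line when at most max_cols characters fit.
 * Assumption: words are separated by 0x0020 and each character is one cell.
 */
static size_t line_break(const uint16_t *text, size_t len, size_t max_cols)
{
    size_t i;

    if (len <= max_cols)
        return len;                 /* everything fits on this line */

    /* Walk back from the cut position to the nearest space. */
    i = max_cols;
    while (i > 0 && text[i] != 0x0020)
        i--;

    if (i == 0)
        return max_cols;            /* one word longer than the line: forced split */

    return i;                       /* break just before the space */
}
```

The caller would skip the space at the returned index before starting the next line. Only a word that is longer than a full line is ever split.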
Based on the above analysis, an array of transformed forms is given, whose four columns correspond to the four cases above. A character that is not in the array is displayed unchanged. The concrete judgment method is shown in the following example:
/* Columns: first (connected in front), last (connected behind),
   middle (connected on both sides), alone (standalone) */
const WORD Arbic_Position[][4] = // first, last, middle, alone
{
    {0xfe90, 0xfe91, 0xfe92, 0xfe8f}, // 0x628
    {0xfe94, 0xfe93, 0xfe93, 0xfe93},
    {0xfe96, 0xfe97, 0xfe98, 0xfe95}, // 0x62A
    {0xfe9a, 0xfe9b, 0xfe9c, 0xfe99},
    {0xfe9e, 0xfe9f, 0xfea0, 0xfe9d},
    {0xfea2, 0xfea3, 0xfea4, 0xfea1},
    {0xfea6, 0xfea7, 0xfea8, 0xfea5},
    {0xfeaa, 0xfea9, 0xfeaa, 0xfea9},
    /* …… */
};
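A minimal sketch of how such a table might be indexed. It assumes that the rows are ordered by consecutive code point starting at 0x628 (suggested by the two row comments in the listing, but not stated outright) and it uses only the eight rows printed above. The names `sketch_rows`, `SKETCH_BASE`, `FORM_*` and `lookup_form`, and the 16-bit definition of `WORD`, are hypothetical.

```c
#include <stddef.h>
#include <stdint.h>

typedef uint16_t WORD;              /* assumption: WORD is a 16-bit unsigned type */

/* Column order taken from the comment in the listing. */
enum { FORM_FIRST, FORM_LAST, FORM_MIDDLE, FORM_ALONE };

#define SKETCH_BASE 0x628u          /* assumption: first row belongs to 0x628 */

/* Only the eight rows printed in the source; row labels other than
   0x628 and 0x62A are inferred from consecutive ordering. */
static const WORD sketch_rows[][4] = {
    {0xfe90, 0xfe91, 0xfe92, 0xfe8f}, /* 0x628 */
    {0xfe94, 0xfe93, 0xfe93, 0xfe93}, /* 0x629 */
    {0xfe96, 0xfe97, 0xfe98, 0xfe95}, /* 0x62A */
    {0xfe9a, 0xfe9b, 0xfe9c, 0xfe99}, /* 0x62B */
    {0xfe9e, 0xfe9f, 0xfea0, 0xfe9d}, /* 0x62C */
    {0xfea2, 0xfea3, 0xfea4, 0xfea1}, /* 0x62D */
    {0xfea6, 0xfea7, 0xfea8, 0xfea5}, /* 0x62E */
    {0xfeaa, 0xfea9, 0xfeaa, 0xfea9}, /* 0x62F */
};

/* Returns the transformed code, or the code itself when it has no row. */
static WORD lookup_form(WORD code, int form)
{
    size_t rows = sizeof sketch_rows / sizeof sketch_rows[0];

    if (code < SKETCH_BASE || (size_t)(code - SKETCH_BASE) >= rows)
        return code;                /* not in the array: displayed unchanged */
    return sketch_rows[code - SKETCH_BASE][form];
}
```

Returning the input code for characters outside the table mirrors the statement that a character not in the array is displayed as itself.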
To judge whether there is a connection in front, the character preceding the current character is examined: if that preceding character is in set 1 (theSet1), there is a connection in front. Set 1 is as follows:
static U16 theSet1[23] = {
    0x62c, 0x62d, 0x62e, 0x647, 0x639, 0x63a, 0x641, 0x642,
    0x62b, 0x635, 0x636, 0x637, 0x643, 0x645, 0x646, 0x62a,
    0x644, 0x628, 0x64a, 0x633, 0x634, 0x638, 0x626};
To judge whether there is a connection behind, the character following the current character is examined: if that following character is in set 2 (theSet2), there is a connection behind. Set 2 is as follows:
static U16 theSet2[35] = {
    0x62c, 0x62d, 0x62e, 0x647, 0x639, 0x63a, 0x641, 0x642,
    0x62b, 0x635, 0x636, 0x637, 0x643, 0x645, 0x646, 0x62a,
    0x644, 0x628, 0x64a, 0x633, 0x634, 0x638, 0x626,
    0x627, 0x623, 0x625, 0x622, 0x62f, 0x630, 0x631, 0x632,
    0x648, 0x624, 0x629, 0x649};
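The two membership tests can be combined into a single case decision, sketched below under stated assumptions. The sets are copied from the listings above; the helper names `in_set` and `classify`, the 16-bit definition of `U16`, and the convention of passing 0 for a missing neighbour are hypothetical. The return value is the column index in the order first, last, middle, alone.

```c
#include <stddef.h>
#include <stdint.h>

typedef uint16_t U16;               /* assumption: U16 is a 16-bit unsigned type */

static const U16 set1[23] = {
    0x62c, 0x62d, 0x62e, 0x647, 0x639, 0x63a, 0x641, 0x642,
    0x62b, 0x635, 0x636, 0x637, 0x643, 0x645, 0x646, 0x62a,
    0x644, 0x628, 0x64a, 0x633, 0x634, 0x638, 0x626};

static const U16 set2[35] = {
    0x62c, 0x62d, 0x62e, 0x647, 0x639, 0x63a, 0x641, 0x642,
    0x62b, 0x635, 0x636, 0x637, 0x643, 0x645, 0x646, 0x62a,
    0x644, 0x628, 0x64a, 0x633, 0x634, 0x638, 0x626,
    0x627, 0x623, 0x625, 0x622, 0x62f, 0x630, 0x631, 0x632,
    0x648, 0x624, 0x629, 0x649};

/* Linear search; the sets are small enough that this is adequate. */
static int in_set(const U16 *set, size_t n, U16 c)
{
    size_t i;
    for (i = 0; i < n; i++)
        if (set[i] == c)
            return 1;
    return 0;
}

/* prev/next are the neighbouring codes, or 0 when there is no neighbour.
   Returns 0 = first, 1 = last, 2 = middle, 3 = alone. */
static int classify(U16 prev, U16 next)
{
    int front = in_set(set1, 23, prev);   /* connected in front */
    int behind = in_set(set2, 35, next);  /* connected behind */

    if (front && behind)
        return 2;
    if (front)
        return 0;
    if (behind)
        return 1;
    return 3;
}
```

Note that this decision looks only at the neighbours, as the text describes. Whether the current letter itself can connect behind is expressed by its table row instead: for a letter that cannot, the printed row repeats the same code in the "last" and "alone" columns.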
A ligature is formed when a character sequence begins with 0x644 and the next character is 0x622, 0x623, 0x625 or 0x627. Column 0 or column 1 of the array below is taken according to the situation: if the character preceding 0x644 is in set 1 (the same set 1 as above), column 1 is taken; otherwise column 0 is taken.
The array is as follows:
static U16 arabic_specs[][2] =
{
    {0xFEF5, 0xFEF6},
    {0xFEF7, 0xFEF8},
    {0xFEF9, 0xFEFA},
    {0xFEFB, 0xFEFC},
};
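A minimal sketch of the ligature selection. It assumes that the four rows correspond, in order, to a following character of 0x622, 0x623, 0x625 and 0x627, which is the order in which the text lists them and is consistent with the worked example (0x0644 0x0622 preceded by a set 1 character gives 0xFEF6). The names `lam_alef` and `lam_alef_ligature` are hypothetical, and the caller is assumed to have already tested the preceding character against set 1.

```c
#include <stdint.h>

typedef uint16_t U16;               /* assumption: U16 is a 16-bit unsigned type */

static const U16 lam_alef[4][2] = {
    {0xFEF5, 0xFEF6},               /* 0x644 followed by 0x622 */
    {0xFEF7, 0xFEF8},               /* 0x644 followed by 0x623 */
    {0xFEF9, 0xFEFA},               /* 0x644 followed by 0x625 */
    {0xFEFB, 0xFEFC},               /* 0x644 followed by 0x627 */
};

/* Returns the single code that replaces the pair (cur, next),
   or 0 when the pair does not form a ligature. */
static U16 lam_alef_ligature(U16 cur, U16 next, int prev_in_set1)
{
    int row;

    if (cur != 0x644)
        return 0;

    switch (next) {
    case 0x622: row = 0; break;
    case 0x623: row = 1; break;
    case 0x625: row = 2; break;
    case 0x627: row = 3; break;
    default:    return 0;
    }
    /* Column 1 when the character before 0x644 is in set 1, else column 0. */
    return lam_alef[row][prev_in_set1 ? 1 : 0];
}
```

When the function returns a non-zero code, the two input codes are replaced by that single code, so the output sequence is one character shorter than the input.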
For example, take the sequence 0x064A, 0x0644, 0x0622 ...
According to coding rule 1, the character following 0x064A is 0x0644, which is in set 2, so 0x064A is connected behind (last) and is converted to 0xFEF3. Since 0x064A is in set 1, the two codes 0x0644 0x0622 are replaced by the single code 0xFEF6.
Storage processing method: to save storage space, a font library extraction program is designed. Taking a standard 8×16 font library as the reference, it extracts the Kazakh characters actually displayed on the intelligent terminal interface and generates a new, simplified Kazakh font library. For example, to display the Kazakh word for "voltage", whose extended codes are "FE97, FEEE, FED9, FE91, FBE7, FEB4, FEE4, FEF0", the dot-matrix glyph of each displayed character is looked up in the font library generated from the Kazakh extended codes. The address of a dot-matrix glyph is computed as Uaddr = zkfile + uiger_length × 2 + x × 16, where zkfile is the start address of the Kazakh font file in memory, uiger_length is the total number of Kazakh characters in the font library, and x is the position of the character in the font library.
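The address formula can be sketched as follows. The sketch reads the formula as a file that starts with a table of uiger_length two-byte codes followed by one 16-byte bitmap per character (an 8×16 glyph at one bit per pixel occupies 16 bytes). That layout is an interpretation consistent with the formula, not something the text states explicitly, and the function names `glyph_offset` and `glyph_address` are hypothetical.

```c
#include <stdint.h>

/* Byte offset of glyph x from the start of the font file:
   uiger_length * 2 bytes of code table, then 16 bytes per glyph. */
static uint32_t glyph_offset(uint32_t uiger_length, uint32_t x)
{
    return uiger_length * 2u + x * 16u;
}

/* Uaddr = zkfile + uiger_length * 2 + x * 16 */
static const uint8_t *glyph_address(const uint8_t *zkfile,
                                    uint32_t uiger_length, uint32_t x)
{
    return zkfile + glyph_offset(uiger_length, x);
}
```

For instance, with the eight characters of the "voltage" example stored in the library (uiger_length = 8), the glyph at position x = 3 would start 8 × 2 + 3 × 16 = 64 bytes after zkfile.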
Claims (1)
1. A Kazakh connected-writing judgment and storage method based on an embedded system, characterized in that: (1) following the features of the Kazakh language and using the Kazakh Unicode extended codes as a basis, the word-initial, word-medial and word-final letters each form a character set, and each letter is judged to be connected in front, connected behind, connected on both sides (middle) or standalone, and is transformed accordingly; (2) taking a standard 8×16 font library as the reference, the Kazakh characters actually displayed on the intelligent terminal interface are extracted to generate a new, simplified Kazakh font library.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201310740856.8A CN104298656A (en) | 2013-12-29 | 2013-12-29 | Kazakh word continuous writing judgment and storage method based on embedded system |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201310740856.8A CN104298656A (en) | 2013-12-29 | 2013-12-29 | Kazakh word continuous writing judgment and storage method based on embedded system |
Publications (1)
Publication Number | Publication Date |
---|---|
CN104298656A true CN104298656A (en) | 2015-01-21 |
Family
ID=52318385
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201310740856.8A Pending CN104298656A (en) | 2013-12-29 | 2013-12-29 | Kazakh word continuous writing judgment and storage method based on embedded system |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN104298656A (en) |
Citations (2)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN102446275A (en) * | 2010-09-30 | 2012-05-09 | 汉王科技股份有限公司 | Identification method and device for Arabic character |
CN103870515A (en) * | 2012-12-18 | 2014-06-18 | 北大方正集团有限公司 | Mongolian word stock constructing method, Mongolian displaying method and Mongolian displaying device |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
C06 | Publication | ||
PB01 | Publication | ||
C10 | Entry into substantive examination | ||
SE01 | Entry into force of request for substantive examination | ||
RJ01 | Rejection of invention patent application after publication |
Application publication date: 20150121 |