WO2008089654A1 - Apparatus and method for retrieving and sorting Chinese character types, and information systems - Google Patents

Apparatus and method for retrieving and sorting Chinese character types, and information systems

Info

Publication number
WO2008089654A1
Authority
WO
WIPO (PCT)
Prior art keywords
chinese character
character type
sorting
information system
chinese
Prior art date
Application number
PCT/CN2008/000109
Other languages
English (en)
Chinese (zh)
Inventor
Yingkit Lo
Original Assignee
Yingkit Lo
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Yingkit Lo
Publication of WO2008089654A1

Classifications

    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06F: ELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00: Handling natural language data
    • G06F40/40: Processing or translation of natural language
    • G06F40/53: Processing of non-Latin text

Definitions

  • The present invention relates to a Chinese character type sorting and retrieval method and apparatus for an information system, and to an information system, such as a database or a paginated font library (i.e., a font database), to which the method or apparatus is applied.
  • A Chinese character type is in itself a unique symbol of meaning. So far, Chinese character types have had neither a complete set of standards nor a dedicated method for sorting and arranging them, while the number of characters and character forms keeps changing.
  • The scope of application includes any information system, such as a database whose content consists of Chinese character types, for example a font database (referred to as a font library).
  • Two Chinese character sorting methods have mainly been used so far. The first sorts by radical classification and then searches among the characters with the same number of strokes under that radical.
  • The biggest disadvantage of this method is that it depends on both the radical classification and the stroke count: the sort position has to be found within the radical by counting strokes up and down, which consumes a lot of search time and makes the judgment required during sorting burdensome.
  • The second sorts by the pinyin of each character: a pinyin directory is added to the font library, and the required Chinese character then has to be picked out from a large number of homophones.
  • Both methods first look up the page number in the index of the font database and then search at that page position, and these steps consume a lot of time. In fact, both still place Chinese character types according to the radical classification, and both are merely sorting methods for the directory index. Arranging Chinese character types in this way does not improve efficiency. The character types need to be repositioned with strict logic so that they can be sorted more efficiently; the problem is that the existing layout positions are not conducive to effective logical queries.
  • Chinese characters are currently among the most widely used and studied scripts in the world, and every application and learning process relies on a database or a printed font library (such as a dictionary), whose role is to locate unfamiliar Chinese characters so that they can be looked up and understood.
  • Nevertheless, there is still no fast and accurate sorting and retrieval method for Chinese character types, because Chinese is the most complicated of all writing systems.
  • Most other scripts are phonetic scripts composed of a few dozen letters, each of which has a fixed sort position, and databases of phonetic text are usually arranged in that order.
  • The structural concept of a Chinese database is to systematically arrange a large number of Chinese characters together with the corresponding character type system.
  • The Chinese character system is built from different radicals and components.
  • The radical is the category that governs the Chinese character system, but the form of a radical itself changes to a greater or lesser degree when it is applied to different Chinese characters. This causes a great deal of confusion both in radical indexes and in the way character types are collected in databases. The radicals used by most databases do not even follow a standard classification or count.
  • the present invention is directed to a Chinese character sorting retrieval method and apparatus for an information system, particularly a database, and an information system using the Chinese character sorting search method or apparatus to implement retrieval, arrangement, input, and the like of Chinese content.
  • A Chinese character type sorting and retrieval method for an information system is provided, where the content of the information system includes Chinese character types.
  • The method comprises the following steps: mapping each Chinese character type to an alphanumeric code according to a predetermined encoding rule; and sorting the Chinese character types in the order of their codes.
  • The encoding rule is as follows: the Chinese character type is split into at least one stroke according to a predetermined stroke set and a predetermined stroke order, and strokes and code symbols are basically in one-to-one correspondence. The predetermined stroke set includes: “、”, representing dot strokes in the Chinese character type; “丿”, representing short left-falling and short right-falling strokes; “/”, representing long left-falling and long right-falling strokes; “-”, representing short horizontal and short vertical strokes; and “一”, representing long horizontal and long vertical strokes.
  • Where the code uses at least any five alphanumeric symbols, the method further includes the following step: retrieving Chinese character types in the information system in code order using an input device that includes letter keys and numeric keys.
  • Where the code uses at least five digits, the method further includes the following step: retrieving Chinese character types in the information system in code order using an input device that includes numeric keys.
  • The method may further include the following step: retrieving Chinese character types in the information system in code order using the numeric keys 1, 2, 3, 4, and 5.
  • Where the information system is a paginated font library for looking up a specific Chinese character type, the method further comprises the following steps: sorting the Chinese character types, placing them in sequence on the pages of the font library, and obtaining the page number of each page; to retrieve a predetermined Chinese character type from the font library, the predetermined Chinese character type is first converted into its code, the code is matched to a page number, and the Chinese character type is then retrieved from the page with that page number.
  • The predetermined stroke order is the writing stroke order of the Chinese character type.
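The encoding rule described above can be sketched in Python. This is a minimal illustration under stated assumptions, not the patent's implementation: the stroke-group names and the function `encode_strokes` are hypothetical, and assigning the digits 1 to 5 to the five stroke groups in the order they are listed is an assumption (it agrees with the page number given for the "soil" character later in the description).

```python
# Hypothetical digit assignment: the five stroke groups in the order the
# description lists them. The description does not state this table explicitly.
STROKE_GROUP_TO_DIGIT = {
    "dot": "1",          # dot strokes
    "short_slant": "2",  # short left-falling and short right-falling strokes
    "long_slant": "3",   # long left-falling and long right-falling strokes
    "short_hv": "4",     # short horizontal and short vertical strokes
    "long_hv": "5",      # long horizontal and long vertical strokes
}


def encode_strokes(stroke_groups):
    """Map stroke-group names, given in writing order, to a digit string."""
    return "".join(STROKE_GROUP_TO_DIGIT[group] for group in stroke_groups)


# Short horizontal, long vertical, long horizontal (assumed breakdown of "soil").
print(encode_strokes(["short_hv", "long_hv", "long_hv"]))
```

Because every stroke group maps to exactly one symbol, the code for a character depends only on its stroke order, which is what makes the codes usable as sort keys.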
  • A Chinese character type sorting and retrieval apparatus for an information system is also provided, where the content of the information system includes Chinese character types.
  • The apparatus comprises: an encoding module for mapping each Chinese character type to an alphanumeric code according to a predetermined encoding rule; a sorting module for sorting the Chinese character types in code order; and a retrieval module for retrieving Chinese character types in the information system in code order.
  • The encoding rule is as follows: the Chinese character type is split into at least one stroke according to a predetermined stroke set and a predetermined stroke order, and strokes and code symbols are basically in one-to-one correspondence. The predetermined stroke set includes: “、”, representing dot strokes in the Chinese character type; “丿”, representing short left-falling and short right-falling strokes; “/”, representing long left-falling and long right-falling strokes; “-”, representing short horizontal and short vertical strokes; and “一”, representing long horizontal and long vertical strokes.
  • An information system is also provided, the content of which includes Chinese character types that are sorted or retrieved using the above-described Chinese character type sorting and retrieval method or apparatus.
  • The information system can be a database.
  • The information system can be a font library for inputting Chinese characters.
  • The information system can be a paginated font library for looking up a particular Chinese character type.
  • According to the above, the present invention sorts the Chinese character types of an information system and thereby enables quick and convenient input, retrieval, and the like.
  • FIG. 1 illustrates a Chinese character sorting method for an information system according to an embodiment of the present invention
  • FIG. 2 shows a Chinese character sorting method for an information system according to an embodiment of the present invention
  • FIG. 3 shows a Chinese character type encoding rule according to an embodiment of the present invention
  • FIG. 4 illustrates a partition index of page numbers corresponding to first parts according to an embodiment of the present invention
  • FIG. 5 illustrates examples of horizontally and vertically arranged Chinese character types according to an embodiment of the present invention
  • FIG. 6 illustrates an example, according to an embodiment of the present invention, of horizontal and vertical character types and their corresponding primary and secondary page number groups.
  • Chinese characters are square characters and can be divided into two forms by the direction in which their radicals or components are arranged: horizontal, where the components are arranged side by side, and vertical, where they are arranged one above the other.
  • Chinese characters can basically be separated in these two ways, and each form accounts for about half of them.
  • Each Chinese character is also either a single character or a combined character.
  • A single character is composed directly of strokes.
  • A combined character is composed of radicals or components; more than 95% of Chinese characters are combined characters. In addition, the components of a Chinese character are divided into ideographic components and phonetic components (i.e., sound symbols).
  • Chinese character strokes are basically divided into five groups, and each group of strokes is in one-to-one correspondence with a key on the keyboard.
  • To input a Chinese character, whether horizontal or vertical, only the first three strokes of each component need to be typed on the corresponding keys for the character to be retrieved and input.
  • A few keystrokes are therefore enough to retrieve and input a Chinese character, which greatly improves the speed of Chinese character retrieval and input.
  • FIG. 1 illustrates a Chinese character type sorting and retrieval method for an information system according to an embodiment of the present invention, which includes the following steps:
  • Step 12: map each Chinese character type to an alphanumeric code according to a predetermined encoding rule;
  • Step 14: sort the Chinese character types in code order.
  • The encoding rule is as follows: the Chinese character type is split into at least one stroke according to a predetermined stroke set and a predetermined stroke order, and strokes and code symbols are basically in one-to-one correspondence. The predetermined stroke set includes: “、”, representing dot strokes in the Chinese character type; “丿”, representing short left-falling and short right-falling strokes; “/”, representing long left-falling and long right-falling strokes; “-”, representing short horizontal and short vertical strokes; and “一”, representing long horizontal and long vertical strokes.
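Step 14 amounts to an ordinary lexicographic sort on the code strings produced in Step 12. The following Python sketch assumes the codes are already available as fixed-length digit strings. The function name `sort_by_code`, the entry names, and every code in the table are illustrative placeholders rather than values taken from the figures.

```python
def sort_by_code(code_table):
    """Return the entries of code_table ordered by their code string.

    code_table maps an entry (for example a character) to its digit code.
    Fixed-length digit strings compare correctly as plain strings.
    """
    return sorted(code_table, key=lambda entry: code_table[entry])


# Placeholder entries and codes, for illustration only.
codes = {
    "entry_a": "455000",
    "entry_b": "151455",
    "entry_c": "151000",
}

for entry in sort_by_code(codes):
    print(codes[entry], entry)
```

Sorting on the code alone, with no radical or pinyin index in between, is the point of the method: the position of an entry follows directly from its strokes.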
  • The present invention also provides a Chinese character type sorting and retrieval apparatus for an information system, the content of which includes Chinese character types.
  • The Chinese character type sorting apparatus 20 includes: an encoding module 22 for mapping each Chinese character type to an alphanumeric code according to a predetermined encoding rule; and a sorting module 24 configured to sort the Chinese character types so that they can be retrieved in the information system in code order.
  • The encoding rule is as follows: the Chinese character type is split into at least one stroke according to a predetermined stroke set and a predetermined stroke order, and strokes and code symbols are basically in one-to-one correspondence. The predetermined stroke set includes: “、”, representing dot strokes in the Chinese character type; “丿”, representing short left-falling and short right-falling strokes; “/”, representing long left-falling and long right-falling strokes; “-”, representing short horizontal and short vertical strokes; and “一”, representing long horizontal and long vertical strokes.
  • FIG. 3 shows a Chinese character type sorting method for an information system according to an embodiment of the present invention; the Chinese character type encoding rule of the present invention is described below with reference to FIG. 3.
  • The Chinese character types are first classified. According to the character structure, there are two classifications:
  • Step 31: the Chinese characters are divided into horizontal and vertical types, determined by the direction in which the radicals or components are arranged. A side-by-side arrangement is horizontal and a top-to-bottom arrangement is vertical, so Chinese characters can basically be separated into these two forms, each accounting for about half;
  • Step 32: the Chinese characters are divided into single characters and combined characters. A single character has only one part, and a combined character is composed of two or more radicals or components.
  • After classification, the Chinese characters are uniform in structure, that is, they are basically two-part structures.
  • Next, step 35 is performed as shown in FIG. 3: the Chinese character type is mapped to an alphanumeric code according to the above encoding rule.
  • Here the alphanumeric code uses the digits “0” to “5”.
  • Step 36 can then be performed: the Chinese character types are compiled into a sequence index in the system and can be input in an intuitive manner;
  • In step 37, the two groups of digits are used directly as the sequence page number to find the page in the font library. This saves the search through the radical index pages as well as the search among a large number of similar characters, and therefore saves a lot of time.
  • The information system may be a paginated font library for looking up a specific Chinese character type, and the sorting method may further comprise the following steps: sorting the Chinese character types and placing them in sequence on the pages of the font library; to retrieve a predetermined Chinese character type from the font library, the predetermined Chinese character type is first converted into its code, the code is matched to a page number of the font library, and the Chinese character type is then retrieved from the page with that page number.
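The paginated lookup described above can be sketched in Python as follows. The page size, the helper names `paginate` and `find_page`, the sample codes, and the idea of indexing each page by its first code are all assumptions made for illustration; the description only states that sorted entries are placed on pages in sequence and that a code is matched to a page number.

```python
import bisect


def paginate(sorted_codes, page_size):
    """Split an already sorted list of codes into consecutive pages."""
    return [
        sorted_codes[start:start + page_size]
        for start in range(0, len(sorted_codes), page_size)
    ]


def find_page(pages, code):
    """Return the 1-based number of the page whose range covers `code`."""
    first_codes = [page[0] for page in pages]
    # Last page whose first code is <= the code being looked up.
    index = bisect.bisect_right(first_codes, code) - 1
    return max(index, 0) + 1


# Placeholder codes, for illustration only.
codes = sorted(["455000", "151000", "232000", "151455", "455151"])
pages = paginate(codes, page_size=2)
print(find_page(pages, "455000"))
```

Because the pages are filled in code order, a binary search over the first code of each page is enough to go from a code to its page without consulting a separate radical or pinyin index.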
  • Where the code uses at least any five alphanumeric symbols, the sorting method further comprises the step of retrieving Chinese character types in the information system in code order using an input device that includes letter keys and numeric keys. This can be applied to computers, print media, and the like.
  • Where the code uses at least five digits, the sorting method further comprises the step of retrieving Chinese character types in the information system in code order using an input device that includes numeric keys. This can also be applied effectively to handheld devices such as mobile phones and PDAs.
  • The sorting method may also include the following step: retrieving Chinese character types in the information system in code order using the numeric keys 1, 2, 3, 4, and 5.
  • The predetermined stroke order is the writing stroke order of the Chinese character type.
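Retrieval with the numeric keys can be pictured as a prefix search over the sorted codes: each key press narrows the range of matching entries. The Python sketch below is one possible reading under stated assumptions, not the patent's retrieval module; the function name `search_by_prefix`, the entry names, and the sample codes are placeholders.

```python
import bisect


def search_by_prefix(sorted_entries, typed_digits):
    """Return entries whose code starts with the digits typed so far.

    sorted_entries is a list of (code, entry) tuples sorted by code.
    """
    codes = [code for code, _ in sorted_entries]
    # All codes sharing a prefix are contiguous in a sorted list.
    start = bisect.bisect_left(codes, typed_digits)
    matches = []
    for code, entry in sorted_entries[start:]:
        if not code.startswith(typed_digits):
            break
        matches.append(entry)
    return matches


# Placeholder entries and codes, for illustration only.
entries = [
    ("151000", "entry_a"),
    ("151455", "entry_b"),
    ("455000", "entry_c"),
]
print(search_by_prefix(entries, "151"))
```

With only five stroke keys, the same idea fits a numeric keypad, which matches the description's remark about mobile phones and PDAs.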
  • The first part of a Chinese character type corresponds to the sequence page number index.
  • The Chinese character type is first classified as horizontal or vertical, and the position of its first part is then determined. The strokes of the first part, taken in stroke order, are matched to the five groups of stroke codes, namely “、” (dot), “丿” (short left-falling and short right-falling), “/” (long left-falling and long right-falling), “-” (short horizontal and short vertical) and “一” (long horizontal and long vertical), and converted into a group of digits; different first parts are placed according to their different digit sequences.
  • FIG. 4 shows a partition index of page numbers corresponding to first parts according to an embodiment of the present invention; FIG. 5 shows examples of horizontally and vertically arranged Chinese character types according to an embodiment of the present invention; and FIG. 6 shows an example, according to an embodiment of the present invention, of horizontal and vertical character types and their corresponding primary and secondary page numbers.
  • FIG. 5 gives examples of the categories that distinguish horizontally and vertically arranged Chinese characters.
  • 51 is the most numerous type, the phono-semantic character, composed of an ideographic component (the radical) and a phonetic component arranged side by side; examples are “Lin”, “Gui”, “Twig”, etc.
  • 52 has three components arranged side by side; examples are “speed”, “rush”, “extension” and so on.
  • 53 is a horizontal arrangement in which a large component surrounds a small component; examples include “smith”, “medical”, and the like.
  • 54 is a horizontal arrangement in which a large component carries a small component and three components are side by side; examples are “do”, “balance”, etc.
  • 55 is the single character, of which there are few; all single characters are treated as vertically arranged. Examples are “I”, “No”, “Car”, etc.
  • 56 is composed of two components arranged one above the other; examples are “word”, “fu”, “zhi” and so on.
  • 57 is a vertical arrangement in which a large component surrounds a small component; examples are “solid”, “same”, “national” and so on.
  • 58 is a vertical arrangement in which a small component sits in the lower-left or lower-right corner of a large component; examples are “exhibition”, “plague”, “screen”, “loading”, “planting”, and so on.
  • 59 has three components arranged from top to bottom; examples include “product” and the like.
  • FIG. 6 is an example of page number encoding in which horizontal and vertical Chinese character types correspond to primary and secondary page number groups.
  • The radicals of the Chinese characters “忙” (busy) and “忘” (forget) are the same in classification, but the ideographic component takes different shapes in the two arrangements, so the two characters correspond to different stroke-digit combinations and therefore to different page numbers. In 401, the horizontal character “忙”, the first part is “忄”, the corresponding stroke order is “、” (dot), “一” (long vertical) and “、” (dot), and the converted sequence of numbers is
  • In terms of form, Chinese character types are divided into horizontal and vertical; in terms of structure, they are divided into single characters and combined characters; all are encoded by two groups of digits. Where a character has fewer strokes than the digit positions in the code, the missing positions are all filled with the digit “0”. For example, “土” (soil) is a single character, and its page number is “455000”.
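The two-group page number with zero padding can be sketched in Python as follows. The group length of three digits is inferred from the description (the first three strokes of each component are entered, and the "soil" example has six digits); the function names `pad_group` and `page_code` are hypothetical.

```python
GROUP_LENGTH = 3  # inferred: first three strokes of each part


def pad_group(digits):
    """Keep at most three stroke digits and pad missing positions with '0'."""
    return digits[:GROUP_LENGTH].ljust(GROUP_LENGTH, "0")


def page_code(first_part_digits, second_part_digits=""):
    """Build the six-digit page number from the two groups of stroke digits.

    A single character has no second part, so its second group is all zeros.
    """
    return pad_group(first_part_digits) + pad_group(second_part_digits)


# The "soil" example from the description: one part, stroke digits 455.
print(page_code("455"))
```

Padding with "0", a digit that no stroke group uses, keeps every code the same length, so short characters sort ahead of longer ones that share the same leading strokes.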
  • The present invention also provides an information system whose content includes Chinese character types that are sorted according to the above sorting method or by using the above sorting apparatus.
  • The above information system can be a database.
  • It may also be a font library for inputting Chinese characters.
  • The font library may also be a paginated font library for looking up a particular Chinese character type, for example a printed character dictionary or word dictionary.
  • As described above, the present invention sorts the Chinese character types of an information system and thereby enables quick and convenient input, retrieval, and the like.

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Health & Medical Sciences (AREA)
  • Artificial Intelligence (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • Computational Linguistics (AREA)
  • General Health & Medical Sciences (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Document Processing Apparatus (AREA)
  • Controls And Circuits For Display Device (AREA)

Abstract

The invention relates to a method for retrieving and sorting Chinese character types for an information system. The content of the information system includes Chinese character types. The method consists of: mapping each Chinese character type to a code composed of alphanumeric characters according to a predefined encoding rule; and sorting the Chinese character types according to the order of the code. The invention also relates to an apparatus for retrieving and sorting Chinese character types for an information system, and to an information system using the retrieval and sorting method or apparatus.
PCT/CN2008/000109 2007-01-19 2008-01-16 Apparatus and method for retrieving and sorting Chinese character types, and information systems WO2008089654A1 (fr)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN200710000750.9 2007-01-19
CNB2007100007509A CN100476826C (zh) 2007-01-19 2007-01-19 中文字型排序检索方法和装置以及一种信息系统

Publications (1)

Publication Number Publication Date
WO2008089654A1 (fr) 2008-07-31

Family

ID=38692597

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2008/000109 WO2008089654A1 (fr) Apparatus and method for retrieving and sorting Chinese character types, and information systems 2007-01-19 2008-01-16

Country Status (2)

Country Link
CN (1) CN100476826C (fr)
WO (1) WO2008089654A1 (fr)

Families Citing this family (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN100476826C (zh) * 2007-01-19 2009-04-08 劳英杰 中文字型排序检索方法和装置以及一种信息系统
CN101408873A (zh) * 2007-10-09 2009-04-15 劳英杰 全范围语义信息综合认知系统及其应用
CN103399756A (zh) * 2013-08-21 2013-11-20 苏州换游信息科技有限公司 冒泡法排序软件

Citations (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN1219701A (zh) * 1997-12-09 1999-06-16 王仁富 汉字笔划笔顺拼音部首数字输入法
CN101000625A (zh) * 2007-01-19 2007-07-18 劳英杰 中文字型排序检索方法和装置以及一种信息系统

Family Cites Families (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN1039666C (zh) * 1993-11-06 1998-09-02 黄飞梦 基于两笔形与两笔符的汉字输入方法及键盘
CN1193139A (zh) * 1997-03-07 1998-09-16 梅保全 简拼简划汉字编码及输入方案
CN1295588C (zh) * 2004-05-26 2007-01-17 成巨才 汉字输入方法
CN1271495C (zh) * 2004-06-09 2006-08-23 倪国章 字根首笔划汉字数码输入法

Also Published As

Publication number Publication date
CN101000625A (zh) 2007-07-18
CN100476826C (zh) 2009-04-08

Similar Documents

Publication Publication Date Title
JP2006127510A (ja) テンキー・キーボードのための多言語入力方法エディタ
CN1140868C (zh) 表意语言及非表意语言的文字输入系统
CN100462901C (zh) Gb拼音输入法
CN85100837A (zh) 优化五笔字型编码法及其键盘
CN100403239C (zh) 基于英文键盘的藏文输入法
WO2008089654A1 (fr) Apparatus and method for retrieving and sorting Chinese character types, and information systems
CN102830809A (zh) 汉字编码输入法
CN102368177B (zh) 新汉字声韵输入方法及输入键盘
CN104850240A (zh) 一种基于手机20键位输入法的显示键盘及其输入方法
CN107256092B (zh) 汉字数字形码快速输入法
CN101114196B (zh) 输入中文短语的方法和设备
CN102750009A (zh) 一种无切换汉字输入法及键盘
CN105912139A (zh) 一种模块化笔画编码汉字对应识别的方法
CN1367420A (zh) 数码键盘中文输入方法及其键位例
CN1472626A (zh) 嵌入式智能文字输入解决方法和装置
CN105807949B (zh) 藏文输入方法和系统
CN1196057C (zh) 一码二形数字编码汉字输入方法
CN1162767C (zh) 方圆归类象形码汉字输入法
CN102053718A (zh) 用于生成汉字的方法以及键盘输入设备
CN105589574B (zh) 一种基于五个元音码编码的中英数混合文字输入方法
CN101135934A (zh) 手机汉字输入法
CN100511111C (zh) 双码联合输入法
CN102566764B (zh) Qwerty键盘及其输入方法
CN117917621A (zh) 汉字输入方法和系统以及键盘
CN100389375C (zh) 一种数字码输入法

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 08700659

Country of ref document: EP

Kind code of ref document: A1

NENP Non-entry into the national phase

Ref country code: DE

122 Ep: pct application non-entry in european phase

Ref document number: 08700659

Country of ref document: EP

Kind code of ref document: A1