CN1755669A - Name input processing method and system - Google Patents
Name input processing method and system
- Publication number: CN1755669A
- Application number: CN 200410083187
- Authority: CN (China)
- Prior art keywords: name, character, input, surname, given name
- Legal status: Pending (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Description
Technical field
The present invention relates to a method and system for inputting personal names written in Chinese characters, and in particular to a method and system that allow Chinese personal names to be input quickly, with fewer character selections and with continuous input.
Background art
Unlike the alphabetic scripts of the Indo-European languages, East Asian languages such as Chinese, Japanese, and Korean typically have very large character sets. GB2312 contains 6,763 Chinese characters and GBK contains more than 20,000, so Chinese character input has long been regarded as the foremost problem in Chinese information processing. The same problem exists for Japanese and Korean input. To address it, numerous input methods have appeared over the past twenty-odd years, including the well-known Wubi (five-stroke) method and Intelligent ABC. The Ziguang Pinyin and Microsoft Pinyin input methods additionally use statistical language models to reduce character selection and to support continuous input to some extent.
Ordinary input methods convert pinyin to Chinese characters by means of a word list. Personal names, however, differ from fixed dictionary entries in that they are composed dynamically, so a sufficiently large list of names cannot be compiled in advance. As a result, users often have to go through many tedious character selections when entering a name.
Summary of the invention
An object of the present invention is to provide a method for inputting personal names efficiently. The method can be embedded in an existing input method or designed as a stand-alone input method. As a software system, it can be used on any information device that requires name input, such as desktop computers, personal digital assistants, and mobile phones.
According to one aspect of the invention, there is provided an input method capable of recognizing an input personal name, comprising the steps of: detecting a surname input code in an input code sequence; retrieving, from a surname list, candidate surname characters corresponding to the surname input code; obtaining the number of characters in the given name from the input code sequence; when the given name is determined to be a single name (a one-character given name), calculating the usage probability of each target single-name candidate character and arranging the candidate characters in descending order of usage probability; and when the given name is determined to be a double name (a two-character given name), calculating the usage probability of each target double-name candidate and arranging the candidates in descending order of usage probability.
According to another aspect of the invention, there is provided an input method capable of recognizing an input personal name, comprising the steps of: detecting a surname input code in an input code sequence; retrieving, from a surname list, candidate Chinese surname characters corresponding to the surname input code; inputting a single-name syllable and looking up a stored single-name index table; and obtaining all given-name candidate characters corresponding to that pinyin by using the expression $S(w) = 0.5\,(f(w, G_0 \mid p) + f(w, G_2 \mid p))$.
According to another aspect of the invention, there is provided an input method capable of recognizing an input personal name, comprising the steps of: detecting a surname input code in an input code sequence; retrieving, from a surname list, candidate Chinese surname characters corresponding to the surname input code; inputting double-name syllables and looking up a stored double-name index table; and generating double-name candidate combinations corresponding to the input code and using the expression [formula not reproduced in the source text]
According to another aspect of the invention, there is provided an input method capable of recognizing an input personal name, comprising the steps of: obtaining the current syllable from the currently input syllable sequence; querying the surname list and determining whether the resulting surname candidate list is empty; if it is empty, ending the name-input determination and returning an empty result; if it is not empty, taking the first candidate in the list as the output surname and listing the surname candidate characters; if the output surname is judged to be incorrect, selecting one of the surname candidate characters; and if the output is the desired surname, calculating single-name candidate characters or double-name candidate characters from the input syllable sequence.
According to another aspect of the invention, there is provided an input device capable of recognizing an input personal name, comprising: input coding means for converting Chinese characters into an acceptable input code sequence; surname processing means for detecting and determining the surname in the code input by the user; given-name processing means for recognizing the given name in the subsequent pinyin sequence after the surname processing means has detected the surname in the input code; and name output means for outputting the candidate Chinese characters of the name corresponding to the input code.
The method of the invention allows personal names in Chinese characters to be input efficiently; with minor language-specific modifications it is also applicable to Japanese and Korean input.
Brief description of the drawings
The above and other objects, features, and advantages of the invention will become clearer from the following detailed description of preferred embodiments taken in conjunction with the accompanying drawings. It should be noted that the description below is given only as embodiments for a better understanding of the invention and does not limit it. In the drawings:
Fig. 1 is a schematic block diagram of a name input system according to an embodiment of the invention;
Fig. 2 shows a reverse index table of Chinese surnames;
Fig. 3 shows single-name recognition and input by the name input system in stand-alone input mode according to an embodiment of the invention;
Fig. 4 shows double-name recognition and input by the name input system in stand-alone input mode according to an embodiment of the invention; and
Fig. 5 shows an input interface with a name input function according to an embodiment of the invention.
Detailed description
Embodiments of the invention are described in detail below with reference to the drawings. Details and functions that are unnecessary for the invention are omitted so as not to obscure its understanding.
The basic principle of the input method according to the invention is described below. It should be noted that although the embodiments describe the input of personal names in Chinese characters, the concept and principle of the invention can be applied to Chinese character input in other fields. With language-specific modifications, the invention can also be applied to Japanese and Korean input.
The invention is described below using Chinese as an example. Leaving aside the special cases of ethnic-minority names, Chinese names generally take only four forms: single surname with single name, single surname with double name, compound surname with single name, and compound surname with double name. Here a single name is a one-character given name and a double name is a two-character given name.
Because the number of possible combinations of surname and given-name characters is enormous, an ordinary input method cannot provide a name list that covers them all. However, the characters used in Chinese names and their combinations carry meaning and are not combined arbitrarily, and the set of characters used for Chinese surnames is limited. Name characters therefore exhibit distributional regularities in a probabilistic sense. The invention exploits this property: a name recognition algorithm helps the user make fewer character selections when converting codes to Chinese characters, so that names can be input efficiently.
For clarity, this embodiment uses pinyin input to illustrate the name input method of the invention. The invention is not limited to pinyin, however; the proposed method applies equally to other types of input method, such as stroke-based input methods.
The name input method of the invention has two modes of use: a stand-alone mode, in which the current input is known to be a personal name, and a continuous mode, used while Chinese text is being entered continuously. In stand-alone mode the input method is set to accept only Chinese personal names, so every string entered is treated as a name, and a name recognition algorithm evaluates and selects the candidate characters generated from the input pinyin so that the user can finish with fewer character selections. Stand-alone mode suits situations where many names are entered together, such as payrolls and personnel lists. It can be activated by the user, who sets the input method program to name input mode or switches to the name input method, or automatically by the environment, such as the operating system or a web browser, for example when the name field of a web form is being filled in.
In continuous mode the name recognition algorithm dynamically detects names that may be present in the pinyin string entered by the user and assists character selection, which improves the efficiency of name input within long passages of ordinary text. In both modes the invention relies on a name recognition algorithm to achieve efficient name input. Using the configured recognition method and mathematical model, the algorithm ranks the candidate characters generated from the input pinyin codes and produces the character combination with the highest probability, reducing the number of character selections the user has to make. The algorithm runs somewhat differently in the two modes, as described separately below.
The name recognition model of the invention and how it is built are described below, following the principle of the invention.
Name recognition model and its construction
The name recognition model is a mathematical model. Its parameters are gathered statistically from a pre-built name database and are used at input time to evaluate how likely a Chinese character (or character combination) is to be a candidate for a Chinese personal name.
The name database lists a series of real personal names, for example 陈赓 (Chen Geng), 张治中 (Zhang Zhizhong), 冯玉祥 (Feng Yuxiang), 戴安澜 (Dai Anlan), and so on. Given the four limited combination forms of Chinese names, single names and double names are processed separately, because from a statistical point of view they carry different amounts of information.
For single names, consider a Chinese character $w$ used as the single-name character $G_0$. Without loss of generality, take the character 赓 (geng) as an example. When the user inputs the pinyin "geng", the given-name characters in the name database with that pronunciation include 赓, 庚, 耕, 恒, 耿, and 亘. These homophones are counted as given-name characters and arranged in descending order of their frequency in given names, and single-name characters are predicted from this ordering. In general, predicting single-name characters requires an index from each pinyin code to its homophonous single-name candidate characters, with all characters sorted in descending order of a specific frequency, forming a single-name index table, as shown in Table 1 below:
The numbers in the table indicate how many times each character occurs in given names. The index table can be stored in the storage device of an information processing device such as a computer, where it serves as the recognition model for predicting single-name characters. Mathematically, the index table records, for all Chinese characters whose pinyin code is $p$, their frequency of use as single-name characters, as given by expression (1):
$$f(w, G_0 \mid p) = \text{number of times character } w \text{ is used as a single name} \tag{1}$$
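The counting behind expression (1) can be sketched as follows. This is only an illustration under assumptions: the tiny `NAME_DB`, its pinyin annotations, and the function name `build_single_name_index` are hypothetical and do not come from the patent.

```python
from collections import Counter, defaultdict

# Toy name database: (surname, given name, pinyin syllables of the given name).
# Entries and annotations are illustrative, not real statistics.
NAME_DB = [
    ("陈", "赓", ["geng"]),
    ("李", "赓", ["geng"]),
    ("王", "耕", ["geng"]),
    ("张", "治中", ["zhi", "zhong"]),
]

def build_single_name_index(name_db):
    """Map each pinyin code p to [(w, f(w, G0 | p)), ...], highest count first."""
    counts = defaultdict(Counter)
    for _surname, given, syllables in name_db:
        if len(given) == 1:  # only single names contribute to f(w, G0 | p)
            counts[syllables[0]][given] += 1
    return {p: c.most_common() for p, c in counts.items()}

single_index = build_single_name_index(NAME_DB)
print(single_index["geng"])  # [('赓', 2), ('耕', 1)]
```

Looking up a pinyin code then returns the homophonous single-name candidates already sorted by descending count, which is the ordering the index table is described as providing.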
Similarly, for double names, let the Chinese characters $w_1$ and $w_2$ be the first character $G_1$ and the second character $G_2$ of a double name. Without loss of generality, take the double name 治中 (zhizhong) as an example. When the user inputs the pinyin "zhizhong", the name database holds, as for single names, homophone tables of the characters pronounced "zhi" and "zhong" that occur as the first and second characters of double names, respectively. The occurrences of the name 治中 are counted and index tables are built in the same way as the single-name index table. Their mathematical descriptions are:
$$f(w, G_1 \mid p) = \text{number of times character } w \text{ with code } p \text{ is used as the first character of a double name}$$
$$f(w, G_2 \mid p) = \text{number of times character } w \text{ with code } p \text{ is used as the second character of a double name} \tag{2}$$
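The two positional counts of expression (2) can be gathered in one pass over the database. The sketch below assumes a toy `NAME_DB` with hand-written pinyin; the names 李志中 and 王治国 and the function name are made up purely for illustration.

```python
from collections import Counter, defaultdict

# Toy database: (surname, given name, pinyin syllables of the given name).
NAME_DB = [
    ("张", "治中", ["zhi", "zhong"]),
    ("冯", "玉祥", ["yu", "xiang"]),
    ("戴", "安澜", ["an", "lan"]),
    ("李", "志中", ["zhi", "zhong"]),  # hypothetical entry
    ("王", "治国", ["zhi", "guo"]),    # hypothetical entry
]

def build_double_name_index(name_db):
    """Return (first, second): first[p][w] = f(w, G1 | p), second[p][w] = f(w, G2 | p)."""
    first = defaultdict(Counter)
    second = defaultdict(Counter)
    for _surname, given, syllables in name_db:
        if len(given) == 2:  # only double names contribute
            first[syllables[0]][given[0]] += 1
            second[syllables[1]][given[1]] += 1
    return first, second

first, second = build_double_name_index(NAME_DB)
print(first["zhi"].most_common())     # [('治', 2), ('志', 1)]
print(second["zhong"].most_common())  # [('中', 2)]
```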
In addition, the connection frequency between the two characters of a double name is an effective measure of how likely two independent Chinese characters are to form a double name. The rationale is that bigram connection parameters strongly memorize names that have already occurred. The character connection frequency $f(w_2 \mid w_1)$ is therefore introduced:
$$f(w_2 \mid w_1) = \frac{N(w_1 w_2)}{f(w_1, G_1 \mid p_1)} \tag{3}$$
That is, the character connection frequency equals the number of times $w_1$ and $w_2$ are used as a double name, written $N(w_1 w_2)$ here, divided by the number of times $w_1$ is used as the first character of a double name. The count of $w_1$ and $w_2$ as a double name includes the occurrences of $w_1$ and $w_2$ separately in double names as well as their occurrences together.
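The ratio just described can be computed directly from a list of double names. The following is a sketch under one reading of the text, namely that the numerator counts occurrences of the pair as a double name; the sample names and the helper name `connection_frequency` are assumptions.

```python
from collections import Counter

def connection_frequency(double_names):
    """Return f(w2, w1): count of the pair w1 w2 divided by count of w1 as first character."""
    pair = Counter()
    first = Counter()
    for w1, w2 in double_names:  # each name is a two-character string
        pair[(w1, w2)] += 1
        first[w1] += 1

    def f(w2, w1):
        # Unseen first characters get frequency 0.0 instead of dividing by zero.
        return pair[(w1, w2)] / first[w1] if first[w1] else 0.0

    return f

f = connection_frequency(["治中", "治中", "治国", "海荣"])
print(f("中", "治"))  # 2 of the 3 names starting with 治 continue with 中
```

A pair that has never been seen gets frequency zero, which reflects the "memory" role the text assigns to this parameter.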
Alternatively, the connection frequency of the toned syllables of the two characters of a double name is also a good measure. The rationale is that Chinese names place importance on being smooth to read aloud. The toned-syllable connection frequency function $f(s_2 \mid s_1)$ is therefore introduced:
[expression (4) is not reproduced in the source text]
where $s$ denotes the syllable of an input character.
Of expressions (1)-(4) above, expression (1) counts how often a target character is used as a single name; this value can serve as the basis for ranking single-name characters. Expression (2) counts how often a target double-name character occurs in each position. Expressions (3) and (4) are used to evaluate how well two name characters, and their pronunciations, combine.
For single names, since single-name characters are also frequently used as the second character of double names, the usage score is set to:
$$\text{E1:}\quad S(w) = 0.5\,\bigl(f(w, G_0 \mid p) + f(w, G_2 \mid p)\bigr) \tag{5}$$
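Expression (5) averages two counts for the same pinyin code: how often a character is used as a single name and how often it is the second character of a double name. A minimal sketch, with made-up counts and a hypothetical function name, might look like this:

```python
def single_name_scores(p, f_g0, f_g2):
    """Rank characters for pinyin p by S(w) = 0.5 * (f(w, G0 | p) + f(w, G2 | p))."""
    g0 = f_g0.get(p, {})
    g2 = f_g2.get(p, {})
    scores = {w: 0.5 * (g0.get(w, 0) + g2.get(w, 0)) for w in set(g0) | set(g2)}
    # Highest score first; ties are broken by character for a stable order.
    return sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))

# Illustrative counts only, not real statistics.
F_G0 = {"geng": {"赓": 6, "耕": 4}}
F_G2 = {"geng": {"耕": 6, "庚": 2}}

ranked = single_name_scores("geng", F_G0, F_G2)
print(ranked)  # [('耕', 5.0), ('赓', 3.0), ('庚', 1.0)]
```

With these toy numbers 耕 overtakes 赓 because it is also common as a second character, which is the effect the averaging is meant to capture.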
For double names, suppose a candidate combination of Chinese characters produced by the two pinyin codes is $w_1 w_2$; its usage score is set to:
E2: [expression (6) is not reproduced in the source text]
Name input system
The name input system of the invention is described below with reference to the drawings.
An input method with the efficient name input function described in the invention can supplement an existing input method or be used as stand-alone software on any device that requires name input. Fig. 1 is a structural block diagram of a name input system according to one embodiment of the invention.
The name input system according to the invention comprises a code input device 1, a surname processing unit 2, a given-name processing unit 3, a name output device 4, a surname list storage unit 5, a recognition rule storage unit 6, and a recognition model storage unit 7.
The code input device 1 accepts the character code sequence entered by the user and supplies it to the surname processing unit 2. The surname processing unit 2 performs surname detection using the surname list storage unit 5 and the recognition rule storage unit 6, and supplies the detected surname to the given-name processing unit 3. The given-name processing unit 3 performs given-name detection using the recognition model storage unit 7 and outputs the result to the name output device 4, which displays the personal name corresponding to the input character code sequence. The operation of each part in Fig. 1 is described below.
Code input
To input Chinese characters, the user has to convert them into a code string that the computer accepts, using some encoding method. For input methods based on Hanyu Pinyin, the code string is the pinyin of the characters. Consider the most general case, full pinyin input: to enter the name 夏海荣 (Xia Hairong), the user might type "xia hai rong". Once entered, the pinyin string is parsed into three independent syllables, "xia", "hai", and "rong".
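The parsing step can be sketched with a greedy longest-match segmenter. This is an assumption-laden illustration: the patent does not say how syllables are segmented, `SYLLABLES` is a tiny hypothetical subset of the pinyin inventory, and greedy matching can pick the wrong split for genuinely ambiguous strings.

```python
# Tiny illustrative subset of valid pinyin syllables (a real table has about 400).
SYLLABLES = {"xia", "hai", "rong", "ou", "yang", "chen", "geng"}
MAX_SYLLABLE_LEN = 6  # the longest pinyin syllables, e.g. "zhuang", have six letters

def parse_syllables(code, syllables=SYLLABLES):
    """Split a pinyin string into syllables, honoring spaces and matching greedily."""
    result = []
    for chunk in code.lower().split():
        i = 0
        while i < len(chunk):
            # Try the longest possible syllable first.
            for j in range(min(len(chunk), i + MAX_SYLLABLE_LEN), i, -1):
                if chunk[i:j] in syllables:
                    result.append(chunk[i:j])
                    i = j
                    break
            else:
                raise ValueError(f"cannot parse {chunk[i:]!r} as pinyin")
    return result

print(parse_syllables("xia hai rong"))  # ['xia', 'hai', 'rong']
print(parse_syllables("ouyang"))        # ['ou', 'yang']
```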
Surname processing
Chinese surnames are quite stable, so a surname list (see Table 1) can be built and used to detect and determine the surname in the pinyin entered by the user. The surnames in the list can be retrieved by code: in the example above, the surname 夏 is retrieved from the input "xia", and the surname 欧阳 (Ouyang) is retrieved from the input "ou yang". The surname list can be stored in the surname list storage unit 5.
Table 1: List of Chinese surnames (partial)
When this table is used, a reverse index from code to surname usually needs to be built, with the structure shown in Fig. 2. With this reverse index it can be determined whether an input syllable could form the surname of a personal name, and how many homophonous surnames there are. For example, when the pinyin "ai" is input, the surname processing unit 2 looks up in the surname list storage unit 5 the Chinese surnames corresponding to the pinyin code "ai". Normally the only Chinese surname corresponding to "ai" is 艾, so the detected surname is 艾.
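A reverse index of this kind is essentially a mapping from pinyin codes to lists of surnames. The sketch below uses a handful of surnames mentioned in the text; the tuple-based key format and the function name are assumptions, not the structure shown in Fig. 2.

```python
from collections import defaultdict

# Partial, illustrative surname list: (surname, pinyin syllables).
SURNAMES = [
    ("艾", ("ai",)),
    ("张", ("zhang",)),
    ("章", ("zhang",)),
    ("欧", ("ou",)),
    ("区", ("ou",)),
    ("欧阳", ("ou", "yang")),
    ("夏", ("xia",)),
]

def build_reverse_index(surnames):
    """Map each pinyin code (tuple of syllables) to the surnames pronounced that way."""
    index = defaultdict(list)
    for surname, code in surnames:
        index[code].append(surname)
    return dict(index)

reverse_index = build_reverse_index(SURNAMES)
print(reverse_index[("ai",)])     # ['艾']
print(reverse_index[("zhang",)])  # ['张', '章']
```

A missing key means the syllable cannot start a name, and the length of the list gives the number of homophonous surnames.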
Chinese has a small number of compound (two-character) surnames, such as 夏侯 (Xiahou), 欧阳 (Ouyang), 尉迟 (Yuchi), and 诸葛 (Zhuge), whose first character can also stand alone as a single-character surname. In this case the system may, for example, default to treating the input as a compound surname while allowing the user to change it. That is, when the user inputs the pinyin "ouyang…", the surname processing unit 2 may take the surname to be 欧阳 rather than 欧. When the pinyin "ou" is input, the surname processing unit 2 can detect the single surnames 欧 and 区 and, as a configurable option, also the compound surname 欧阳.
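The default preference for compound surnames can be sketched as a longest-match lookup. The index contents, the `prefer_compound` flag, and the return format are illustrative assumptions; the optional prefix completion from "ou" to 欧阳 mentioned above is not implemented here.

```python
# Illustrative reverse index keyed by tuples of pinyin syllables.
SURNAME_INDEX = {
    ("ou",): ["欧", "区"],
    ("ou", "yang"): ["欧阳"],
    ("xia",): ["夏"],
    ("chen",): ["陈"],
}

def detect_surname(syllables, index=SURNAME_INDEX, prefer_compound=True):
    """Return (surname candidates, number of syllables consumed) for the leading syllables."""
    lengths = (2, 1) if prefer_compound else (1, 2)
    for n in lengths:
        key = tuple(syllables[:n])
        if len(key) == n and key in index:
            return index[key], n
    return [], 0  # no surname found: fall back to ordinary character selection

print(detect_surname(["ou", "yang", "xiu"]))  # (['欧阳'], 2)
print(detect_surname(["ou"]))                 # (['欧', '区'], 1)
```

Passing `prefer_compound=False` models the user overriding the default and reading "ou" as a single-character surname.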
Chinese surnames also include some homophones, such as 张 and 章 (both "zhang") listed in Fig. 2. There is no notably effective way to resolve this, so the user settles it through the character selection process.
If the system cannot find a suitable surname for the user's input, the name input process is aborted and the input method switches to its normal character selection process.
The above recognition rules may be stored in the recognition rule storage unit 6.
Given-name processing
Given-name characters are far more varied than surname characters, so this is where the main algorithm described above comes in. The procedure differs slightly depending on whether the system is in stand-alone mode or continuous mode. For a given input method program, a special key combination, or another control such as a button or menu item, can be provided on the information processing device to switch the name input mode.
1. Stand-alone mode
In stand-alone mode the user's purpose is to input a personal name, so the entered code string is always assumed to be a valid name code. Once the user finishes typing, the system knows the total number of characters in the full name directly; after processing by the surname processing unit 2, the given-name processing unit 3 knows the number of characters in the given name, that is, whether it is a single name or a double name. The given-name processing unit 3 can therefore handle single names and double names separately.
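The length bookkeeping in stand-alone mode is simple enough to show in a few lines. The function name and the string labels are hypothetical; the sketch only assumes that one syllable corresponds to one character.

```python
def given_name_kind(total_syllables, surname_syllables):
    """Classify the given name from the syllable counts of the full name and the surname."""
    given = total_syllables - surname_syllables
    kinds = {1: "single", 2: "double"}
    if given not in kinds:
        raise ValueError(f"unsupported given-name length: {given}")
    return kinds[given]

print(given_name_kind(2, 1))  # "chen geng": single surname, single name
print(given_name_kind(3, 1))  # "xia hai rong": single surname, double name
print(given_name_kind(4, 2))  # compound surname followed by a double name
```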
Fig. 3 shows the operation flow by which the system processes single-name characters. First, in step S301, a single-name syllable s is input. Then, in step S302, the given-name processing unit 3 looks up the single-name index table. Next, in step S303, all given-name candidate characters corresponding to that pinyin are obtained using expression (5), and in step S304 the single-name candidate with the highest frequency is selected as the output character. As an example, the single-name index table may be stored in the recognition model storage unit 7, or a separate storage unit may be provided for the given-name character index tables, divided, for example, into a single-name index table and a double-name index table.
As a detailed example, suppose the user inputs the name "chen geng". The surname processing unit first determines that "chen" is the single surname 陈, then uses the index table to find the frequency-sorted candidate characters for "geng", namely 赓, 耕, 亘, and so on. The first candidate, 赓, is output as the selection, and the list is kept for the user to choose from.
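The "chen geng" walk-through can be sketched end to end as below. Both index tables are toy data with invented scores, the given-name list is assumed to be already sorted by the score of expression (5), and `input_single_name` is a hypothetical helper name.

```python
# Toy tables: surname candidates per pinyin, and single-name candidates sorted by score.
SURNAME_INDEX = {"chen": ["陈", "程"]}
SINGLE_NAME_INDEX = {"geng": [("赓", 5.0), ("耕", 3.0), ("亘", 1.0)]}

def input_single_name(syllables):
    """Return (best full name, remaining given-name candidates) for [surname, given] pinyin."""
    surname_code, given_code = syllables
    surnames = SURNAME_INDEX.get(surname_code, [])
    if not surnames:
        return None, []  # no surname found: leave name input mode
    candidates = [w for w, _score in SINGLE_NAME_INDEX.get(given_code, [])]
    best = surnames[0] + candidates[0] if candidates else None
    return best, candidates

best, candidates = input_single_name(["chen", "geng"])
print(best)        # 陈赓
print(candidates)  # ['赓', '耕', '亘']
```

The candidate list is returned alongside the top choice so that the user can still pick another character, as the text describes.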
Fig. 4 shows the operation flow by which the system processes double-name characters. First, in step S401, the double-name syllables s1 and s2 are input. Then, in step S402, the given-name processing unit 3 looks up the stored double-name index table. Next, in step S403, the double-name candidate combinations corresponding to the input pinyin are generated, and in step S404 each combination generated in step S403 is evaluated using expression (6). In step S405 the double-name candidate with the highest score is output. The output order may follow the frequency of occurrence of the double-name combinations.
As a detailed example of how the name input system recognizes a double name in stand-alone mode, suppose the user inputs "xia hai rong". The system first determines that "xia" is the single surname 夏, then uses the double-name index table to find the frequency-sorted candidate characters for "hai" and "rong": 海, 亥, 还 and 容, 荣, 蓉, 融, 溶, 熔. After the candidates are combined and scored with expression (6), the double-name candidates sorted by score are 海荣, 海蓉, 海融, 海熔, from which the user may choose. Finally, the system outputs 海荣 as the recognition result.
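Steps S403 to S405 amount to forming the cross product of the two candidate lists, scoring each pair, and sorting. Because expression (6) is not reproduced in the text, the score used below, the product of the two positional counts boosted by the connection frequency, is a stand-in chosen for illustration and is not the patent's formula; all counts are invented.

```python
from itertools import product

def rank_double_names(p1, p2, first, second, conn):
    """Rank double-name candidates for pinyin (p1, p2) with a stand-in score."""
    ranked = []
    for (w1, f1), (w2, f2) in product(first.get(p1, {}).items(),
                                      second.get(p2, {}).items()):
        # Stand-in for expression (6): NOT the formula from the patent.
        score = f1 * f2 * (1.0 + conn.get((w1, w2), 0.0))
        ranked.append((w1 + w2, score))
    ranked.sort(key=lambda item: -item[1])
    return ranked

# Illustrative statistics only.
FIRST = {"hai": {"海": 8, "亥": 1}}               # f(w, G1 | p)
SECOND = {"rong": {"荣": 5, "蓉": 4, "融": 2}}    # f(w, G2 | p)
CONN = {("海", "荣"): 0.5, ("海", "蓉"): 0.25}    # f(w2 | w1)

ranked = rank_double_names("hai", "rong", FIRST, SECOND, CONN)
print([name for name, _ in ranked[:3]])  # ['海荣', '海蓉', '海融']
```

Any other monotone combination of the same statistics could be swapped in without changing the surrounding flow.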
2. Continuous mode
The handling of name input in continuous mode is described next. In continuous mode the user enters Chinese character codes continuously. Because there is no explicit marker of where a name begins and ends, the system must dynamically detect the names present in the code string. The basic algorithm is as follows:
First, a current-syllable variable s and a candidate character w for s can be set up. The current syllable s is obtained from the currently input syllable sequence. The surname list is then queried to obtain a surname candidate list l, and it is determined whether the list is empty. If it is empty, the name-input determination in continuous mode ends and an empty result is returned. If it is not empty, the first candidate in the list is taken as the output surname. The user can then judge whether the output surname is correct; if it is not, that is, the output surname is not the one the user wants, the user can select another candidate character. If what is output is a non-surname character, the surname recognition process ends, an empty result is returned, and Chinese character input continues. If the output is the surname the user wants, the system assumes in turn that the user is entering a single name and a double name. The input pinyin codes are then scored according to expressions (5) and (6), and the single-name and double-name characters with the highest scores are displayed to the user.
The user then judges whether the proposed name is correct. If it is, this round of name recognition ends. If the user judges the proposed name to be incorrect, the user selects the first character of the given name from the candidates offered. If the user indicates that the name is complete, this round of name recognition ends. If the name is not complete, the system scores the candidates according to expression (6) and displays the double name with the highest score to the user. The user can then judge whether the second character of the proposed name is correct. If it is, this round of name input ends. If not, the user selects the second character of the given name, which ends this round of name recognition.
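The system-side part of this loop, detecting a possible surname at the current syllable and proposing a single name and a double name, can be sketched without the interactive confirmation steps. Everything here is a simplification under assumptions: the three tables are toy data with precomputed scores, only a few surnames are indexed, and `propose_names` is a hypothetical helper.

```python
# Toy tables; the scores stand in for the results of expressions (5) and (6).
SURNAME_INDEX = {"li": ["李", "黎"], "wo": ["沃"]}
SINGLE = {"bao": [("宝", 4.0), ("保", 3.0)]}
DOUBLE = {("bao", "ying"): [("宝英", 6.0), ("保英", 2.0)]}

def propose_names(syllables, i):
    """Return name proposals starting at syllable i, or None if no surname matches."""
    surnames = SURNAME_INDEX.get(syllables[i], [])
    if not surnames:
        return None  # empty surname list: continue ordinary input
    proposal = {"surname_candidates": surnames, "single": None, "double": None}
    if i + 1 < len(syllables):
        singles = SINGLE.get(syllables[i + 1], [])
        if singles:
            proposal["single"] = surnames[0] + singles[0][0]
    if i + 2 < len(syllables):
        doubles = DOUBLE.get((syllables[i + 1], syllables[i + 2]), [])
        if doubles:
            proposal["double"] = surnames[0] + doubles[0][0]
    return proposal

text = "wo he li bao ying lao shi".split()
print(propose_names(text, 1))  # None: "he" is not in the toy surname index
print(propose_names(text, 2))
```

In a real input method each proposal would be shown to the user, whose accept or correct decisions drive the remaining branches described above.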
For clarity, the above process can be represented by the following table:
The following example illustrates the recognition process of the name input method of the invention in continuous mode. Suppose the user inputs the pinyin string "wo he li bao ying lao shi you guo yi mian zhi yuan", whose target text is 我和李宝英老师有过一面之缘 ("Teacher Li Baoying and I have met once"). For each syllable, the corresponding surnames are looked up; the list is as follows:
Following the recognition process of the invention, the system first converts "wo" into the hypothesized surname 沃; in step 4.2.1 the user selects the non-surname character 我, an empty result is returned, and this round of name recognition ends. Processing continues in turn up to "li", where the user selects 李 in step 4.2.1. The system then hypothesizes a single name and a double name and, by calculation, recommends the candidate name 李宝英 to the user. In step 6.1 the user judges the recommended name to be wrong and selects 保, after which the system outputs the name 李保英 and the process ends correctly. The system processes each syllable in turn according to the algorithm until the syllable string ends.
Alternatively, the names entered by the user can be recorded and counted each time the user enters a surname and a given name. If an entry matches a previously entered surname or name, the input frequency of that name is updated using expressions (1)-(6) above. The candidate order for the pinyin codes entered by the user can thus be changed in real time according to how often each name is used, making name input more efficient.
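The idea of reordering candidates by usage can be sketched as follows. This is a simplified Python sketch, not the method of the patent: it uses plain occurrence counts as a stand-in for the frequency update of expressions (1)-(6), and the class `NameHistory` and its methods are hypothetical names.

```python
from collections import Counter
from typing import List

class NameHistory:
    """Counts confirmed names and ranks candidates by how often each was used."""

    def __init__(self) -> None:
        self._counts: Counter = Counter()

    def record(self, name: str) -> None:
        """Increment the usage count of a name the user has confirmed."""
        self._counts[name] += 1

    def rank(self, candidates: List[str]) -> List[str]:
        """Order candidates by descending count; ties keep their original order."""
        return sorted(candidates, key=lambda n: -self._counts[n])

history = NameHistory()
for confirmed in ["李保英", "李宝英", "李宝英"]:
    history.record(confirmed)

# Candidates sharing the pinyin "li bao ying", in their default order.
ranked = history.rank(["李保英", "李宝英", "李抱应"])
print(ranked)
```

Because Python's sort is stable, names the user has never entered keep their default relative order behind the names with recorded usage.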
An input method running on a PC platform typically has an input area and an information area: the user types codes such as pinyin in the input area, and the information area displays the Chinese character candidates corresponding to those codes. If the present invention works in standalone mode or as a standalone input method, both the input area and the information area can be used directly, and usage is similar to that of an ordinary input method. In continuous mode, so as not to interfere with the input method's normal input and character-selection process, a second information area for displaying name detection results can be added near the input area, for example below the normal input method, as shown in Figure 6.
During normal Chinese character input, the user enters character codes in the first input area and selects characters from the first information area to correct wrong characters in the input area. When the present invention detects a possible name, it draws the second information area on the screen. The user can switch to the second information area with a key not defined by the input method, such as an arrow key or the Tab key, or with the mouse, and then select the candidate name characters offered by the present invention from the second information area.
The name input method of the present invention can be applied to any device that requires Chinese character input, for example personal computers (PCs), portable computers, mobile phones, and PDAs (personal digital assistants).
The name input system according to the present invention can be implemented in hardware, in software, or in a combination of hardware and software. The program can be recorded on a machine-readable recording medium such as a floppy disk, hard disk, flash drive, CD-ROM, or DVD-ROM.
Although the present invention has been described with reference to preferred embodiments, it is not limited to them and is defined only by the appended claims. Those skilled in the art may make various changes and improvements to the embodiments without departing from the spirit of the invention.
Claims (25)
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN 200410083187 CN1755669A (en) | 2004-09-29 | 2004-09-29 | Name input processing method and system |
Publications (1)
Publication Number | Publication Date |
---|---|
CN1755669A (en) | 2006-04-05 |
Family
ID=36688908
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN 200410083187 Pending CN1755669A (en) | 2004-09-29 | 2004-09-29 | Name input processing method and system |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN1755669A (en) |
Cited By (13)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN101267635B (en) * | 2008-04-25 | 2011-11-23 | 中兴通讯股份有限公司 | Chinese input device for contact book of mobile phone |
CN101634928B (en) * | 2008-12-04 | 2012-01-25 | 北京搜狗科技发展有限公司 | Method and device for displaying name candidate items |
CN102193709B (en) * | 2010-03-01 | 2015-05-13 | 深圳市世纪光速信息技术有限公司 | Character input method and device |
CN102193709A (en) * | 2010-03-01 | 2011-09-21 | 腾讯科技(深圳)有限公司 | Character input method and device |
CN102647503A (en) * | 2011-02-18 | 2012-08-22 | 中兴通讯股份有限公司 | Contact person information processing method and mobile terminal |
US9858268B2 (en) | 2013-02-26 | 2018-01-02 | International Business Machines Corporation | Chinese name transliteration |
CN104008093A (en) * | 2013-02-26 | 2014-08-27 | 国际商业机器公司 | Method and system for chinese name transliteration |
US9858269B2 (en) | 2013-02-26 | 2018-01-02 | International Business Machines Corporation | Chinese name transliteration |
US10083172B2 (en) | 2013-02-26 | 2018-09-25 | International Business Machines Corporation | Native-script and cross-script chinese name matching |
US10089302B2 (en) | 2013-02-26 | 2018-10-02 | International Business Machines Corporation | Native-script and cross-script chinese name matching |
CN107784027A (en) * | 2016-08-31 | 2018-03-09 | 北京国双科技有限公司 | A kind of reminding method and device of judgement document's search key |
CN108090033A (en) * | 2017-12-27 | 2018-05-29 | 北京天融信网络安全技术有限公司 | Name detection method, device, computer-readable medium and equipment |
CN112783333A (en) * | 2019-11-06 | 2021-05-11 | 北京搜狗科技发展有限公司 | Input method, input device and input device |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
C06 | Publication | ||
PB01 | Publication | ||
C10 | Entry into substantive examination | ||
SE01 | Entry into force of request for substantive examination | ||
C02 | Deemed withdrawal of patent application after publication (patent law 2001) | ||
WD01 | Invention patent application deemed withdrawn after publication |