CA2496872A1 - Phonetic and stroke input methods of chinese characters and phrases - Google Patents

Phonetic and stroke input methods of Chinese characters and phrases

Info

Publication number
CA2496872A1
Authority
CA
Canada
Prior art keywords
sequences
phonetic
ideographic
input
sequence
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CA 2496872
Other languages
French (fr)
Other versions
CA2496872C (en)
Inventor
Pim Van Meurs
Lu Zhang
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Historic AOL LLC
Original Assignee
America Online Inc
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Priority claimed from US10/803,255 (US20050027534A1)
Application filed by America Online Inc
Publication of CA2496872A1
Application granted
Publication of CA2496872C
Expired - Fee Related
Anticipated expiration

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F3/00 Input arrangements for transferring data to be processed into a form capable of being handled by the computer; Output arrangements for transferring data from processing unit to output unit, e.g. interface arrangements
    • G06F3/01 Input arrangements or combined input and output arrangements for interaction between user and computer
    • G06F3/02 Input arrangements using manually operated switches, e.g. using keyboards or dials
    • G06F3/023 Arrangements for converting discrete items of information into a coded form, e.g. arrangements for interpreting keyboard generated codes as alphanumeric codes, operand codes or instruction codes
    • G06F3/0233 Character input methods
    • G06F3/0237 Character input methods using prediction or retrieval techniques
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F3/00 Input arrangements for transferring data to be processed into a form capable of being handled by the computer; Output arrangements for transferring data from processing unit to output unit, e.g. interface arrangements
    • G06F3/01 Input arrangements or combined input and output arrangements for interaction between user and computer
    • G06F3/018 Input/output arrangements for oriental characters
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F3/00 Input arrangements for transferring data to be processed into a form capable of being handled by the computer; Output arrangements for transferring data from processing unit to output unit, e.g. interface arrangements
    • G06F3/01 Input arrangements or combined input and output arrangements for interaction between user and computer
    • G06F3/02 Input arrangements using manually operated switches, e.g. using keyboards or dials
    • G06F3/023 Arrangements for converting discrete items of information into a coded form, e.g. arrangements for interpreting keyboard generated codes as alphanumeric codes, operand codes or instruction codes
    • G06F3/0233 Character input methods

Landscapes

  • Engineering & Computer Science (AREA)
  • General Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Human Computer Interaction (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Document Processing Apparatus (AREA)
  • Machine Translation (AREA)
  • Input From Keyboards Or The Like (AREA)

Abstract

A system and method for inputting Chinese characters using a phonetic-based or stroke-based input method on a reduced keyboard is disclosed. By introducing common indices to ideographic characters, the system allows the ideographic characters to be shared among different types of input methods, such as phonetic-based and stroke-based input methods. The system matches input sequences to input-method-specific indices, such as phonetic or stroke indices. These input-method-specific indices are then converted into indices to ideographic characters, which are then used to retrieve the ideographic characters.
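The two-stage lookup described in the abstract can be sketched as follows. This is a minimal illustration under stated assumptions, not the patented implementation: the keypad layout is the common telephone letter assignment, and every table, entry, stroke code, and function name below is hypothetical. The point it shows is that the phonetic table and the stroke table both resolve to indices into one shared ideographic table.

```python
# Minimal sketch of a shared-index lookup. All tables and names are
# hypothetical and exist only to illustrate the idea in the abstract.

# Assumed reduced keyboard: the common telephone keypad letter layout.
KEYPAD = {"2": "abc", "3": "def", "4": "ghi", "5": "jkl",
          "6": "mno", "7": "pqrs", "8": "tuv", "9": "wxyz"}
LETTER_TO_KEY = {ch: key for key, letters in KEYPAD.items() for ch in letters}

# Shared ideographic table: the list position is the ideographic index.
IDEOGRAPHS = ["你", "米", "吗", "妈"]

# Input-method-specific tables: the list position is the phonetic or stroke index.
PHONETIC_ENTRIES = ["mi", "ni", "ma"]
# Assumed five-stroke codes (1 horizontal, 2 vertical, 3 left-falling,
# 4 dot, 5 turning); treat these as illustrative placeholders.
STROKE_ENTRIES = ["3235234", "431234"]

# Conversion tables: method-specific index -> ideographic indices.
PHONETIC_TO_IDEOGRAPH = {0: [1], 1: [0], 2: [2, 3]}
STROKE_TO_IDEOGRAPH = {0: [0], 1: [1]}


def key_sequence(spelling):
    """Translate a spelling such as 'ni' into the keys that type it."""
    return "".join(LETTER_TO_KEY[ch] for ch in spelling)


def lookup(input_sequence, method):
    """Return (matched entry, ideograph) pairs for an input sequence."""
    if method == "phonetic":
        entries, to_ideograph, encode = PHONETIC_ENTRIES, PHONETIC_TO_IDEOGRAPH, key_sequence
    else:
        # In this sketch each stroke key produces its own stroke digit.
        entries, to_ideograph, encode = STROKE_ENTRIES, STROKE_TO_IDEOGRAPH, str
    # Stage 1: match the input against the method-specific table.
    matched = [i for i, entry in enumerate(entries) if encode(entry) == input_sequence]
    # Stage 2: convert method-specific indices to ideographic indices, then retrieve.
    return [(entries[m], IDEOGRAPHS[i]) for m in matched for i in to_ideograph[m]]


print(lookup("64", "phonetic"))
print(lookup("431234", "stroke"))
```

In this sketch the key sequence 64 is ambiguous between "mi" and "ni", so both spellings and their characters are returned, while the stroke lookup reaches the same character table through a different index.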

Claims (82)

1. A method for inputting ideographic characters comprising the steps of:
(a) entering an input sequence into a user input device;
wherein said user input device comprises:
a plurality of input means, each of said input means being associated with a plurality of strokes or phonetic characters, and an input sequence being generated each time when an input is selected by said user input device;
data consisting of a plurality of input sequences and, associated with each input sequence, an input method specific database containing a plurality of input sequences and, associated with each input sequence, a set of phonetic sequences whose spellings correspond to the input sequence or a set of strokes sequences corresponding to the input sequence; and an ideographic database containing a set of ideographic character sequences, wherein each ideographic character contains an ideographic index, a plurality of stroke indices to corresponding stroke sequences and a plurality of phonetic indices to corresponding phonetic sequences;
(b) comparing the input sequence with said input method specific database and finding indices to matching strokes entries or phonetic entries and said matching stroke entries or phonetic entries;
(c) converting said matching indices to stroke entries or phonetic entries to matching ideographic indices;
(d) retrieving matching ideographic character sequences from said ideographic database by said matching ideographic indices; and
(e) optionally displaying one or more of said matched ideographic character sequences.
2. The method of Claim 1, wherein said stroke indices are indices of strokes sorted by stroke sequences in a stroke input system.
3. The method of Claim 2, wherein said stroke input system is a five-stroke or an eight-stroke system.
4. The method of Claim 1, wherein said phonetic indices are indices of phonetic characters sorted by actual spelling in a phonetic input system.
5. The method of Claim 4, wherein said phonetic input system is a Pinyin system or a Zhuyin system.
6. The method of Claim 1, wherein said phonetic indices are indices of input means in a phonetic input system.
7. The method of Claim 1 further comprising the step of:
prioritizing stroke or phonetic sequences that match an input sequence and prioritizing ideographic character sequences that match a stroke or phonetic sequence according to a linguistic model.
8. The method of Claim 7, wherein said linguistic model comprises at least one of:
number of total keystrokes in an ideograph;
radical of an ideograph;
radical and number of strokes of a radical;
alphabetical order;
frequency of occurrence of ideographic character sequences, stroke sequences or phonetic sequences in formal, conversational written, or conversational spoken text;
frequency of occurrence of ideographic character sequences, stroke sequences or phonetic sequences when following a preceding character or characters;
grammar of the surrounding sentence;
application context of current input sequence entry; and
recency of use or repeated use of stroke, phonetic or ideographic character sequences by the user or within an application program.
9. The method of Claim 1, wherein said phonetic sequences comprise single syllables.
10. The method of Claim 1, wherein said phonetic sequences comprise single and multiple syllables.
11. The method of Claim 1, wherein said phonetic sequences comprise user generated sequences.
12. The method of Claim 11, wherein in absence of matching phonetic sequences in said database, a sequence of matching phonetic sequences is automatically generated based on single and optionally multiple syllable phonetic sequences.
13. The method of Claim 12, wherein said sequence of matching phonetic sequences is narrowed down through user interaction.
14. The method of Claim 12, wherein a sequence of matching ideographic character sequences is automatically generated based on matching phonetic sequences to ideographic character sequences.
15. The method of Claim 14, wherein a sequence of matching ideographic character sequences is narrowed down through user interaction.
16. The method of Claim 7, further comprising the step of:
once an ideographic character sequence is selected, changing the associated priority of said matching phonetic sequence and sequence of ideographic characters.
17. The method of Claim 1, wherein the user can specify an explicit ideographic character separator.
18. The method of Claim 1, further comprising the step of:
when the user enters a sequence of phonetic characters, returning a sequence of phonetic sequences of exact matches and predictions that partially match.
19. The method of Claim 18, wherein said sequence of phonetic sequences is ordered according to a linguistic model.
20. The method of Claim 19, wherein said linguistic model comprises at least one of:
alphabetical order;
frequency of occurrence of phonetic sequences or ideographic character sequences in formal or conversational written text;
frequency of occurrence of phonetic sequences or ideographic character sequences when following a preceding character or characters;
grammar of the surrounding sentence;
application context of current character sequence entry; and
recency of use or repeated use of phonetic sequences by the user or within an application program.
21. The method of Claim 1, further comprising the step of:
once the user has selected a sequence of ideographic characters, presenting the user with a list of sequences of one or more ideographic characters.
22. The method of Claim 21, wherein said list of sequences is ordered according to a linguistic model.
23. The method of Claim 22, wherein said linguistic model comprises at least one of:
number of total keystrokes in an ideograph;
radical of an ideograph;
radical and number of strokes of radical;
alphabetical order;
frequency of occurrence of ideographic characters in formal or conversational written text;
frequency of occurrence of ideographic characters when following a preceding character or characters;
grammar of the surrounding sentence;
application context of current character entry; and
recency of use or repeated use of ideographic characters by the user or within an application program.
24. The method of Claim 1, wherein the user can enter partial syllables for each of the multiple syllable words.
25. The method of Claim 24, wherein the number of partial keystrokes for each syllable is one.
26. The method of Claim 1, wherein one of said plurality of inputs is associated with a special wildcard input that is associated with zero or one of strokes.
27. The method of Claim 1, wherein one of said plurality of inputs is associated with a special wildcard input that is associated with zero or one of said phonetic characters.
28. The method of Claim 1, wherein said phonetic indices are indices of phonetic characters sorted by actual spelling in a phonetic input system.
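The wildcard input of claims 26 and 27, which stands for zero or one stroke or phonetic character, can be sketched with a regular expression. This is only an illustration: the `*` wildcard symbol, the function names, and the sample stroke codes are assumptions, not part of the claims.

```python
import re

# Hypothetical symbol produced by the wildcard key.
WILDCARD = "*"


def to_pattern(input_sequence):
    """Build a regex in which each wildcard matches zero or one symbol."""
    parts = [".?" if ch == WILDCARD else re.escape(ch) for ch in input_sequence]
    return re.compile("".join(parts))


def match_with_wildcard(input_sequence, entries):
    """Return the entries that the whole input sequence matches."""
    pattern = to_pattern(input_sequence)
    return [entry for entry in entries if pattern.fullmatch(entry)]


# Illustrative stroke codes, not taken from any real stroke table.
codes = ["12", "132", "1342"]
print(match_with_wildcard("1*2", codes))
print(match_with_wildcard("1**2", codes))
```

With a single wildcard, "12" matches (zero symbols) and "132" matches (one symbol), while "1342" needs a second wildcard.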
29. A system for receiving input sequences entered by a user and generating textual output in Chinese language, said system comprising:
a user input device having a plurality of input means, each of said input means being associated with a plurality of strokes or phonetic characters, an input sequence being generated each time when an input is selected by said user input device;
an input method specific database containing a plurality of input sequences and, associated with each input sequence, a set of phonetic sequences whose spellings correspond to the input sequence or a set of strokes sequences corresponding to the input sequence;
an ideographic database containing a set of ideographic character sequences, wherein each ideographic character contains an ideographic index, a plurality of stroke indices to corresponding stroke sequences and a plurality of phonetic indices to corresponding phonetic sequences;
means for comparing the input sequence with said input method specific database and finding indices to matching strokes entries or phonetic entries and said matching stroke entries or phonetic entries;
means for converting said matching indices to stroke entries or phonetic entries to matching ideographic indices;
means for retrieving matching ideographic character sequences from said ideographic database by said matching ideographic indices; and
an output device for displaying one or more matched stroke or phonetic entries, and matched ideographic characters.
30. The method of Claim 28, wherein said stroke indices are indices of strokes sorted by stroke sequences in a stroke input system.
31. The system of Claim 29, wherein said stroke input system is a 5-stroke or 8-stroke system.
32. The system of Claim 28, wherein said phonetic indices are indices of phonetic characters sorted by actual spelling in a phonetic input system.
33. The system of Claim 31, wherein said phonetic input system is a Pinyin system or a Zhuyin system.
34. The system of Claim 28, wherein said phonetic indices are indices of input means in a phonetic input system.
35. The system of Claim 28, further comprising:
means for prioritizing stroke or phonetic sequences that match an input sequence and prioritizing ideographic character sequences that match a matching stroke or phonetic sequence according to a linguistic model.
36. The system of Claim 34, wherein said linguistic model comprises at least one of:
number of total keystrokes in an ideograph;
radical of an ideograph;
radical and number of strokes of radical;
alphabetical order;
frequency of occurrence of ideographic character sequences, stroke sequences or phonetic sequences in formal or conversational written text;
frequency of occurrence of ideographic character sequences, stroke sequences or phonetic sequences when following a preceding character or characters;
grammar of the surrounding sentence;
application context of current input sequence entry; and
recency of use or repeated use of stroke, phonetic or ideographic character sequences by the user or within an application program.
37. The system of Claim 28, wherein said phonetic sequences comprise single syllables.
38. The system of Claim 28, wherein said phonetic sequences comprise both single and multiple syllables.
39. The system of Claim 28, wherein said phonetic sequences comprise user generated sequences.
40. The system of Claim 38, wherein in absence of matching phonetic sequences in said database, a sequence of matching phonetic sequences is automatically generated based on single and optionally multiple syllable phonetic sequences.
41. The system of Claim 39, wherein said sequence of matching phonetic sequences is narrowed down through user interaction.
42. The system of Claim 39, wherein a sequence of matching ideographic character sequences is automatically generated based on matching phonetic sequences to ideographic character sequences.
43. The system of Claim 41, wherein a sequence of matching ideographic character sequences is narrowed down through user interaction.
44. The system of Claim 34, further comprising:
means for changing the associated priority of the matching phonetic sequence and the sequence of ideographic characters once an ideographic character sequence is selected.
45. The system of Claim 28, wherein the user can specify a particular tone for the phonetic syllable.
46. The system of Claim 28, wherein one of said plurality of inputs is associated with a special wildcard input that is associated with any or all tones.
47. The system of Claim 28, wherein the user can specify an explicit ideographic character separator.
48. The system of Claim 28, wherein once the user enters a sequence of phonetic characters, the user is returned a sequence of phonetic sequences of exact matches and predictions that partially match.
49. The system of Claim 47, wherein the sequence is ordered according to the frequency of use based on a linguistic model.
50. The system of Claim 48, wherein said linguistic model comprises at least one of:
number of total keystrokes in an ideograph;
radical of an ideograph;
radical and number of strokes of radical;
alphabetical order;
frequency of occurrence of phonetic sequences or ideographic character sequences in formal or conversational written text;
frequency of occurrence of phonetic sequences or ideographic character sequences when following a preceding character or characters;
grammar of the surrounding sentence;
application context of current character sequence entry; and
recency of use or repeated use of phonetic sequences by the user or within an application program.
51. The system of Claim 28, wherein once the user has selected a sequence of ideographic characters, the user is presented with a list of sequences of one or more ideographic characters.
52. The system of Claim 50, wherein said list of sequences is ordered according to the frequency of use based on a linguistic model.
53. The system of Claim 51, wherein said linguistic model comprises at least one of:
number of total keystrokes in an ideograph;
radical of an ideograph;
radical and number of strokes of radical;
alphabetical order;
frequency of occurrence of ideographic characters in formal or conversational written text;
frequency of occurrence of ideographic characters when following a preceding character or characters;
grammar of the surrounding sentence;
application context of current character entry; and
recency of use or repeated use of ideographic characters by the user or within an application program.
54. The system of Claim 28, wherein one of said plurality of inputs is associated with a special wildcard input that is associated with zero or one of strokes.
55. The system of Claim 28, wherein one of said plurality of inputs is associated with a special wildcard input that is associated with zero or one of said phonetic characters.
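The prioritizing and priority-changing means of the system claims can be sketched as a ranker that combines two of the listed linguistic-model factors, frequency of occurrence and recency of use. The class name, the scoring rule, and the frequency values are all assumptions made for illustration; the claims do not prescribe a particular formula.

```python
from collections import Counter


class CandidateRanker:
    """Hypothetical ranker: user selections first, then corpus frequency."""

    def __init__(self, base_frequency):
        # Assumed corpus frequencies; the numbers used below are made up.
        self.base_frequency = dict(base_frequency)
        self.use_count = Counter()

    def rank(self, candidates):
        """Order candidates by use count, then base frequency, then text."""
        return sorted(
            candidates,
            key=lambda c: (-self.use_count[c], -self.base_frequency.get(c, 0), c),
        )

    def select(self, candidate):
        """Record a user selection so the candidate is promoted next time."""
        self.use_count[candidate] += 1


ranker = CandidateRanker({"吗": 50, "妈": 30})
print(ranker.rank(["妈", "吗"]))  # frequency decides before any selection
ranker.select("妈")
print(ranker.rank(["妈", "吗"]))  # the selected candidate now ranks first
```

Keeping the user's selection counts separate from the base frequencies is one possible design; it lets the stored linguistic data stay read-only while per-user behaviour adjusts the ordering.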
56. A computer usable medium containing instructions in computer readable form for carrying out a process for Chinese text entry, said process comprising the steps of:
(a) entering an input sequence into a user input device;
wherein said user input device comprises:
a plurality of input means, each of said input means being associated with a plurality of strokes or phonetic characters, and an input sequence being generated each time when an input is selected by said user input device;
data consisting of a plurality of input sequences and, associated with each input sequence, an input method specific database containing a plurality of input sequences and, associated with each input sequence, a set of phonetic sequences whose spellings correspond to the input sequence or a set of strokes sequences corresponding to the input sequence; and an ideographic database containing a set of ideographic character sequences, wherein each ideographic character contains an ideographic index, a plurality of stroke indices to corresponding stroke sequences and a plurality of phonetic indices to corresponding phonetic sequences;
(b) comparing the input sequence with said input method specific database and finding indices to matching strokes entries or phonetic entries and said matching stroke entries or phonetic entries;
(c) converting said matching indices to stroke entries or phonetic entries to matching ideographic indices;
(d) retrieving matching ideographic character sequences from said ideographic database by said matching ideographic indices; and
(e) optionally displaying one or more of said matched ideographic character sequences.
57. The medium of Claim 55, wherein said stroke indices are indices of strokes sorted by stroke sequences in a stroke input system.
58. The medium of Claim 56, wherein said stroke input system is a five-stroke or an eight-stroke system.
59. The medium of Claim 55, wherein said phonetic indices are indices of phonetic characters sorted by actual spelling in a phonetic input system.
60. The medium of Claim 58, wherein said phonetic input system is a Pinyin system or a Zhuyin system.
61. The medium of Claim 55, wherein said phonetic indices are indices of input means in a phonetic input system.
62. The medium of Claim 55, wherein the process further comprises the step of:
prioritizing stroke or phonetic sequences that match an input sequence and prioritizing ideographic character sequences that match a stroke or phonetic sequence according to a linguistic model.
63. The medium of Claim 61, wherein said linguistic model comprises at least one of:
number of total keystrokes in an ideograph;
radical of an ideograph;
radical and number of strokes of a radical;
alphabetical order;
frequency of occurrence of ideographic character sequences, stroke sequences or phonetic sequences in formal, conversational written, or conversational spoken text;
frequency of occurrence of ideographic character sequences, stroke sequences or phonetic sequences when following a preceding character or characters;
grammar of the surrounding sentence;
application context of current input sequence entry; and
recency of use or repeated use of stroke, phonetic or ideographic character sequences by the user or within an application program.
64. The medium of Claim 55, wherein said phonetic sequences comprise single syllables.
65. The medium of Claim 55, wherein said phonetic sequences comprise single and multiple syllables.
66. The medium of Claim 55, wherein said phonetic sequences comprise user generated sequences.
67. The medium of Claim 65, wherein in absence of matching phonetic sequences in said database, a sequence of matching phonetic sequences is automatically generated based on single and optionally multiple syllable phonetic sequences.
68. The medium of Claim 66, wherein said sequence of matching phonetic sequences is narrowed down through user interaction.
69. The medium of Claim 66, wherein a sequence of matching ideographic character sequences is automatically generated based on matching phonetic sequences to ideographic character sequences.
70. The medium of Claim 68, wherein a sequence of matching ideographic character sequences is narrowed down through user interaction.
71. The medium of Claim 61, wherein the process further comprises the step of:
once an ideographic character sequence is selected, changing the associated priority of said matching phonetic sequence and sequence of ideographic characters.
72. The medium of Claim 55, wherein the user can specify an explicit ideographic character separator.
73. The medium of Claim 55, wherein the process further comprises the step of:
when the user enters a sequence of phonetic characters, returning a sequence of phonetic sequences of exact matches and predictions that partially match.
74. The medium of Claim 72, wherein said sequence of phonetic sequences is ordered according to a linguistic model.
75. The medium of Claim 73, wherein said linguistic model comprises at least one of:
number of total keystrokes in an ideograph;
radical of an ideograph;
radical and number of strokes of radical;
alphabetical order;
frequency of occurrence of phonetic sequences or ideographic character sequences in formal or conversational written text;
frequency of occurrence of phonetic sequences or ideographic character sequences when following a preceding character or characters;
grammar of the surrounding sentence;
application context of current character sequence entry; and
recency of use or repeated use of phonetic sequences by the user or within an application program.
76. The medium of Claim 55, wherein the process further comprises the step of:
once the user has selected a sequence of ideographic characters, presenting the user with a list of sequences of one or more ideographic characters.
77. The medium of Claim 75, wherein said list of sequences is ordered according to a linguistic model.
78. The medium of Claim 76, wherein said linguistic model comprises at least one of:
number of total keystrokes in an ideograph;
radical of an ideograph;
radical and number of strokes of radical;
alphabetical order;
frequency of occurrence of ideographic characters in formal or conversational written text;
frequency of occurrence of ideographic characters when following a preceding character or characters;
grammar of the surrounding sentence;
application context of current character entry; and
recency of use or repeated use of ideographic characters by the user or within an application program.
79. The medium of Claim 55, wherein the user can enter partial syllables for each of the multiple syllable words.
80. The medium of Claim 78, wherein the number of partial keystrokes for each syllable is one.
81. The medium of Claim 55, wherein one of said plurality of inputs is associated with a special wildcard input that is associated with zero or one of strokes.
82. The medium of Claim 55, wherein one of said plurality of inputs is associated with a special wildcard input that is associated with zero or one of said phonetic characters.
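Partial-syllable entry with one keystroke per syllable, as in claims 79 and 80, can be sketched by matching the input against the key of each syllable's first letter. The keypad layout, the phrase table, and the function names below are assumptions chosen for illustration only.

```python
# Assumed reduced keyboard: the common telephone keypad letter layout.
KEYPAD = {"2": "abc", "3": "def", "4": "ghi", "5": "jkl",
          "6": "mno", "7": "pqrs", "8": "tuv", "9": "wxyz"}
LETTER_TO_KEY = {ch: key for key, letters in KEYPAD.items() for ch in letters}

# Hypothetical multi-syllable phrase table: syllables -> ideographic sequence.
PHRASES = {
    ("ni", "hao"): "你好",
    ("mi", "hu"): "迷糊",
    ("ming", "tian"): "明天",
}


def initial_keys(syllables):
    """One keystroke per syllable: the key of each syllable's first letter."""
    return "".join(LETTER_TO_KEY[s[0]] for s in syllables)


def match_by_initials(input_sequence):
    """Return the phrases whose syllable initials are typed by the input."""
    return [phrase for syllables, phrase in PHRASES.items()
            if initial_keys(syllables) == input_sequence]


print(match_by_initials("64"))
print(match_by_initials("68"))
```

Because several letters share a key, the two keystrokes 64 match both "ni hao" and "mi hu" in this table, so a ranking or selection step would still be needed to pick one.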
CA 2496872 2004-03-17 2005-02-10 Phonetic and stroke input methods of Chinese characters and phrases Expired - Fee Related CA2496872C (en)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
US10/803,255 2004-03-17
US10/803,255 US20050027534A1 (en) 2003-07-30 2004-03-17 Phonetic and stroke input methods of Chinese characters and phrases

Publications (2)

Publication Number Publication Date
CA2496872A1 (en) 2005-09-17
CA2496872C (en) 2010-06-08

Family

ID=34994202

Family Applications (1)

Application Number Title Priority Date Filing Date
CA 2496872 Expired - Fee Related CA2496872C (en) 2004-03-17 2005-02-10 Phonetic and stroke input methods of Chinese characters and phrases

Country Status (2)

Country Link
CA (1) CA2496872C (en)
WO (1) WO2005089215A2 (en)

Families Citing this family (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2008000058A1 (en) * 2006-06-30 2008-01-03 Research In Motion Limited Learning character segments during text input
WO2008000057A1 (en) * 2006-06-30 2008-01-03 Research In Motion Limited Learning character segments from received text
US8395586B2 (en) 2006-06-30 2013-03-12 Research In Motion Limited Method of learning a context of a segment of text, and associated handheld electronic device
US7565624B2 (en) 2006-06-30 2009-07-21 Research In Motion Limited Method of learning character segments during text input, and associated handheld electronic device
US7665037B2 (en) 2006-06-30 2010-02-16 Research In Motion Limited Method of learning character segments from received text, and associated handheld electronic device
CN102567296B (en) * 2011-01-04 2016-03-30 中国移动通信有限公司 A kind of disposal route of Chinese character information and the treating apparatus of Chinese character information
CN106708285B (en) * 2016-12-27 2019-11-08 优地网络有限公司 Search for library generating method and device

Family Cites Families (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US5893133A (en) * 1995-08-16 1999-04-06 International Business Machines Corporation Keyboard for a system and method for processing Chinese language text
US6848080B1 (en) * 1999-11-05 2005-01-25 Microsoft Corporation Language input architecture for converting one text form to another text form with tolerance to spelling, typographical, and conversion errors

Also Published As

Publication number Publication date
WO2005089215B1 (en) 2006-05-04
CA2496872C (en) 2010-06-08
WO2005089215A3 (en) 2006-01-05
WO2005089215A2 (en) 2005-09-29

Similar Documents

Publication Publication Date Title
US6073146A (en) System and method for processing chinese language text
KR100656736B1 (en) System and method for disambiguating phonetic input
US6014615A (en) System and method for processing morphological and syntactical analyses of inputted Chinese language phrases
KR20120006489A (en) Input method editor
JP6245846B2 (en) System, method and program for improving reading accuracy in speech recognition
KR100835172B1 (en) System and method for searching information using synonyms
CA2496872A1 (en) Phonetic and stroke input methods of chinese characters and phrases
CN101158969A (en) Whole sentence generating method and device
KR20040101678A (en) Apparatus and method for analyzing compounded morpheme
Sharma et al. Word prediction system for text entry in Hindi
JP6126965B2 (en) Utterance generation apparatus, method, and program
JPH11238051A (en) Chinese input conversion processor, chinese input conversion processing method and recording medium stored with chinese input conversion processing program
Liang et al. An efficient error correction interface for speech recognition on mobile touchscreen devices
JP3952964B2 (en) Reading information determination method, apparatus and program
Basumatary et al. Deep Learning Based Bodo Parts of Speech Tagger
Asahiah Development of a Standard Yorùbá digital text automatic diacritic restoration system
Singh et al. Word and phrase prediction tool for English and Hindi language
CN110502128B (en) Chinese character multi-element input method and system
JP3622841B2 (en) Kana-kanji conversion device and kana-kanji conversion method
Anonthanasap et al. IMnem: Interactive mnemonic word suggestion using phonetic algorithms
JPH08272780A (en) Processor and method for chinese input processing, and processor and method for language processing
Zaghal et al. Arabic morphological analyzer with text to voice
Huang et al. Research For Hakka Input Method With Hio-Liu Accent On Mobile Platforms
Tesema et al. Enhancing the Text Production and Assisting Disable Users in Developing Word Prediction and Completion in Afan Oromo
KR100932643B1 (en) Method of grapheme-to-phoneme conversion for Korean TTS system without a morphological and syntactic analysis and device thereof

Legal Events

Date Code Title Description
EEER Examination request
MKLA Lapsed