CA2658586A1 - Learning character segments from received text - Google Patents
Learning character segments from received text
- Publication number
- CA2658586A1
- Authority
- CA
- Canada
- Prior art keywords
- characters
- string
- character
- electronic device
- objects
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Granted
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F3/00—Input arrangements for transferring data to be processed into a form capable of being handled by the computer; Output arrangements for transferring data from processing unit to output unit, e.g. interface arrangements
- G06F3/01—Input arrangements or combined input and output arrangements for interaction between user and computer
- G06F3/018—Input/output arrangements for oriental characters
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F40/00—Handling natural language data
- G06F40/10—Text processing
- G06F40/12—Use of codes for handling textual entities
- G06F40/126—Character encoding
- G06F40/129—Handling non-Latin characters, e.g. kana-to-kanji conversion
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F40/00—Handling natural language data
- G06F40/20—Natural language analysis
- G06F40/279—Recognition of textual entities
- G06F40/289—Phrasal analysis, e.g. finite state techniques or chunking
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F40/00—Handling natural language data
- G06F40/40—Processing or translation of natural language
- G06F40/53—Processing of non-Latin text
Abstract
An improved method of learning character segments from received text facilitates text input on an improved handheld electronic device. When text is received on the handheld electronic device, the characters of the text are converted into the inputs with which the characters correspond. Segments and other objects are analyzed to generate a proposed character interpretation of the inputs. If at least a portion of the character interpretation differs from a corresponding portion of the received text, a character learning string comprising the differing characters is stored as a candidate. In response to receiving additional text on the handheld electronic device, the characters of the additional text are converted into the inputs with which the characters correspond. Segments and other objects are then analyzed to generate another proposed character interpretation of the series of additional inputs. If at least a portion of this other character interpretation differs from a corresponding portion of the additional received text, another character learning string comprising the differing characters of the additional received text is compared with the candidate. If a set of characters in the other character learning string matches characters in the candidate, that set of characters is stored as a segment.
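The learning flow described in the abstract can be sketched in Python as follows. This is a minimal, hypothetical illustration, not the patented implementation. It assumes that each raw input is a pinyin syllable, that the device's routine proposes the longest learned segment whose raw inputs match and otherwise the most frequent character for each syllable, and that a shared run of two or more characters counts as a match with a candidate. The toy dictionary and the names `INPUT_TO_CHARS`, `CHAR_TO_INPUT`, `interpret`, `learning_string`, `longest_common_run` and `SegmentLearner` are invented for this example.

```python
from typing import Dict, List, Optional, Set

# Hypothetical toy data: each raw input (here a pinyin syllable) is associated
# with several characters, listed most frequent first.
INPUT_TO_CHARS: Dict[str, List[str]] = {
    "shi": ["是", "时", "事"],
    "jian": ["见", "间", "件"],
    "dao": ["到", "道"],
    "le": ["了", "乐"],
    "hen": ["很"],
    "chang": ["长", "常"],
}
# Reverse map used to convert received characters back into raw inputs.
CHAR_TO_INPUT: Dict[str, str] = {
    ch: raw for raw, chars in INPUT_TO_CHARS.items() for ch in chars
}


def interpret(inputs: List[str], segments: Set[str]) -> str:
    """Propose characters for the raw inputs: the longest learned segment whose
    raw inputs match wins; otherwise use the most frequent character."""
    out: List[str] = []
    i = 0
    while i < len(inputs):
        matches = [
            seg for seg in segments
            if [CHAR_TO_INPUT[c] for c in seg] == inputs[i:i + len(seg)]
        ]
        if matches:
            best = max(sorted(matches), key=len)  # deterministic longest match
            out.append(best)
            i += len(best)
        else:
            out.append(INPUT_TO_CHARS[inputs[i]][0])
            i += 1
    return "".join(out)


def learning_string(proposed: str, reference: str) -> Optional[str]:
    """Return the span of the reference text from the first to the last
    character that differs from the proposal, or None if they agree."""
    diffs = [i for i, (p, r) in enumerate(zip(proposed, reference)) if p != r]
    if not diffs:
        return None
    return reference[diffs[0]:diffs[-1] + 1]


def longest_common_run(a: str, b: str) -> str:
    """Longest run of consecutive characters of `a` that also occurs in `b`."""
    best = ""
    for i in range(len(a)):
        for j in range(i + len(best) + 1, len(a) + 1):
            if a[i:j] not in b:
                break  # extending a non-matching run cannot make it match
            best = a[i:j]
    return best


class SegmentLearner:
    def __init__(self) -> None:
        self.candidates: List[str] = []
        self.segments: Set[str] = set()

    def receive(self, text: str) -> None:
        inputs = [CHAR_TO_INPUT[ch] for ch in text]  # characters -> raw inputs
        proposed = interpret(inputs, self.segments)
        learned = learning_string(proposed, text)
        if learned is None:
            return  # the routine already reproduces this text
        for cand in self.candidates:
            shared = longest_common_run(learned, cand)
            if len(shared) >= 2:  # a segment has a plurality of characters
                self.segments.add(shared)
                return
        self.candidates.append(learned)
```

With this toy data, receiving 时间到了 ("time is up") yields the proposal 是见到了, so 时间 is stored as a candidate. Receiving 很长时间 ("a long time") later produces the same differing string, which matches the candidate and is stored as a segment; from then on the syllables shi jian are interpreted as 时间.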
Claims (10)
1. A method of enabling input on a handheld electronic device comprising a processor apparatus that comprises a processor and a memory, the memory having stored therein a plurality of objects and at least a first routine, the objects comprising one or more of a plurality of raw inputs, a plurality of characters, a number of candidates, a plurality of segments, and a number of combination objects, each combination object comprising at least a representation of a segment and at least a representation of one of a character and a segment, at least some of the raw inputs each having associated therewith a number of the characters, at least some of the raw inputs each having associated therewith as the number of characters a plurality of the characters, the segments each comprising a plurality of the characters, the method comprising:
receiving a string of reference characters;
for each character of at least a portion of the string of reference characters, obtaining the raw input with which the character is associated;
employing the routine to compare at least some of the obtained raw inputs with at least some of the objects to generate a proposed character interpretation of the at least some of the obtained raw inputs;
determining that at least a portion of the proposed character interpretation and a corresponding at least portion of the string of reference characters differ;
and responsive to said determining, storing at least a representation of at least a portion of the string of reference characters as at least a portion of at least one of a candidate, a segment, and a combination object.
2. The method of Claim 1, further comprising:
making a determination that no candidate matches the at least portion of the string of reference characters; and responsive to said making a determination, storing the at least portion of the string of reference characters as at least a portion of a candidate.
3. The method of Claim 1, further comprising:
making a determination that at least a portion of a particular candidate matches the at least portion of the string of reference characters; and responsive to said making a determination, storing the at least portion of the string of reference characters as at least a portion of a segment.
4. The method of Claim 3, further comprising deleting the particular candidate.
5. The method of Claim 3, further comprising determining that the at least portion of the string of reference characters comprises more than a predetermined quantity of characters and, responsive thereto, storing the at least portion of the string of reference characters as at least a portion of a combination object.
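Claims 2 to 5 refine what happens to a stored string. The sketch below is one hypothetical reading of that decision logic, not the patented implementation. It assumes that "matches" means the learned string occurs within a stored candidate, that the "predetermined quantity of characters" is two (`MAX_SEGMENT_CHARS`, an invented value), and that a combination object can be modelled as a (segment, remainder) pair. `LearnedStore` and its methods are likewise invented names.

```python
from dataclasses import dataclass, field
from typing import List, Optional, Tuple

# Hypothetical value for the "predetermined quantity of characters" (claim 5).
MAX_SEGMENT_CHARS = 2


@dataclass
class LearnedStore:
    candidates: List[str] = field(default_factory=list)
    segments: List[str] = field(default_factory=list)
    # Assumed model of a combination object: (segment, remainder), where the
    # remainder is a single character or a further run of characters.
    combinations: List[Tuple[str, str]] = field(default_factory=list)

    def _matching_candidate(self, learned: str) -> Optional[str]:
        # Assumed matching rule: the learned string is a portion of a candidate.
        for cand in self.candidates:
            if learned in cand:
                return cand
        return None

    def store(self, learned: str) -> str:
        cand = self._matching_candidate(learned)
        if cand is None:
            # Claim 2: no candidate matches, so keep the string as a candidate.
            self.candidates.append(learned)
            return "candidate"
        # Claim 4: the matched candidate is deleted once it has been used.
        self.candidates.remove(cand)
        if len(learned) > MAX_SEGMENT_CHARS:
            # Claim 5: too long, so store a segment plus a remainder.
            head, tail = learned[:MAX_SEGMENT_CHARS], learned[MAX_SEGMENT_CHARS:]
            if head not in self.segments:
                self.segments.append(head)
            self.combinations.append((head, tail))
            return "combination"
        # Claim 3: a matching candidate promotes the string to a segment.
        if learned not in self.segments:
            self.segments.append(learned)
        return "segment"
```

Storing 时间 twice first creates a candidate and then promotes it to a segment while deleting the candidate. Storing the three-character string 星期天 ("Sunday") twice exceeds the assumed limit, so the second call records 星期 as a segment and the pair (星期, 天) as a combination object.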
6. A handheld electronic device comprising an input apparatus, a processor apparatus, and an output apparatus, the processor apparatus comprising a processor and a memory having stored therein a plurality of objects and at least a first routine, the objects comprising one or more of a plurality of raw inputs, a plurality of characters, a number of candidates, a plurality of segments, and a number of combination objects, each combination object comprising at least a representation of a segment and at least a representation of one of a character and a segment, at least some of the raw inputs each having associated therewith a number of the characters, at least some of the raw inputs each having associated therewith as the number of characters a plurality of the characters, the segments each comprising a plurality of the characters, the memory having stored therein a number of routines which, when executed by the processor, cause the handheld electronic device to be adapted to perform operations comprising:
receiving a plurality of characters;
for each character of at least a portion of the plurality of characters, obtaining the raw input with which the character is associated;
employing the routine to compare at least some of the obtained raw inputs with at least some of the objects to generate a proposed character interpretation of the at least some of the obtained raw inputs;
determining that at least a portion of the proposed character interpretation and a corresponding at least portion of the string of characters differ;
and responsive to said determining, storing at least a representation of at least a portion of the string of characters as at least a portion of at least one of a candidate, a segment, and a combination object.
7. The handheld electronic device of Claim 6, further comprising:
making a determination that no candidate matches the at least portion of the string of characters; and responsive to said making a determination, storing the at least portion of the string of characters as at least a portion of a candidate.
8. The handheld electronic device of Claim 6, further comprising:
making a determination that at least a portion of a particular candidate matches the at least portion of the string of characters; and responsive to said making a determination, storing the at least portion of the string of characters as at least a portion of a segment.
9. The handheld electronic device of Claim 8, further comprising deleting the particular candidate.
10. The handheld electronic device of Claim 8, further comprising determining that the at least portion of the string of characters comprises more than a predetermined quantity of characters and, responsive thereto, storing the at least portion of the string of characters as at least a portion of a combination object.
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
PCT/CA2006/001088 WO2008000057A1 (en) | 2006-06-30 | 2006-06-30 | Learning character segments from received text |
Publications (2)
Publication Number | Publication Date |
---|---|
CA2658586A1 (en) | 2008-01-03 |
CA2658586C (en) | 2012-07-10 |
Family ID: 38845059
Family Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CA2658586A (active; CA2658586C) | 2006-06-30 | 2006-06-30 | Learning character segments from received text |
Country Status (2)
Country | Link |
---|---|
CA (1) | CA2658586C (en) |
WO (1) | WO2008000057A1 (en) |
Families Citing this family (1)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
JP2014191355A (en) * | 2013-03-26 | 2014-10-06 | Oki Data Corp | Character input device, and character input method |
Family Cites Families (4)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US7124080B2 (en) * | 2001-11-13 | 2006-10-17 | Microsoft Corporation | Method and apparatus for adapting a class entity dictionary used with language models |
US7228267B2 (en) * | 2002-07-03 | 2007-06-05 | 2012244 Ontario Inc. | Method and system of creating and using Chinese language data and user-corrected data |
US7478033B2 (en) * | 2004-03-16 | 2009-01-13 | Google Inc. | Systems and methods for translating Chinese pinyin to Chinese characters |
CA2496872C (en) * | 2004-03-17 | 2010-06-08 | America Online, Inc. | Phonetic and stroke input methods of chinese characters and phrases |
-
2006
- 2006-06-30 WO PCT/CA2006/001088 patent/WO2008000057A1/en active Application Filing
- 2006-06-30 CA CA2658586A patent/CA2658586C/en active Active
Also Published As
Publication number | Publication date |
---|---|
WO2008000057A1 (en) | 2008-01-03 |
CA2658586C (en) | 2012-07-10 |
Similar Documents
Publication | Title | Publication Date |
---|---|---|
CA2647938A1 (en) | Handheld electronic device and method for learning contextual data during disambiguation of text input | |
DK1952285T3 (en) | System and method for crawling and comparing data that has word-like content | |
WO2006100509A3 (en) | Human-to-mobile interfaces | |
WO2006100505A3 (en) | Human-to-mobile interfaces | |
GB2451035A (en) | Handheld electronic device and method for performing optimized spell checking during text entry by providing a sequentially ordered series of spell-check algo | |
WO2019201511A8 (en) | Method and data processing apparatus | |
JP2012256354A5 (en) | ||
CA2636207A1 (en) | Handheld electronic device providing proposed corrected input in response to erroneous text entry in environment of text requiring multiple sequential actuations of the same key, and associated method | |
US20130141457A1 (en) | Electronic device capable of recovering garbled characters and method for recovering garbled characters | |
PH12021550937A1 (en) | Information providing system, information providing method, and data structure of knowledge data | |
GB2477703A (en) | A method and system for analysing data sequences | |
CA2509014A1 (en) | Handheld electronic device with text disambiguation | |
CN103577547A (en) | Webpage type identification method and device | |
EP2120156A3 (en) | Character input program, character input device, and character input method | |
CA2647934A1 (en) | Handheld electronic device and method for employing contextual data for disambiguation of text input | |
CA2658586A1 (en) | Learning character segments from received text | |
RU2015103742A (en) | METHOD AND DEVICE FOR UPDATING USER DATA | |
WO2009001696A1 (en) | Information processing device, program and information processing method | |
CA2605785A1 (en) | Handheld electronic device with reduced keyboard and associated method of providing improved disambiguation with reduced degradation of device performance | |
WO2006105641B1 (en) | Handheld electronic device with text disambiguation employing advanced editing feature | |
CA2554397A1 (en) | Handheld electronic device with disambiguation of compound word text input employing separating input | |
CA2639224A1 (en) | Handheld electronic device and associated method providing disambiguation of an ambiguous object during editing and selectively providing prediction of future characters | |
CA2582590A1 (en) | Handheld electronic device including automatic preferred selection of a punctuation, and associated method | |
CA2627755A1 (en) | Use of a suffix-removing spell-check algorithm for a spell-check function, and associated handheld electronic device | |
CA2653843A1 (en) | Learning character segments during text input |
Legal Events
Code | Title | Date | Description |
---|---|---|---|
EEER | Examination request |