CA2658586A1 - Learning character segments from received text - Google Patents

Learning character segments from received text

Info

Publication number
CA2658586A1
Authority
CA
Canada
Prior art keywords
characters
string
character
electronic device
objects
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CA002658586A
Other languages
French (fr)
Other versions
CA2658586C (en)
Inventor
Vadim Fux
Sergey Kolomiets
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
BlackBerry Ltd
Original Assignee
Research In Motion Limited
Vadim Fux
Sergey Kolomiets
2012244 Ontario Inc.
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
2006-06-30
Filing date
2006-06-30
Publication date
2008-01-03
Application filed by Research In Motion Limited, Vadim Fux, Sergey Kolomiets, 2012244 Ontario Inc.
Publication of CA2658586A1
Application granted
Publication of CA2658586C
Current legal status: Active
Anticipated expiration

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F3/00 Input arrangements for transferring data to be processed into a form capable of being handled by the computer; Output arrangements for transferring data from processing unit to output unit, e.g. interface arrangements
    • G06F3/01 Input arrangements or combined input and output arrangements for interaction between user and computer
    • G06F3/018 Input/output arrangements for oriental characters
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00 Handling natural language data
    • G06F40/10 Text processing
    • G06F40/12 Use of codes for handling textual entities
    • G06F40/126 Character encoding
    • G06F40/129 Handling non-Latin characters, e.g. kana-to-kanji conversion
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00 Handling natural language data
    • G06F40/20 Natural language analysis
    • G06F40/279 Recognition of textual entities
    • G06F40/289 Phrasal analysis, e.g. finite state techniques or chunking
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00 Handling natural language data
    • G06F40/40 Processing or translation of natural language
    • G06F40/53 Processing of non-Latin text

Abstract

An improved method of learning character segments from received text facilitates text input on a handheld electronic device. When text is received on the handheld electronic device, the characters of the text are converted into the inputs with which the characters correspond. Segments and other objects are analyzed to generate a proposed character interpretation of the inputs. If at least a portion of the character interpretation differs from a corresponding portion of the received text, a character learning string comprising the differing characters is stored as a candidate. In response to receiving additional text on the handheld electronic device, the characters of the additional text are converted into the inputs with which the characters correspond. Segments and other objects are then analyzed to generate another proposed character interpretation of the series of additional inputs. If at least a portion of this other character interpretation differs from a corresponding portion of the additional received text, another character learning string comprising the differing characters of the additional received text is compared with the candidate. If a set of characters in this other character learning string matches characters in the candidate, the set of characters is stored as a segment.
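The learning pass described in the abstract can be sketched as follows. This is a minimal illustration under stated assumptions, not the patented implementation: the two-key input table, the "first associated character" default interpretation, and the names `CHARS_FOR_INPUT`, `to_raw_inputs`, `interpret`, and `learning_strings` are all hypothetical and chosen only for the example.

```python
from typing import Dict, List, Tuple

# Hypothetical table: each raw input is associated with several characters.
CHARS_FOR_INPUT: Dict[str, List[str]] = {"1": ["A", "B"], "2": ["C", "D"]}

# Reverse lookup used to convert received characters back into raw inputs.
INPUT_FOR_CHAR: Dict[str, str] = {
    ch: raw for raw, chars in CHARS_FOR_INPUT.items() for ch in chars
}


def to_raw_inputs(text: str) -> List[str]:
    """Convert each received character into the raw input that produces it."""
    return [INPUT_FOR_CHAR[ch] for ch in text]


def interpret(raw_inputs: List[str], segments: Dict[Tuple[str, ...], str]) -> str:
    """Propose characters for the inputs: longest known segment first, else a default."""
    out: List[str] = []
    i = 0
    while i < len(raw_inputs):
        # Try spans of two or more inputs, longest first.
        for end in range(len(raw_inputs), i + 1, -1):
            key = tuple(raw_inputs[i:end])
            if key in segments:
                out.append(segments[key])
                i = end
                break
        else:
            # Assumed default: the first character associated with the input.
            out.append(CHARS_FOR_INPUT[raw_inputs[i]][0])
            i += 1
    return "".join(out)


def learning_strings(reference: str, proposed: str) -> List[str]:
    """Collect each contiguous run of reference characters the proposal got wrong."""
    runs: List[str] = []
    current: List[str] = []
    for ref_ch, prop_ch in zip(reference, proposed):
        if ref_ch != prop_ch:
            current.append(ref_ch)
        elif current:
            runs.append("".join(current))
            current = []
    if current:
        runs.append("".join(current))
    return runs


received = "ABDC"
proposed = interpret(to_raw_inputs(received), segments={})
print(proposed, learning_strings(received, proposed))  # AACC ['BD']
```

In this toy setup the differing run "BD" is what would be stored as a candidate. Once "BD" is known as a segment for the inputs ("1", "2"), the same received text is interpreted correctly and no learning string is produced.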

Claims (10)

1. A method of enabling input on a handheld electronic device comprising a processor apparatus that comprises a processor and a memory, the memory having stored therein a plurality of objects and at least a first routine, the objects comprising one or more of a plurality of raw inputs, a plurality of characters, a number of candidates, a plurality of segments, and a number of combination objects, each combination object comprising at least a representation of a segment and at least a representation of one of a character and a segment, at least some of the raw inputs each having associated therewith a number of the characters, at least some of the raw inputs each having associated therewith as the number of characters a plurality of the characters, the segments each comprising a plurality of the characters, the method comprising:
receiving a string of reference characters;
for each character of at least a portion of the string of reference characters, obtaining the raw input with which the character is associated;
employing the routine to compare at least some of the obtained raw inputs with at least some of the objects to generate a proposed character interpretation of the at least some of the obtained raw inputs;
determining that at least a portion of the proposed character interpretation and a corresponding at least portion of the string of reference characters differ; and
responsive to said determining, storing at least a representation of at least a portion of the string of reference characters as at least a portion of at least one of a candidate, a segment, and a combination object.
2. The method of Claim 1, further comprising:
making a determination that no candidate matches the at least portion of the string of reference characters; and
responsive to said making a determination, storing the at least portion of the string of reference characters as at least a portion of a candidate.
3. The method of Claim 1, further comprising:
making a determination that at least a portion of a particular candidate matches the at least portion of the string of reference characters; and
responsive to said making a determination, storing the at least portion of the string of reference characters as at least a portion of a segment.
4. The method of Claim 3, further comprising deleting the particular candidate.
5. The method of Claim 3, further comprising determining that the at least portion of the string of reference characters comprises more than a predetermined quantity of characters and, responsive thereto, storing the at least portion of the string of reference characters as at least a portion of a combination object.
6. A handheld electronic device comprising an input apparatus, a processor apparatus, and an output apparatus, the processor apparatus comprising a processor and a memory having stored therein a plurality of objects and at least a first routine, the objects comprising one or more of a plurality of raw inputs, a plurality of characters, a number of candidates, a plurality of segments, and a number of combination objects, each combination object comprising at least a representation of a segment and at least a representation of one of a character and a segment, at least some of the raw inputs each having associated therewith a number of the characters, at least some of the raw inputs each having associated therewith as the number of characters a plurality of the characters, the segments each comprising a plurality of the characters, the memory having stored therein a number of routines which, when executed by the processor, cause the handheld electronic device to be adapted to perform operations comprising:
receiving a plurality of characters;
for each character of at least a portion of the plurality of characters, obtaining the raw input with which the character is associated;
employing the routine to compare at least some of the obtained raw inputs with at least some of the objects to generate a proposed character interpretation of the at least some of the obtained raw inputs;
determining that at least a portion of the proposed character interpretation and a corresponding at least portion of the string of characters differ; and
responsive to said determining, storing at least a representation of at least a portion of the string of characters as at least a portion of at least one of a candidate, a segment, and a combination object.
7. The handheld electronic device of Claim 6, further comprising:
making a determination that no candidate matches the at least portion of the string of characters; and
responsive to said making a determination, storing the at least portion of the string of characters as at least a portion of a candidate.
8. The handheld electronic device of Claim 6, further comprising:
making a determination that at least a portion of a particular candidate matches the at least portion of the string of characters; and
responsive to said making a determination, storing the at least portion of the string of characters as at least a portion of a segment.
9. The handheld electronic device of Claim 8, further comprising deleting the particular candidate.
10. The handheld electronic device of Claim 8, further comprising determining that the at least portion of the string of characters comprises more than a predetermined quantity of characters and, responsive thereto, storing the at least portion of the string of characters as at least a portion of a combination object.
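
The storage decisions in Claims 2 to 5 (mirrored by Claims 7 to 10) can be sketched as below. This is a rough illustration under stated assumptions rather than the claimed implementation: the class `LearningStore`, the `max_segment_len` threshold standing in for the "predetermined quantity", the use of the longest common substring as the matching portion, and the head/tail split used to form a combination object are all hypothetical choices that the claims do not specify.

```python
from dataclasses import dataclass, field
from difflib import SequenceMatcher
from typing import Set, Tuple


def longest_common(a: str, b: str) -> str:
    """Return the longest run of characters shared by both strings."""
    matcher = SequenceMatcher(None, a, b, autojunk=False)
    m = matcher.find_longest_match(0, len(a), 0, len(b))
    return a[m.a:m.a + m.size]


@dataclass
class LearningStore:
    max_segment_len: int = 2  # hypothetical "predetermined quantity"
    candidates: Set[str] = field(default_factory=set)
    segments: Set[str] = field(default_factory=set)
    combination_objects: Set[Tuple[str, str]] = field(default_factory=set)

    def learn(self, learning_string: str) -> str:
        """Store a differing string and report which kind of object was created."""
        for cand in sorted(self.candidates):
            common = longest_common(cand, learning_string)
            if len(common) < 2:  # segments comprise a plurality of characters
                continue
            self.candidates.discard(cand)  # Claim 4: delete the matched candidate
            if len(common) > self.max_segment_len:
                # Claim 5: too long for one segment, so store a combination object
                # (assumed here to pair a leading segment with the remainder).
                head = common[:self.max_segment_len]
                tail = common[self.max_segment_len:]
                self.segments.add(head)
                self.combination_objects.add((head, tail))
                return "combination object"
            self.segments.add(common)  # Claim 3: promote the match to a segment
            return "segment"
        self.candidates.add(learning_string)  # Claim 2: no candidate matched
        return "candidate"


store = LearningStore()
print(store.learn("BD"))    # candidate
print(store.learn("XBDY"))  # segment
```

The first sighting of a differing string only creates a candidate; a string is promoted to a segment when a later learning string shares characters with that candidate, which keeps one-off differences out of the segment list.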
Applications Claiming Priority (1)

Application Number    Publication            Priority Date    Filing Date    Title
PCT/CA2006/001088     WO2008000057A1 (en)    2006-06-30       2006-06-30     Learning character segments from received text

Publications (2)

Publication Number    Publication Date
CA2658586A1           2008-01-03
CA2658586C            2012-07-10

Family

ID=38845059

Family Applications (1)

Application Number    Status    Publication        Priority Date    Filing Date    Title
CA2658586A            Active    CA2658586C (en)    2006-06-30       2006-06-30     Learning character segments from received text

Country Status (2)

Country    Link
CA (1)     CA2658586C (en)
WO (1)     WO2008000057A1 (en)

Families Citing this family (1)

* Cited by examiner, † Cited by third party
Publication number      Priority date    Publication date    Assignee         Title
JP2014191355A (en) *    2013-03-26       2014-10-06          Oki Data Corp    Character input device, and character input method

Family Cites Families (4)

* Cited by examiner, † Cited by third party
Publication number    Priority date    Publication date    Assignee                 Title
US7124080B2 (en) *    2001-11-13       2006-10-17          Microsoft Corporation    Method and apparatus for adapting a class entity dictionary used with language models
US7228267B2 (en) *    2002-07-03       2007-06-05          2012244 Ontario Inc.     Method and system of creating and using Chinese language data and user-corrected data
US7478033B2 (en) *    2004-03-16       2009-01-13          Google Inc.              Systems and methods for translating Chinese pinyin to Chinese characters
CA2496872C (en) *     2004-03-17       2010-06-08          America Online, Inc.     Phonetic and stroke input methods of Chinese characters and phrases

Also Published As

Publication number     Publication date
WO2008000057A1 (en)    2008-01-03
CA2658586C (en)        2012-07-10

Legal Events

Date Code Title Description
EEER Examination request