GB1403816A - Apparatus for identifying an unidentified item of data - Google Patents
Apparatus for identifying an unidentified item of data
- Publication number
- GB1403816A
- Authority
- GB
- United Kingdom
- Prior art keywords
- character
- characters
- group
- word
- context word
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Expired
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V30/00—Character recognition; Recognising digital ink; Document-oriented image-based pattern recognition
- G06V30/10—Character recognition
- G06V30/26—Techniques for post-processing, e.g. correcting the recognition result
- G06V30/262—Techniques for post-processing, e.g. correcting the recognition result using context analysis, e.g. lexical, syntactic or semantic context
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V30/00—Character recognition; Recognising digital ink; Document-oriented image-based pattern recognition
- G06V30/10—Character recognition
Landscapes
- Engineering & Computer Science (AREA)
- Computational Linguistics (AREA)
- Computer Vision & Pattern Recognition (AREA)
- Physics & Mathematics (AREA)
- General Physics & Mathematics (AREA)
- Multimedia (AREA)
- Theoretical Computer Science (AREA)
- Character Discrimination (AREA)
Abstract
1403816 Character recognition systems INTERNATIONAL BUSINESS MACHINES CORP 10 Sept 1973 [3 Oct 1972] 42402/73 Heading G4R

An apparatus for identifying an unidentified data item, such as a character, in a group of such items, such as an intelligible word, comprises means for determining a first group of possible identities of the item. Items adjacent to the unidentified item are identified and each is classified in one of a set of groups to provide a context word. The context word is used to derive a second group of possible identities, and the first and second groups are compared to determine the correct identity of the unidentified item. The various operations may be performed by hardware or software.

As applied to a character recognition system, a recognition unit (not described) attempts to identify successive characters and provides a code signal indicative of each character. Each code signal may be accompanied by a "confidence" flag indicating whether the identification is deemed to be correct or unreliable. In the latter case the first group of possible identities is determined, e.g. by table look-up, or by other means (Fig. 6, not shown). Thus, for example, a character unconfidently identified as O may result in a first group, termed a confusion list, consisting of O, D, Q and U.

The context word is derived by consideration of the characters adjacent to the unreliable character. For example, if the character is the O of GROUP, the context word may be derived from GR and UP, or from GR, RU and UP. The codes for these characters are used to classify each of the characters into one of a number of sets of characters. The sets constitute the context word. Since only certain combinations of characters are permitted in the language under consideration, the characters forming two sets of characters can be associated only with certain other characters. The context word can therefore be used to select, e.g. by table look-up, a set of characters most probably associated with the sets forming the context word. This set (the second group) is compared with the first group (confusion list), and the common character is deemed to be the correct identity of the unreliably identified character.

If the most probable set does not contain a character appearing in the confusion list, the unreliable character is deemed to be unidentified and a reject signal is given. A reject signal may also be given if certain of the characters from which the context word is derived are unreliable, provided that the character under consideration has not been reliably identified by the prior recognition unit. Hardware implementations of the above arrangement are described with reference to Figs. 2 and 4 (not shown).

The arrangement may be combined with a further processor (Fig. 3, not shown) which, if the recognition unit has not reliably identified the character, determines whether it is part of a short word, i.e. one of fewer than 6 characters. If it is not, the character is determined as above; if it is, the five characters address a small-word dictionary table look-up which indicates directly the most likely identification of the character.

The character codes provided by the above arrangements may be passed to a further unit (Fig. 5, not shown) which detects character sequences whose probability of occurrence is so low that they may be considered impossible. For example, the sub-sequences GRO, ROU and OUP of the sequence GROUP are examined, and the common character O is rejected if any sub-sequence is impermissible.
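The comparison of the confusion list with the context-derived set, and the later check on three-character sub-sequences, can be sketched in Python roughly as follows. This is an illustration under stated assumptions, not the patented implementation: the two-class vowel/consonant scheme, the three-word lexicon, every function name, and the choice to reject when more than one candidate is common to both groups are hypothetical. Only the confusion list O, D, Q, U and the GROUP example are taken from the abstract.

```python
from typing import Dict, Iterable, List, Optional, Set, Tuple

VOWELS = set("AEIOU")

def classify(ch: str) -> str:
    """Map a character to a class (assumed scheme: vowel, consonant, boundary)."""
    if ch == " ":
        return "_"
    return "V" if ch in VOWELS else "C"

def context_word(word: str, i: int) -> Tuple[str, ...]:
    """Classes of the two characters on each side of position i.

    The character at position i itself is excluded, since it is the
    unreliable one. Spaces pad the word so edges yield boundary classes.
    """
    padded = "  " + word + "  "
    j = i + 2
    neighbours = padded[j - 2:j] + padded[j + 1:j + 3]
    return tuple(classify(c) for c in neighbours)

def build_context_table(lexicon: Iterable[str]) -> Dict[Tuple[str, ...], Set[str]]:
    """Look-up table: context word -> characters seen in that context."""
    table: Dict[Tuple[str, ...], Set[str]] = {}
    for w in lexicon:
        for i, ch in enumerate(w):
            table.setdefault(context_word(w, i), set()).add(ch)
    return table

def resolve(word: str, i: int, confusion_list: Set[str],
            table: Dict[Tuple[str, ...], Set[str]]) -> Optional[str]:
    """Intersect the first group (confusion list) with the second group
    (characters permitted by the context). None stands for a reject signal.
    Rejecting on an ambiguous intersection is an assumption of this sketch."""
    second_group = table.get(context_word(word, i), set())
    common = confusion_list & second_group
    if len(common) == 1:
        return next(iter(common))
    return None

def trigrams_containing(word: str, i: int) -> List[str]:
    """All three-character sub-sequences of word that include position i."""
    first = max(0, i - 2)
    last = min(i, len(word) - 3)
    return [word[s:s + 3] for s in range(first, last + 1)]

def build_trigram_set(lexicon: Iterable[str]) -> Set[str]:
    """Permissible trigrams, here simply those occurring in the lexicon."""
    return {w[s:s + 3] for w in lexicon for s in range(len(w) - 2)}

def is_permissible(word: str, i: int, allowed: Set[str]) -> bool:
    """True only if every sub-sequence containing position i is allowed."""
    return all(t in allowed for t in trigrams_containing(word, i))

# Hypothetical toy lexicon; a real table would cover the whole language.
LEXICON = ["GROUP", "WORD", "CHARACTER"]
CONTEXT_TABLE = build_context_table(LEXICON)
ALLOWED_TRIGRAMS = build_trigram_set(LEXICON)
```

With this toy lexicon, `resolve("GROUP", 2, {"O", "D", "Q", "U"}, CONTEXT_TABLE)` returns `"O"`, while a confusion list sharing no character with the context set returns `None`, corresponding to the reject signal. A coarse two-class scheme keeps the table small at the cost of ambiguity; the abstract leaves the actual classes unspecified.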
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
US29470672A | 1972-10-03 | 1972-10-03 |
Publications (1)
Publication Number | Publication Date |
---|---|
GB1403816A (en) | 1975-08-28 |
Family
ID=23134578
Family Applications (1)
Application Number | Status | Publication | Priority Date | Filing Date | Title |
---|---|---|---|---|---|
GB4240273A | Expired | GB1403816A (en) | 1972-10-03 | 1973-09-10 | Apparatus for identifying an unidentified item of data |
Country Status (4)
Country | Link |
---|---|
JP (1) | JPS4973934A (en) |
DE (1) | DE2349116A1 (en) |
FR (1) | FR2201788A5 (en) |
GB (1) | GB1403816A (en) |
Families Citing this family (1)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
JPS57105088A (en) * | 1980-12-22 | 1982-06-30 | Toshiba Corp | Character reader |
1973
- 1973-08-22: FR application FR7330988A, patent FR2201788A5 (not active, Expired)
- 1973-09-10: GB application GB4240273A, patent GB1403816A (not active, Expired)
- 1973-09-19: JP application JP48105122A, patent JPS4973934A (active, Pending)
- 1973-09-29: DE application DE19732349116, patent DE2349116A1 (active, Pending)
Cited By (5)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
GB2222475A (en) * | 1988-08-10 | 1990-03-07 | Caere Corp | Optical character recognition |
US5131053A (en) * | 1988-08-10 | 1992-07-14 | Caere Corporation | Optical character recognition method and apparatus |
US5278918A (en) * | 1988-08-10 | 1994-01-11 | Caere Corporation | Optical character recognition method and apparatus using context analysis and a parsing algorithm which constructs a text data tree |
US5278920A (en) * | 1988-08-10 | 1994-01-11 | Caere Corporation | Optical character recognition method and apparatus |
US5381489A (en) * | 1988-08-10 | 1995-01-10 | Caere Corporation | Optical character recognition method and apparatus |
Also Published As
Publication number | Publication date |
---|---|
FR2201788A5 (en) | 1974-04-26 |
DE2349116A1 (en) | 1974-04-18 |
JPS4973934A (en) | 1974-07-17 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
EP0849688A2 (en) | System and method for natural language determination | |
US3492646A (en) | Cross correlation and decision making apparatus | |
EP2095277B1 (en) | Fuzzy database matching | |
CN106126235A (en) | A kind of multiplexing code library construction method, the quick source tracing method of multiplexing code and system | |
US3167746A (en) | Specimen identification methods and apparatus | |
GB1366009A (en) | Reading apparatus | |
CA1050167A (en) | Bayesian online numeric discriminator | |
US4003025A (en) | Alphabetic character word upper/lower case print convention apparatus and method | |
CA2006230C (en) | Method and apparatus for validating character strings | |
GB1403816A (en) | Apparatus for identifying an unidentified item of data | |
US20200387691A1 (en) | A quick match algorithm for biometric data | |
JP2000231559A (en) | Information processor | |
GB1102359A (en) | Improvements in or relating to cross correlation apparatus | |
GB1338287A (en) | Pattern classifying apparatus | |
EP0622752A2 (en) | Apparatus and method for a lexical post-processor for a neural network-based character processor | |
EP0566848A2 (en) | System for the automated analysis of compound words | |
GB1326141A (en) | Raster process for classifying characters | |
JPS61114388A (en) | Character input device | |
JPH02181269A (en) | Address recognizing system | |
JPS60138689A (en) | Character recognizing method | |
US20130317805A1 (en) | Systems and methods for detecting real names in different languages | |
CN115952798A (en) | Named entity recognition method, device, server and storage medium | |
KR900700973A (en) | Character Recognition Device | |
JPS5953986A (en) | Character recognizing device | |
JP2923295B2 (en) | Pattern identification processing method |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PS | Patent sealed [section 19, Patents Act 1949] | ||
PCNP | Patent ceased through non-payment of renewal fee |