GB1403816A - Apparatus for identifying an unidentified item of data - Google Patents

Apparatus for identifying an unidentified item of data

Info

Publication number
GB1403816A
Authority
GB
United Kingdom
Prior art keywords
character
characters
group
word
context word
Prior art date
1972-10-03
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Expired
Application number
GB4240273A
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
International Business Machines Corp
Original Assignee
International Business Machines Corp
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
1972-10-03
Filing date
1973-09-10
Publication date
1975-08-28
Application filed by International Business Machines Corp

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V30/00 Character recognition; Recognising digital ink; Document-oriented image-based pattern recognition
    • G06V30/10 Character recognition
    • G06V30/26 Techniques for post-processing, e.g. correcting the recognition result
    • G06V30/262 Techniques for post-processing, e.g. correcting the recognition result using context analysis, e.g. lexical, syntactic or semantic context

Landscapes

  • Engineering & Computer Science (AREA)
  • Computational Linguistics (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Multimedia (AREA)
  • Theoretical Computer Science (AREA)
  • Character Discrimination (AREA)

Abstract

1403816 Character recognition systems INTERNATIONAL BUSINESS MACHINES CORP 10 Sept 1973 [3 Oct 1972] 42402/73 Heading G4R

An apparatus for identifying an unidentified data item, such as a character, in a group of such items, such as an intelligible word, comprises means for determining a first group of possible identities of the item. Items adjacent to the unidentified item are identified, and each is classified into one of a set of groups to provide a context word. The context word is used to derive a second group of possible identities, and the first and second groups are compared to determine the correct identity of the unidentified item. The various operations may be performed by hardware or software.

As applied to a character recognition system, a recognition unit (not described) attempts to identify successive characters and provides a code signal indicative of each character. Each code signal may be accompanied by a "confidence" flag indicating whether the identification is deemed to be correct or unreliable. In the latter case the first group of possible identities is determined, e.g. by table look-up or by other means (Fig. 6, not shown). For example, a character unconfidently identified as O may result in a first group, termed a confusion list, consisting of O, D, Q and U.

The context word is derived from the characters adjacent to the unidentified character. For example, if the character is the O of GROUP, the context word may be derived from GR and UP, or from GR, RU and UP. The codes for these characters are used to classify each of them into one of a number of sets of characters, and these sets constitute the context word. Since only certain combinations of characters are permitted in the language under consideration, characters belonging to a given combination of sets can be associated only with certain other characters. The context word can therefore be used to select, e.g. by table look-up, the set of characters most probably associated with the sets forming the context word. This set (the second group) is compared with the first group (the confusion list), and the common character is deemed to be the correct identity of the unreliably identified character. If the most probable set does not contain a character appearing in the confusion list, the unreliable character is deemed to be unidentified and a reject signal is given. A reject signal may also be given if certain of the characters from which the context word is derived are unreliable, provided that the character under consideration has not been reliably identified by the prior recognition unit.

Hardware implementations of the above arrangement are described with reference to Figs. 2 and 4 (not shown). The arrangement may be combined with a further processor (Fig. 3, not shown) which, if the recognition unit has not reliably identified the character, determines whether it is part of a short word, i.e. one of fewer than six characters. If it is not, the character is determined as above; if it is, the characters of the short word address a small-word dictionary table look-up, which directly indicates the most likely identification of the character.

The character codes provided by the above arrangements may be passed to a further unit (Fig. 5, not shown), which detects character sequences whose probability of occurrence is so low that they may be considered impossible. For example, the sub-sequences GRO, ROU and OUP of the sequence GROUP are examined, and the common character O is rejected if any sub-sequence is impermissible.
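The core decision in the abstract (confusion list, context word, comparison, reject) can be illustrated with a minimal Python sketch. Only the O → {O, D, Q, U} confusion list and the GROUP example come from the abstract; the character classes, the context table, and the names `CONFUSION`, `char_class`, `CONTEXT_TABLE` and `resolve` are hypothetical. Two behaviours are assumptions rather than anything the abstract states: any unreliable context character forces a reject, and more than one common character is also treated as a reject.

```python
from typing import Optional

# Hypothetical confusion table. The abstract's only example: a character
# unconfidently read as "O" may actually be O, D, Q or U.
CONFUSION: dict[str, set[str]] = {"O": {"O", "D", "Q", "U"}}


def char_class(ch: str) -> str:
    """Map a character to a hypothetical class label (the patent's sets are not given)."""
    if ch in "AEIOU":
        return "V"  # vowel
    if ch in "LR":
        return "R"  # liquid
    return "C"      # any other character


# Hypothetical context table: context word (class labels of up to two
# characters on each side of the unknown one) -> most probable characters.
CONTEXT_TABLE: dict[tuple[str, ...], set[str]] = {
    ("C", "R", "V", "C"): {"A", "E", "I", "O"},  # e.g. the gap in "GR?UP"
}


def resolve(word: str, pos: int, reliable: list[bool]) -> Optional[str]:
    """Return the identity of word[pos], or None to signal a reject."""
    if reliable[pos]:
        return word[pos]  # recognition unit was confident; keep its answer

    # Up to two neighbouring positions on each side form the context.
    ctx = [i for i in range(pos - 2, pos + 3) if i != pos and 0 <= i < len(word)]
    if not all(reliable[i] for i in ctx):
        return None  # assumption: an unreliable context character forces a reject

    first_group = CONFUSION.get(word[pos], {word[pos]})
    context_word = tuple(char_class(word[i]) for i in ctx)
    second_group = CONTEXT_TABLE.get(context_word, set())

    common = first_group & second_group
    # Assumption: accept only a single common character; otherwise reject.
    return next(iter(common)) if len(common) == 1 else None
```

With these sample tables, `resolve("GROUP", 2, [True, True, False, True, True])` returns `"O"`: the context word for G, R, U, P is `("C", "R", "V", "C")`, and O is the only character shared by the confusion list and the selected set.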
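The short-word path (Fig. 3) replaces the context-word method with a direct dictionary look-up when the word has fewer than six characters. The sketch below shows that idea under stated assumptions: the table contents, the `?` masking of the unreliable position, and the names `SMALL_WORD_TABLE` and `short_word_lookup` are all hypothetical, and what happens on a table miss is not specified in the abstract.

```python
from typing import Optional

# The abstract defines a short word as one of fewer than six characters.
MAX_SHORT_WORD_LEN = 5

# Hypothetical small-word dictionary: the word with "?" at the unreliable
# position -> the most likely identity of that character.
SMALL_WORD_TABLE = {
    "TH?": "E",
    "A?D": "N",
    "W?S": "A",
}


def short_word_lookup(word: str, pos: int) -> Optional[str]:
    """Return the most likely identity of word[pos] for a short word.

    Returns None when the word is too long (the caller would fall back to the
    context-word method) or when the masked pattern is not in the table.
    """
    if len(word) > MAX_SHORT_WORD_LEN:
        return None
    key = word[:pos] + "?" + word[pos + 1:]
    return SMALL_WORD_TABLE.get(key)
```

For example, `short_word_lookup("THC", 2)` masks the key to `"TH?"` and returns `"E"`, while a six-letter word such as `"GROUPS"` is never looked up.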
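The impossible-sequence check (Fig. 5) examines every three-character window that contains the character under consideration, as in the GRO, ROU, OUP example for the O of GROUP. This is a minimal sketch assuming a fixed set of forbidden trigrams; `IMPOSSIBLE_TRIGRAMS` and `survives_trigram_check` are hypothetical names, and a real system would derive the set from letter statistics for the language being read.

```python
# Hypothetical set of trigrams treated as impossible.
IMPOSSIBLE_TRIGRAMS = frozenset({"GRQ", "QXZ"})


def survives_trigram_check(word: str, pos: int, impossible=IMPOSSIBLE_TRIGRAMS) -> bool:
    """Return False (reject) if any three-character window containing word[pos] is impossible."""
    for start in range(pos - 2, pos + 1):
        if start < 0 or start + 3 > len(word):
            continue  # window would run past the start or end of the word
        if word[start:start + 3] in impossible:
            return False
    return True
```

For the O of `"GROUP"` the windows checked are GRO, ROU and OUP; none is in the sample set, so the character survives. For `"GRQUP"` the window GRQ is forbidden, so the middle character is rejected.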

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
US29470672A 1972-10-03 1972-10-03

Publications (1)

Publication Number Publication Date
GB1403816A 1975-08-28

Family

ID=23134578

Family Applications (1)

Application Number Title Priority Date Filing Date
GB4240273A Apparatus for identifying an unidentified item of data 1972-10-03 1973-09-10 (Expired; published as GB1403816A)

Country Status (4)

Country Link
JP (1) JPS4973934A
DE (1) DE2349116A1
FR (1) FR2201788A5
GB (1) GB1403816A

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
GB2222475A * 1988-08-10 1990-03-07 Caere Corp Optical character recognition

Families Citing this family (1)

Publication number Priority date Publication date Assignee Title
JPS57105088A * 1980-12-22 1982-06-30 Toshiba Corp Character reader

Cited By (5)

Publication number Priority date Publication date Assignee Title
GB2222475A * 1988-08-10 1990-03-07 Caere Corp Optical character recognition
US5131053A * 1988-08-10 1992-07-14 Caere Corporation Optical character recognition method and apparatus
US5278918A * 1988-08-10 1994-01-11 Caere Corporation Optical character recognition method and apparatus using context analysis and a parsing algorithm which constructs a text data tree
US5278920A * 1988-08-10 1994-01-11 Caere Corporation Optical character recognition method and apparatus
US5381489A * 1988-08-10 1995-01-10 Caere Corporation Optical character recognition method and apparatus

Also Published As

Publication number Publication date
FR2201788A5 1974-04-26
DE2349116A1 1974-04-18
JPS4973934A 1974-07-17

Similar Documents

Publication Publication Date Title
EP0849688A2 System and method for natural language determination
US3492646A Cross correlation and decision making apparatus
EP2095277B1 Fuzzy database matching
CN106126235A A kind of multiplexing code library construction method, the quick source tracing method of multiplexing code and system
US3167746A Specimen identification methods and apparatus
GB1366009A Reading apparatus
CA1050167A Bayesian online numeric discriminator
US4003025A Alphabetic character word upper/lower case print convention apparatus and method
CA2006230C Method and apparatus for validating character strings
GB1403816A Apparatus for identifying an unidentified item of data
US20200387691A1 A quick match algorithm for biometric data
JP2000231559A Information processor
GB1102359A Improvements in or relating to cross correlation apparatus
GB1338287A Pattern classifying apparatus
EP0622752A2 Apparatus and method for a lexical post-processor for a neural network-based character processor
EP0566848A2 System for the automated analysis of compound words
GB1326141A Raster process for classifying characters
JPS61114388A Character input device
JPH02181269A Address recognizing system
JPS60138689A Character recognizing method
US20130317805A1 Systems and methods for detecting real names in different languages
CN115952798A Named entity recognition method, device, server and storage medium
KR900700973A Character Recognition Device
JPS5953986A Character recognizing device
JP2923295B2 Pattern identification processing method

Legal Events

Date Code Title Description
PS Patent sealed [Section 19, Patents Act 1949]
PCNP Patent ceased through non-payment of renewal fee