CA1062810A - Regional context maximum likelihood ocr error correction apparatus - Google Patents

Regional context maximum likelihood ocr error correction apparatus

Info

Publication number
CA1062810A
Authority
CA
Canada
Prior art keywords
character
word
error
probability
storage means
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Expired
Application number
CA221,755A
Other languages
French (fr)
Inventor
Walter S. Rosenbaum
Jean M. Ciconte
Allen H. Ett
John J. Hilliard
Ellen W. Bollinger
Anne M. Chaires
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
International Business Machines Corp
Original Assignee
International Business Machines Corp
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by International Business Machines Corp

Classifications

    • GPHYSICS
    • G10MUSICAL INSTRUMENTS; ACOUSTICS
    • G10LSPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L15/00Speech recognition
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00Handling natural language data
    • G06F40/20Natural language analysis
    • G06F40/232Orthographic correction, e.g. spell checking or vowelisation

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Computational Linguistics (AREA)
  • Health & Medical Sciences (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • Physics & Mathematics (AREA)
  • Human Computer Interaction (AREA)
  • Multimedia (AREA)
  • Artificial Intelligence (AREA)
  • General Health & Medical Sciences (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Acoustics & Sound (AREA)
  • Character Discrimination (AREA)
  • Machine Translation (AREA)
  • Document Processing Apparatus (AREA)

Abstract

ABSTRACT OF THE DISCLOSURE:
A data processing system is disclosed for selecting the correct form of a garbled input word misread by an optical character reader so as to change the number of characters in the word by character splitting or concatenation. Dictionary words are stored in the system, having characters which are flagged for segmentation or concatenation OCR misread propensity. The OCR word and a dictionary word are loaded into a pair of associated shift registers, aligning their letters on one end. The dictionary word characters are inspected for error propensity flags. When a splitting propensity, for example, is found for a character, special conditional probability values are accessed from a storage and a calculation is performed of the probability that the first character of the dictionary word was split by the OCR into the first and second characters of the OCR word. This regional context probability is compared with the probability of a simple substitution error for the characters. If the probability of segmentation is larger, the OCR characters in the first shift register are shifted one space with respect to the dictionary word characters in the second shift register so that subsequent character pairs to be compared are properly matched. The greater calculated probability is combined in a running product. The dictionary word with the largest running product is output by the system as the most likely correct form of the garbled OCR input word.
In addition to optical character recognition, the system disclosed may be applied to correcting segmentation errors in phoneme-characters output from a speech analyzer.

Description

FIELD OF THE INVENTION:
The invention disclosed herein relates to data processing devices and more particularly relates to post processing devices for optical character recognition machines and speech analyzers.
BACKGROUND OF THE INVENTION:
State of the art optical character recognition machines typically have two principal types of character misrecognition modes: substitution and segmentation. Substitution manifests itself in two ways. The first is character substitution, where the recognition unit has captured the video information of a single character, but the features required for alphabetical determination are aliased as another character. Logically this can only occur if there is some degree of similarity in the shape of the respective alphabetic characters involved. Examples of such character combinations are: B, D; D, O; O, C; 1, i; etc.

The second form of substitution manifestation is the character reject. As in character substitution, the recognition unit captures a single character. However, rejection occurs because of the inability of the recognition logic to relate to any character or because more than one set of alpha determination criteria are satisfied by the character features isolated. This condition is referred to as a character reject. In the prior art, apparatus for selecting the correct form of a garbled input word misread by an OCR has been limited to correcting errors in the substitution misrecognition mode. For improving the performance of an optical character reader, the prior art discloses the use of conditional probabilities for simple substitution of one character for another or of character rejection, for calculating a total conditional probability that its input OCR word was misread, given that a predetermined dictionary word was actually scanned by the OCR. But the prior art deals only with the simple substitution of confusion pairs occupying the same corresponding location in the OCR word and in the directory word. The OCR word and the directory word must be of the same length. The prior art neither recognizes nor addresses the problem of the optical character reader's segmentation misrecognition mode.

Segmentation misrecognition differs from that of simple substitution in that its independent events correspond to groupings of at least two characters. Nominally there are three types of segmentation errors. They are: horizontal splitting segmentation, concatenation segmentation, and crowding segmentation. The underlying mechanical factor which all the above segmentation types have in common is that they are generated by the improper delineation of the character beginning and end points. Segmentation errors occur quite frequently in OCR output streams and constitute a substantial impediment to accuracy in text processing applications.
OBJECTS OF THE INVENTION:
It is an object of the invention to select the correct form of a garbled input word misread by an OCR, in an improved manner.
It is an additional object of the invention to select the correct form of a spoken input word misread by a speech analyzer, in an improved manner.

WA9-73-006 -3~

It is another object of the invention to select the correct form of a garbled input word misread by an OCR, employing an apparatus for executing a conditional probability analysis, in an improved manner.
It is still a further object of the invention to select the correct form of a garbled word output by an optical character reader, the word having undergone a change in the number of characters therein by character splitting or character concatenation.
SUMMARY OF THE INVENTION:
A data processing system is disclosed for selecting the correct form of a garbled input word misread by an optical character reader so as to change the number of characters in the word by character splitting or concatenation. A dictionary of the words expected to be read by the OCR is maintained in the system. In association with selected characters in the stored dictionary words are error flags indicating the propensity for their associated characters to undergo splitting or concatenation segmentation during an OCR read operation. A set of conditional probabilities that the selected characters will undergo splitting or concatenation into OCR misread characters is also stored in the system. When a garbled OCR word is input to the system, it is compared with each stored dictionary word by loading the two words in a pair of associated shift registers and aligning their letters on one end. The system then calculates the total conditional probability that the OCR word in the first shift register was misread given that the dictionary word in the second shift register was actually scanned by the OCR. The total conditional probability calculation involves inspecting the characters in the dictionary word for segmentation error propensity flags. The system storage is accessed for the conditional probability of a segmentation or concatenation error when indicated. Means are provided for calculating the probability product that, for example, a segmentation of the first dictionary word character into the first and second OCR word characters occurred in conjunction with the simple substitution of the third OCR character for the second dictionary word character. This probability must be compared, in a comparator, with the probability that the first and second OCR word characters were simply substituted for the first and second dictionary word characters, respectively. If the comparator determines that the probability of segmentation, which involves an increase in the misread OCR word length, is greater than the probability of a simple character substitution, a shift control causes the contents of the shift register containing the OCR word to be shifted the extra space with respect to the contents of the shift register containing the dictionary word. This is done to realign the characters therein so that subsequent character pairs to be compared are properly matched. The converse differential shifting operation is employed when a concatenation error is suspected by the system. The conditional probability contained in the probability product which the comparator finds to be greater is multiplied by the running product of conditional probabilities previously determined for the dictionary word. The running products for matching all the dictionary words with the OCR word are calculated, and the dictionary word having the maximum value for its running product is output by the system as the most likely correct form for the garbled input OCR word.

The apparatus may also be applied to the correction of segmentation errors in phoneme-characters output from a speech analyzer.
DESCRIPTION OF THE DRAWINGS:
The foregoing and other objects, features and advantages of the invention will be apparent from the following more particular description of the preferred embodiments of the invention, as illustrated in the accompanying drawings.
Figure 1 shows schematically the basic mechanism for segmentation manipulation.
Figure 2 is a video scan of a character pair that can result in a crowding segmentation.
Figure 3 is a detailed logic diagram of the regional context maximum likelihood OCR error correction apparatus.
Figure 4 is a detailed logic diagram of the shift control 20.
Figure 5 is a detailed logic diagram of the multiplexor 94.
Figure 6 is a detailed logic diagram of the multiplexor 96.
Figure 7 is a detailed logic diagram of the multiplex timing 108.
Figure 8 is a detailed logic diagram of the multiplexor 128.
Figure 9 is a detailed logic diagram of the shift command 162.
DISCUSSION OF THE PREFERRED EMBODIMENT:
Theory: Error correction by the Regional Context Maximum Likelihood technique is performed by means of a conditional probabilistic analysis. This approach evaluates the likelihood that each member of a predetermined class of reference words being considered could have been mapped into the garbled character string by means of the OCR segmentation error propensities. The substance of the likelihood analysis physically means the computation of an analog distance between a reference word and the garbled data, weighted by the a priori probability that the reference word would have occurred in the alpha fields being OCR scanned. Mathematically this analysis is formulated by the conditional probabilistic statement

$$P(\text{reference word} \mid \text{garbled alpha string}) = \frac{P(\text{reference word},\ \text{garbled alpha string})}{P(\text{garbled alpha string})} \tag{1}$$

The denominator of equation (1) is essentially a scaling factor and has the same value for all the entries being compared to the garbled alpha string. Hence, the relative ranking of each entry (i.e., the probability of each reference word mapping into the garbled alpha string) is based on the numerator yielded in equation (2). Therefore, for the rest of our error correction analysis, we only have to focus on what maximizes the numerator. Applying Bayes theorem, the numerator in equation (1) can be restated as:

$$P(\text{reference word},\ \text{garbled alpha string}) = P(\text{garbled alpha string} \mid \text{reference word}) \cdot P(\text{reference word}) \tag{2}$$

The probability factor P(reference word) is called the a priori probability of the event. In this case, it is the probability that the reference word being compared to the garbled character string would appear in the textual data being scanned. The a priori probabilities related to the occurrence of a word in textual data being scanned are a function of the generic form of subject matter to which it pertains. Although these a priori probabilities are empirically determined, for general text processing applications their value is considered uniform over all words, as a first approximation. Thus, for general text processing, the a priori term in equation (2) is dropped. In mail processing applications, the a priori probability related to each entry in the directory can be its occurrence rate in the National Zip Code Directory. A more accurate and correct computation would follow by using a data base where the reference word occurrence probability depends on actual mailpiece volume.
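The ranking rule implied by equations (1) and (2) can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function name `rank_candidates`, the candidate words, and all probability values are hypothetical and do not come from the patent. The sketch ranks candidates by likelihood times prior and ignores the common denominator of equation (1).

```python
def rank_candidates(likelihoods, priors=None):
    """Rank reference words by the numerator of equation (1).

    likelihoods: dict mapping word -> P(garbled string | word)
    priors:      dict mapping word -> P(word); None means a uniform prior,
                 in which case the a priori term is simply dropped.
    """
    scores = {}
    for word, likelihood in likelihoods.items():
        prior = 1.0 if priors is None else priors[word]
        scores[word] = likelihood * prior
    # The denominator of equation (1) is the same for every candidate,
    # so it never changes this ordering and is not computed.
    return sorted(scores, key=scores.get, reverse=True)


# Hypothetical numbers, chosen only to show the prior changing the ranking.
likelihoods = {"CORNWALL": 0.020, "CORNWELL": 0.030}
priors = {"CORNWALL": 0.6, "CORNWELL": 0.3}

uniform_ranking = rank_candidates(likelihoods)
weighted_ranking = rank_candidates(likelihoods, priors)
```

With a uniform prior the ranking follows the likelihood alone; with the hypothetical mail-volume style priors the order of the two candidates is reversed.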
The probability factor:

$$P(\text{garbled alpha string} \mid \text{directory entry}) \tag{3}$$

is called the likelihood. The major computational effort of the Regional Context Maximum Likelihood Error Correction procedure centers around the evaluation of this expression.
In the evaluation of the likelihood factor there must be captured, in a probabilistic form, the misread propensities of the OCR. The conditional format of equation (3) poses the likelihood as: "Given a reference word, what is the probability of the OCR misread propensities having mapped it into the garbled alpha string." Since the OCR recognizes an alpha field on a character-by-character basis (i.e., it does not directly recognize words as single entities), equation (3) is really the product of a series of independent probabilistic events. In this perspective, there are two categories of OCR misrecognition that must be addressed. They are:
a. Substitution
b. Segmentation.

Substitution Maximum Likelihood Analysis: OCR substitution manifests itself in two ways. The first is character substitution. This implies the recognition unit has captured the video of a single character, but the features required for alpha determination are aliased as another character. Logically, this can only occur if there is some degree of similarity in shapes of the respective alpha characters involved. Examples of such letter combinations are: B, D; D, O; O, C; 1, i; etc. The second form of the substitution manifestation is character reject. As with character substitution, the recognition unit captures a single character. However, rejection occurs because of the inability of the recognition logic to relate to any character or because more than one set of alpha determination logic is satisfied by the character features isolated. This general situation is referred to as character reject. In this discussion, all rejects are denoted by asterisk (*).
From a probability standpoint, both of the preceding misread effects can be posed as simple independent conditional probabilities. Respectively, character substitution and reject substitution would enter equation (3) as:

$$P_c(L_j \mid L_i) \tag{4}$$
$$P_c(* \mid L_i) \tag{5}$$

They represent the probability that the alpha character $L_i$ is scanned by the OCR and $L_j$ or * are output. This probability data is derived from a character confusion matrix and is prestored in storage, requiring no computation time. The character confusion statistics may be compiled separately relative to upper and lower alpha characters.

Example 1 indicates how equations of the form (4) and (5) can be applied in the Bayesian decision process used in the invention. The garbled word is CDRNWA*L and the entry from the predetermined class of reference words that is being tested is CORNWALL. The likelihood factor is given by the probabilistic series of independent events as shown in Example 1.

The likelihood factor hence is the product of a number of independent character confusion probabilities that results in a relative value which can be compared with that generated by each of the other words under test. The reference word which has the highest probability of being the original word is chosen, provided it meets certain reasonableness criteria.
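As a rough software analogue of this substitution-only product, the following Python sketch multiplies one confusion probability per character position for the garbled word CDRNWA*L against the reference word CORNWALL. The function name `substitution_likelihood`, the dictionary layout keyed by (output character, scanned character), and every numeric value are assumptions made for illustration; the patent prestores such data in a confusion matrix but does not give these figures.

```python
def substitution_likelihood(garbled, reference, confusion):
    """Product of P_c(output | scanned) over aligned character positions.

    Handles simple substitution and reject (*) only, so the two words
    must have equal length; otherwise the likelihood is taken as zero.
    """
    if len(garbled) != len(reference):
        return 0.0
    product = 1.0
    for out_ch, ref_ch in zip(garbled, reference):
        # Pairs missing from the table are treated as impossible events.
        product *= confusion.get((out_ch, ref_ch), 0.0)
    return product


# Hypothetical confusion probabilities, keyed as (output, scanned).
confusion = {
    ("C", "C"): 0.95, ("D", "O"): 0.05, ("R", "R"): 0.90,
    ("N", "N"): 0.90, ("W", "W"): 0.90, ("A", "A"): 0.95,
    ("*", "L"): 0.10, ("L", "L"): 0.90,
}

score = substitution_likelihood("CDRNWA*L", "CORNWALL", confusion)
```

The resulting `score` has no meaning on its own; it only becomes useful when compared with the scores of the other reference words under test.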

Example 1:

Garbled Word = CDRNWA*L

Reference Word = CORNWALL

Likelihood Factor:

$$P(\text{CDRNWA*L} \mid \text{CORNWALL}) = P_c(C \mid C)\,P_c(D \mid O)\,P_c(R \mid R)\,P_c(N \mid N) \cdots P_c(L \mid L)$$

Segmentation Maximum Likelihood Analysis: Segmentation differs from substitution in that its independent events correspond to groupings of at least two characters. Nominally, there are three types of segmentation errors. They are:
a. Horizontal Splitting Segmentation
b. Concatenation Segmentation
c. Crowding Segmentation
The underlying mechanical factor which all the above segmentation types have in common is that they are generated by the improper discernment of character beginning and end points.

Horizontal Splitting Segmentation (HSS): Horizontal Splitting Segmentation (HSS) tends to affect broad (wide) upper-case characters such as W, M, N, U, O, and C. The HSS effect evolves when the recognition unit is misled into cutting one of these characters into two portions. Each portion is in turn reviewed by the recognition logic as if it were a legitimate character. This results in several patterns of characters and/or asterisk misrecognitions. Several of the more common forms are indicated in Table I.

TABLE I. Horizontal Splitting Segmentation

Character scanned    OCR output patterns
H                    IH, PL, PI, *I, I*
M                    ME, MR, NN, N*, *C, *L, *N, **
N                    IL, NO, NT, R*, RI, TN, *I
O                    OD, OE, OK, OS, O*, *I, OY
U                    LI, TI, I*
W                    I*W

From a probabilistic standpoint, the preceding misread effect can be posed as a dual aliasing effect conditional upon the occurrence of one of the set of upper-case letters noted above. Functionally, this is indicated as:

$$P_c(L_j L_{j+1} \mid L_i)$$

Obviously the evaluation of equation (1) becomes more complicated when the HSS effect must be taken into account. The control logic of the calculation must consider three possible conditions when one of the segmentation prone characters, $L_{i_{seg}}$, is encountered. They are:
a. $L_{i_{seg}}$ has given rise to a simple substitution effect of the form:

$$P_c(L_j \mid L_{i_{seg}}) \tag{6}$$
$$P_c(* \mid L_{i_{seg}}) \tag{7}$$

b. $L_{i_{seg}}$ has been improperly segmented, giving rise to the HSS effect of the form:

$$P_c(L_j L_{j+1} \mid L_{i_{seg}}) \tag{8}$$

c. $L_{i_{seg}}$ has been properly recognized and output, giving rise to:

$$P_c(L_{i_{seg}} \mid L_{i_{seg}}) \tag{9}$$

The presence of this last possibility is especially difficult to correctly discern since the most common type of character HSS recreates itself and an additional spurious character.
The analytic details of the inclusion of HSS in the evaluation of the likelihood factor (equation 3) will be delayed until it can be elaborated in conjunction with the concatenation segmentation error to be discussed next.
Concatenation Segmentation (CS): Concatenation segmentation (CS) is nearly the mirror image of HSS. It mainly occurs among closely spaced lower case characters. Mechanically, CS evolves when the recognition unit is unable to discern, in the scan, the presence of two individual characters. Hence, the OCR recognition logic proceeds to process the characters in a logically concatenated manner.
This effect occurs mainly due to address data being printed in a stylized manner or by crowded typewriter slugs. Table II contains several of the most CS prone letter combinations.
In a probabilistic format the CS event can be posed as:

$$P_c(L_j \mid L_i L_{i+1}) \tag{10}$$
$$P_c(* \mid L_i L_{i+1}) \tag{11}$$

The latter event (11) may be particularly difficult to isolate while evaluating equation (3). This follows, since $L_i$ itself may have a high propensity for mapping into a reject character (*), and therefore be suggestive of a plain substitution instead of a CS.

TABLE II. Concatenation Segmentation
br do en ff fr gu la ok or rv sa tn *
fr ir la mr or ra rg ro rs sa el en er es ne ue ja sa ta a ck ch ci el kl la kl ra rt m io jo or o mr ok k nen ry y rv dy d em
Further Evaluation of the Likelihood Factor:
To structure an effective and efficient evaluation methodology for the likelihood factor, equation (3), the commonality of its possible constituents must be stressed. Essentially each of the candidate aliasing effects described above is constituted as a confusion probability. The only additional factor that must be accommodated in the analysis is that, unlike the treatment of simple substitution shown in Example 1, a one-to-one correspondence between characters in a reference word and the garbled data field no longer strictly holds. This, of course, follows since the occurrence of an HSS error in one character of a directory word will create two characters in its garbled representation. The direct converse holds for CS error. Implicit in each of the above segmentation possibilities is the requirement to realign the remainder of the garbled error word to compensate for the character misalignment effect incurred due to the presence of a segmentation error.
To configure an efficient and reliable method and apparatus which accommodates the preceding segmentation oriented considerations when evaluating the Likelihood Factor, three innovations are appended to the procedures as applied in the evaluation of equation (3). The innovations are:
a. Exception Character and Character Pair Flagging.
b. Right to Left Structuring of the Likelihood Evaluation.
c. Use of Regional Context.
Exception Character and Character Pair Flagging:
There are about half a dozen HSS prone characters and about a dozen CS prone character pairs. By themselves they constitute only a small part of the alpha composition of the class of reference words. Unless a flag is encountered, the likelihood factor analysis proceeds as if simple character substitution was the only garbling factor to be considered. Only when a flag is encountered does the invention execute the logic for treating possible segmentation occurrences.
Special characters can be inserted into each word where its segmentation prone characters or character pairs exist. This, however, has the drawback of increasing average word length and destroying the compactness of the reference word dictionary that is important for I/O efficiency. Hence, to accommodate the flagging and storage requirements, a special alpha character storage convention is adopted. Each alpha character is stored using only 5 of the 8 bits usually used to store a character. The other 3 bits will then be used as 8 flag code combinations, two of which are respectively delegated for HSS characters and CS character pairs.
If for illustrative purposes we denote the HSS code by "1" and the CS code by "?", then for example the word "Walston", which contains both HSS and CS occurrences, would be stored in the dictionary as:
|W1| |A| |L| |S| |T?| |O1| |N1|
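The 5-bit character plus 3-bit flag convention can be illustrated with a small Python sketch. The specific choices here are assumptions and are not stated in the patent: letters are coded as 0 to 25 in alphabetical order, the flag occupies the three high-order bits, and the flag values 0, 1 and 2 stand for "no flag", HSS and CS respectively. The helper names `pack` and `unpack` are likewise hypothetical.

```python
# Hypothetical flag code assignments (3 bits allow 8 combinations).
FLAG_NONE, FLAG_HSS, FLAG_CS = 0, 1, 2


def pack(ch, flag=FLAG_NONE):
    """Store one letter in 5 bits and its flag code in the other 3 bits."""
    code = ord(ch.upper()) - ord("A")
    if not 0 <= code < 26:
        raise ValueError("only the letters A-Z are supported in this sketch")
    if not 0 <= flag < 8:
        raise ValueError("flag code must fit in 3 bits")
    return (flag << 5) | code


def unpack(byte):
    """Recover (letter, flag code) from one packed byte."""
    return chr((byte & 0x1F) + ord("A")), byte >> 5


# "Walston" with HSS flags on W, O, N and a CS flag on T, as in the text.
walston = bytes([
    pack("W", FLAG_HSS), pack("A"), pack("L"), pack("S"),
    pack("T", FLAG_CS), pack("O", FLAG_HSS), pack("N", FLAG_HSS),
])
decoded = [unpack(b) for b in walston]
```

The point of the convention survives in the sketch: the flagged entry still occupies exactly one byte per letter, so the dictionary is no longer than an unflagged one.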

Right to Left Structuring of Likelihood Evaluation:
The likelihood factor is a multiplicative expression and hence is associative. Its evaluation can proceed from right to left as well as from left to right. In Example 1, the progression was from left to right, the viewpoint of a human reader. The OCR, on the other hand, usually recognizes and processes characters from right to left.
It appears the stability of the likelihood factor analysis is enhanced if its evaluation proceeds from right to left. This is not intended to preclude other formats of progression through the word but rather to indicate a mode which offers promise in present applications. This follows since it allows the delineation of independent events to be evolved in the same sequence of analysis, and creates a perspective that is better suited for analyzing series of segmentation prone characters. In addition, accurate initial alignment of the garbled data and the dictionary entry tends to be more readily achieved when the pairing starts from the right and proceeds to the left. The misleading effect of an added character at the beginning of the address field, due to strong left edge effects of a first position upper case letter, is circumvented. This increases the possibility of isolating such spurious characters and addressing their error correction in an ad hoc manner.
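The alignment benefit of pairing from the right can be seen in a short Python sketch. It is an illustration only: the function `pair_from_right` and the garbled string "IWALSTON" (a spurious leading character in front of an otherwise clean word) are hypothetical and are not taken from the patent.

```python
def pair_from_right(garbled, reference):
    """Pair characters starting at the right end of both words.

    Returns the (garbled, reference) character pairs in right-to-left
    order, plus any unpaired characters left over at the left edge of
    the garbled word.
    """
    pairs = list(zip(reversed(garbled), reversed(reference)))
    extra = max(0, len(garbled) - len(reference))
    return pairs, garbled[:extra]


# A spurious character at the left edge does not disturb the pairing.
pairs, leftover = pair_from_right("IWALSTON", "WALSTON")
```

Pairing the same two strings from the left would misalign every position, whereas here all seven reference characters meet their counterparts and the extra character is isolated in `leftover` for separate treatment.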
Use of Regional Context:
The unifying factor for allowing HSS and CS to be effectively accommodated in the likelihood factor computation is "regional context."

Unless an alpha character in a directory entry is preceded by a flag, it is assumed that it enters into the likelihood factor analysis as an event of the form $P_c(L_j \mid L_i)$, where $j = i$ is among the possibilities. This implies that only the possibility of simple substitution is being assumed. If a flag is encountered in the directory entry, then the analysis associated with the likelihood factor must address, in addition to simple substitution, the possibility of segmentation. The use of regional context now enters as follows:
Assume that the flag indicates the possibility of HSS. At this point, as indicated previously, three possibilities exist. They are:
(1 and 2): $P_c(L_j \mid L_{i_{seg}})$, where $j = i_{seg}$ is included
(3): $P_c(L_j L_{j+1} \mid L_{i_{seg}})$
To proceed with the evaluation of the likelihood factor a decision must be made between events 1 and 2 and the HSS event posed by 3. The decision mechanism rests on the use of Regional Context.
If condition 3 is true then the remainder of the garbled character string must be right adjusted one position. This changes the existing correspondence relationship between the characters of the garbled alpha string and the reference word.
The change or shift in "adjacent context" is reflected in terms of the likelihood factor constituents as:

$$P_c(L_j L_{j+1} \mid L_{i_{seg}})\,P_c(L_{j+2} \mid L_{i+1}) \tag{12}$$

If condition 1 or 2 is correct, then the "adjacent context" is not disturbed and the likelihood factor constituents corresponding to those in equation (12) are:

$$P_c(L_j \mid L_{i_{seg}})\,P_c(L_{j+1} \mid L_{i+1}) \tag{13}$$

The decision concerning the presence or absence of HSS then follows by which formulation, equation (12) or (13), yields the larger probability value.

TABLE III. $P(\text{WA1*C1*} \mid \text{W1ALST?O1N1})$: Exception Logic Likelihood Factor Evaluation

Decision 1: Determine if a possible HSS occurred to "N"
    A = $P_c(1* \mid N)\,P_c(C \mid O)$
    versus
    B = $P_c(* \mid N)\,P_c(1 \mid O)$
    Assume A accepted (i.e., $P_c(1 \mid O)$ is an impossible event)
    Likelihood factor so far: $P_c(1* \mid N)\,P_c(C \mid O)$

Decision 2: Determine if a possible CS occurred to "ST"
    A = $P_c(* \mid ST)\,P_c(1 \mid L)$
    versus
    B = $P_c(* \mid T)\,P_c(1 \mid S)$
    Assume A accepted (i.e., $P_c(1 \mid S)$ is an impossible event)
    Likelihood factor so far: $P_c(1* \mid N)\,P_c(C \mid O)\,P_c(* \mid ST)\,P_c(1 \mid L)\,P_c(A \mid A)$

Decision 3: Determine if a possible HSS occurred to "W"
    $P_c(\text{Space}, W \mid W)$ is an impossible event; $P_c(W \mid W)$ accepted
    Likelihood factor: $P_c(1* \mid N)\,P_c(C \mid O)\,P_c(* \mid ST)\,P_c(1 \mid L)\,P_c(A \mid A)\,P_c(W \mid W)$
Similarly, if a flag denoted the presence of a character pair which is prone to CS, then:

$$P_c(L_j \mid L_i L_{i+1})\,P_c(L_{j+1} \mid L_{i+2}) \tag{14}$$

would pose the related constituents of the likelihood factor under that supposition. This expression would be evaluated relative to:

$$P_c(L_j \mid L_i)\,P_c(L_{j+1} \mid L_{i+1}) \tag{15}$$

which is the likelihood factor evaluation progression that would exist in the absence of a CS misread. The decision criterion, as with HSS misread, would be based on the relative magnitudes of the respective products in equations (14) and (15).
Figure 1 and Table III further illustrate the implementation of "Regional Context" in segmentation type error correction. Figure 1 shows schematically the basic mechanism for splitting segmentation manipulation for the word CORNWALL. Table III shows the step-by-step evaluation of the likelihood factor corresponding to the directory entry "Walston" where the OCR garbled word has the form "WA1*C1*".
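The regional context decisions can be sketched in software as a right-to-left walk over the garbled word and the flagged directory entry. The Python below is a simplified illustration, not the shift-register apparatus of Figure 3. Its assumptions: the entry is a list of (character, flag) pairs, a CS flag on a character marks the pair formed with the character to its left, the confusion table is a dictionary keyed by (output string, scanned string), missing keys are impossible events, and all numeric values are invented for the example.

```python
HSS, CS = "1", "?"  # flag symbols borrowed from the text's illustration


def likelihood(garbled, entry, table):
    """Right-to-left likelihood factor with HSS/CS regional context."""
    def p(out, scanned):
        return table.get((out, scanned), 0.0)

    def context(jj, ii):
        # Probability for the neighbouring pair used as regional context.
        if jj < 0 and ii < 0:
            return 1.0          # both words exhausted together
        if jj < 0 or ii < 0:
            return 0.0          # one word ran out first: misaligned
        return p(garbled[jj], entry[ii][0])

    j, i = len(garbled) - 1, len(entry) - 1
    product = 1.0
    while i >= 0 and j >= 0:
        ch, flag = entry[i]
        plain = p(garbled[j], ch)
        best, dj, di = plain, 1, 1
        plain_ctx = plain * context(j - 1, i - 1)
        if flag == HSS and j >= 1:
            split = p(garbled[j - 1:j + 1], ch)      # two outputs, one input
            if split * context(j - 2, i - 1) > plain_ctx:
                best, dj, di = split, 2, 1           # shift garbled word
        elif flag == CS and i >= 1:
            merged = p(garbled[j], entry[i - 1][0] + ch)  # one output, two inputs
            if merged * context(j - 1, i - 2) > plain_ctx:
                best, dj, di = merged, 1, 2          # shift dictionary word
        product *= best          # only the winning event joins the product
        j, i = j - dj, i - di
    return product if (i < 0 and j < 0) else 0.0


walston = [("W", HSS), ("A", None), ("L", None), ("S", None),
           ("T", CS), ("O", HSS), ("N", HSS)]
table = {  # hypothetical confusion probabilities
    ("1*", "N"): 0.2, ("C", "O"): 0.3, ("*", "ST"): 0.4,
    ("1", "L"): 0.2, ("A", "A"): 0.9, ("W", "W"): 0.9,
}
score = likelihood("WA1*C1*", walston, table)
```

Under these assumed values the walk reproduces the sequence of accepted events in Table III: an HSS on N, a plain substitution for O, a CS on ST, and plain substitutions for L, A and W.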
Crowding Segmentation:
Crowding segmentation (CRS) differs from HSS and CS error types by not affecting word length. The causative factors related to CRS are character spacing and juxtaposition. A potential CRS event occurs when the recognition unit isolates two characters whose close proximity induces the OCR to misassign their segmentation point. This effectively mixes portions of one character into the video representation of the other. A misread results if the addition of the neighboring character segment:
a. Creates a composite that triggers the recognition logic of a different character
b. Interferes with the recognition analysis and leads to a reject (*) output.
The preceding discussion relates to the video mechanics which evolve a CRS segmentation event. The overriding typographical factor behind CRS is the character geometry. Only a relative few of the 676 possible digrams are prone to snowballing a print crowding effect into a character misread as described above. An example of such a character pair and the evolution of a CRS event is shown in Figure 2, where the "re" digram maps into a "n*" combination. It should be noted that the observed video would not have evolved if the subject digram was "er" or "ri".
The appropriate confusion data related to the CRS events can be respectively quantified in the form:

$$P_c(L_j L_{j+1} \mid L_i L_{i+1}) \tag{16}$$

The evaluation of the likelihood factor in equation (3) follows, by first denoting the possible CRS prone letter digrams in the directory entries by a special character. The evaluation progression at this point then considers the two possibilities:

$$P_c(L_{j+1} \mid L_{i+1})\,P_c(L_j \mid L_i) \tag{17}$$

versus

$$P_c(L_{j+1} L_j \mid L_{i+1} L_i) \tag{18}$$

Since, unlike the HSS and CS evaluation, no changes in character string length must be taken into account, the choice of how to treat and include the digram $L_{i+1} L_i$ in the likelihood calculation follows from a determination of which of the expressions, equation (17) or (18), yields the larger probability.
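The comparison between equations (17) and (18) amounts to a single maximum over two table lookups, as the following Python sketch shows. The function name `crs_choice`, the two table layouts, and the probability values are assumptions for illustration; only the "re" to "n*" example itself comes from the discussion of Figure 2.

```python
def crs_choice(out_pair, ref_pair, single, digram):
    """Choose between independent substitution (17) and crowding (18).

    out_pair, ref_pair: two-character strings in reading order.
    single: dict keyed by (output char, scanned char).
    digram: dict keyed by (output pair, scanned pair).
    Missing keys are treated as impossible events.
    """
    independent = (single.get((out_pair[0], ref_pair[0]), 0.0)
                   * single.get((out_pair[1], ref_pair[1]), 0.0))
    joint = digram.get((out_pair, ref_pair), 0.0)
    if joint > independent:
        return "crowding", joint
    return "substitution", independent


# Hypothetical confusion data for the "re" -> "n*" case of Figure 2.
single = {("n", "r"): 0.01, ("*", "e"): 0.05}
digram = {("n*", "re"): 0.02}

decision = crs_choice("n*", "re", single, digram)
```

Because both hypotheses consume two garbled characters and two reference characters, no shift of either word is needed; whichever value is returned simply joins the running product.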
Expediencies for Decreasing Apparatus Requirements:
The Regional Context procedure must be accomplished under real time constraints. A further substantial decrease in computational requirements will accrue by appending to the basic error correction method a dictionary candidate screening process.
The package of logical screening processes includes:
a. Go/No Go Thresholding
b. Premature Termination Threshold.
Go/No Go Thresholding:
Go/No Go Thresholding accomplishes a substantial decrease in the computation to be performed by the error correction function by terminating the consideration of a directory entry as soon as its likelihood factor drops below a fixed tolerance of the largest likelihood factor yielded as yet in the analysis. It will be recalled that the likelihood factor, equation (3), measures in a probabilistic fashion the degree of match or mismatch present between a garbled word and a fetched directory entry. The evaluation format of the likelihood naturally lends itself to this type of thresholding. It is evaluated as a series of multiplications of confusion probabilities (values between 0 and less than 1). As with any multiplicative series of terms less than one, such successive multiplication decreases the value of the existing product.
The tolerance level is normally taken relative to the largest likelihood computed so far in the analysis. Use of this threshold criterion will markedly decrease computation.
Following is an example of the Go/No Go Thresholding implementation. If we assume an 80 percent likelihood factor has been yielded so far in the analysis, then the tolerance level, assuming a 10 percent factor, will be 8 percent. Hence, for the error correction routine to continue consideration of any forthcoming entry, the entry must maintain a likelihood value of at least 72 percent. Most directory entries show a sufficient incompatibility within one or two characters to drop below the tolerance and are therefore terminated.
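The Go/No Go test lends itself to a short sketch. The following is a minimal illustration, assuming the confusion probabilities for one directory entry are supplied as a list; the function name and the sample factors are hypothetical.

```python
def go_no_go(factors, best_so_far, tolerance=0.10):
    """Multiply confusion probabilities one at a time and stop early.

    Consideration of the entry ends as soon as the running product falls
    below best_so_far reduced by the tolerance fraction (0.80 and 0.10
    give the 72 percent floor of the example in the text).
    Returns (product, terms_used); product is None if the entry was dropped.
    """
    floor = best_so_far * (1.0 - tolerance)
    product = 1.0
    for n, p in enumerate(factors, start=1):
        product *= p
        if product < floor:
            return None, n  # No Go: entry terminated after n terms
    return product, len(factors)


# Assumed factors: the second character is a poor match, so the entry
# is dropped after two multiplications.
print(go_no_go([0.95, 0.5, 0.9], best_so_far=0.80))
```

Because every factor is below one, the running product can only fall, so an entry that drops under the floor can never recover and may safely be abandoned.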
Premature Termination Threshold:
This thresholding operation is related to the Go/No Go Threshold, focusing, however, on the other end of the probability spectrum. It allows termination of a candidate directory word as soon as it drops below a minimum acceptable probability threshold. Its value follows from the fact that no matter how dissimilar a garbled word and a directory entry are, a likelihood factor is computable. Such a likelihood computation, however, quickly converges toward zero. By placing a lower limit on the acceptable likelihood value, the term by term comparison of an only casually related directory entry can be terminated prematurely as soon as it drops below the threshold.
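A minimal sketch of the lower limit follows. The floor value of 1e-5 and the function name are assumptions chosen only for illustration; the specification does not state a numeric floor.

```python
def score_with_floor(factors, floor=1e-5):
    """Running product with an absolute lower limit.

    Unlike the Go/No Go test, the floor here does not depend on the best
    likelihood seen so far. Returns None if the entry is abandoned.
    """
    product = 1.0
    for p in factors:
        product *= p
        if product < floor:
            return None  # casually related entry, terminated prematurely
    return product


# A dissimilar entry converges toward zero within a few terms.
print(score_with_floor([0.01, 0.01, 0.01]))
```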
Although the discussion and analysis related to the evaluation of the likelihood factor has been posed in terms of a series of multiplication operations, they may alternately be performed as an addition of prestored logarithmic values (logs) of probabilities.
The Apparatus:
A data processing system is disclosed for selecting the correct form of a garbled input word misread by an optical character reader so as to change the number of characters in the word by character splitting or concatenation. Dictionary words are stored in the system, having characters which are flagged for segmentation or concatenation OCR misread propensity. The OCR word and a dictionary word are loaded into a pair of associated shift registers, aligning their letters on one end. The dictionary word characters are inspected for error propensity flags. When a splitting propensity, for example, is found for a character, special conditional probability values are accessed from storage and a calculation is performed of the probability that the first character of the dictionary word was split by the OCR into the first and second characters of the OCR word. This regional context probability is compared with the probability of a simple substitution error for the characters. If the probability of segmentation is larger, the OCR characters in the first shift register are shifted one space with respect to the dictionary word characters in the second shift register so that subsequent character pairs to be compared are properly matched. The greater calculated probability is combined in a running product. The dictionary word with the largest running product is output by the system as the most likely correct form of the garbled OCR input word.
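The procedure summarized above may be sketched in software as follows. This is a simplified illustration and not the disclosed circuit: it handles only the splitting flag, walks the words left to right rather than through shift registers, and compares a single splitting probability against a single substitution probability. All names and probability values are hypothetical.

```python
def running_product(ocr, word, split_flags, p_sub, p_split):
    """Running product for one dictionary word against one OCR word.

    split_flags: dictionary characters flagged as prone to splitting.
    p_sub[(k, l)]: probability that l was read as k.
    p_split[(k1k2, l)]: probability that l was split into the pair k1k2.
    """
    i = j = 0
    product = 1.0
    while i < len(ocr) and j < len(word):
        actual = word[j]
        substitution = p_sub.get((ocr[i], actual), 0.0)
        splitting = 0.0
        if actual in split_flags and i + 1 < len(ocr):
            splitting = p_split.get((ocr[i:i + 2], actual), 0.0)
        if splitting > substitution:
            product *= splitting
            i += 2  # OCR word advances one extra position
        else:
            product *= substitution
            i += 1
        j += 1
    if i != len(ocr) or j != len(word):
        return 0.0  # lengths could not be reconciled
    return product


def best_match(ocr, dictionary, split_flags, p_sub, p_split):
    """Return the dictionary word with the largest running product."""
    return max(dictionary,
               key=lambda w: running_product(ocr, w, split_flags, p_sub, p_split))


# Assumed confusion data: "m" is flagged and may be split into "rn".
p_sub = {(c, c): 0.9 for c in "ailmr"}
p_sub[("r", "m")] = 0.001
p_split = {("rn", "m"): 0.05}
print(best_match("rnail", ["rail", "mail"], {"m"}, p_sub, p_split))
```

In this assumed example the five-character OCR word is reconciled with the four-character dictionary word only because the flagged character permits the extra shift.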
Figure 3 is a detailed block diagram of the regional context maximum likelihood OCR error correction apparatus. An optical character recognition machine outputs on line 2 a sequence of alphabetic fields and numeric fields which have been scanned from a text under examination. Line 2 constitutes the OCR input to the inventive apparatus shown in Figure 3. The OCR input line 2 inputs the sequences of alphabetic and numeric character fields to the word separation detector 4, which detects separation markers between character fields and identifies the fields as alphabetic or numeric. Numeric fields are outputted from the word separation detector 4 onto line 6 and are directed to the alphanumeric output register 8 and then to the system output line 10. Alphabetic character fields are directed by the word separation detector 4 over line 12 to the OCR word shift register 14. The character counter 18, connected to the alpha field output line 12, counts the number of characters input to the OCR word shift register 14 and transmits the count over line 19 to the shift control 20. A detailed logic diagram of shift control 20 is shown in Figure 4. Shift register 14 stores the characters of the input OCR word in an arrangement ordered in the sequence in which the characters are received over input line 2. Shift register 14 has three adjacent storage cells K1, K2, and K3, and the end character of the input word is initially stored in cell K1.
A dictionary storage 28 is shown in Figure 3 which stores the predetermined class of reference words as a dictionary. For general English text processing applications, the words appearing in a conventional dictionary, such as Webster's Third International, may be stored in storage 28. For specialized text processing applications, more limited vocabularies may be employed. In mail processing applications a national street name directory may be employed. Selected characters composing selected ones of the dictionary words stored in storage 28 have in association therewith an error propensity indicium or flag for indicating the propensity of the character to being misread by the optical character reader through an error mode which changes the number of characters in the misread word. The dictionary store 28 has a control input line 46 connected to an output of the word separation detector for indicating when a new alphabetic character field has been received from the OCR over line 2. If the analysis of the most likely correct form for the next previously received OCR word has been completed, the receipt of a signal over line 46 causes the dictionary store to reset its list of words so that the analysis of the new OCR input word may commence. The dictionary store 28 may optionally have a bulk storage input 3 which could, for example, supply selected categories of reference words which are most likely to match with the particular type of OCR word received on the OCR input line 2.
A dictionary word shift register 26 is shown in Figure 3 having an input connected to the output of the dictionary storage 28. The shift register 26 stores the characters of a dictionary word input from the dictionary storage 28 in an arrangement ordered in the sequence in which the characters are received. The dictionary word shift register 26 has three adjacent storage cells L1, L2, and L3, and the end character of the dictionary word loaded into shift register 26 is initially stored in cell L1. The character in the L1 cell of shift register 26 should correspond with the character in the K1 cell of the shift register 14. To accommodate the flagging of segmentation misread propensities for the reference words, a special alpha character storage convention can be adopted. Each alpha character is stored using only five of the eight bits usually used to represent a character. The other three bits are used as the flag code. There are therefore eight flag code combinations, two of which are respectively assigned for horizontal splitting segmentations of a character and concatenation segmentation of character pairs. However, to more clearly illustrate the apparatus of the invention, Figure 3 shows an alternate configuration with a flag bit shift register 34 having an input connected to the output of the dictionary storage 28 for storing the flag bit indicating the OCR misread propensity for selected characters stored in the dictionary shift register 26. It is recognized that a completely separate flag bit shift register 34, connected by means of line 32 to the dictionary storage 28 and holding flag bits distinct from the alphabetic characters stored in the dictionary word shift register 26, would be an equally feasible embodiment to the preferred one disclosed. The shift control 20 controls the shifting of the contents of the OCR shift register 14 over line 22, and controls the shifting of the contents of the dictionary shift register 26 over line 24. The shifting of the contents of the flag bit shift register 34 is also controlled by line 24 so that shifting operations for the dictionary shift register 26 and the flag bit shift register 34 are always in unison. The shift control 20 accepts over line 24 from the dictionary store 28 the dictionary word length for the dictionary word stored in the dictionary shift register 26. A detailed logic diagram of the shift control 20 is shown in Figure 4.
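The five-bit character and three-bit flag storage convention mentioned above can be illustrated with a short sketch. The bit layout (flag in the three high-order bits), the particular flag code values and the function names are assumptions made for illustration; the specification states only the five/three split.

```python
# Hypothetical flag code assignments; only the 3-bit width is from the text.
FLAG_NONE, FLAG_SPLIT, FLAG_CONCAT = 0, 1, 2


def pack(letter, flag):
    """Pack one alphabetic character and its flag code into one byte."""
    code = ord(letter.upper()) - ord("A")  # 0..25 fits in five bits
    if not 0 <= code < 26:
        raise ValueError("alphabetic characters only")
    if not 0 <= flag < 8:
        raise ValueError("flag code must fit in three bits")
    return (flag << 5) | code


def unpack(byte):
    """Recover (letter, flag) from a packed byte."""
    return chr((byte & 0x1F) + ord("A")), byte >> 5


print(unpack(pack("m", FLAG_SPLIT)))
```

Five bits distinguish 32 symbols, which suffices for a single-case 26 letter alphabet, and the remaining three bits give the eight flag code combinations noted in the text.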
The multiplexor 94 has three input lines connected to the OCR word shift register 14: line 82 connected to the K1 cell, line 84 connected to the K2 cell, and line 86 connected to the K3 cell. A detailed illustration of the multiplexor 94 is shown in Figure 5.
The multiplexor 96 has three inputs connected to the dictionary word shift register 26: line 88 connected to the L1 cell, line 90 connected to the L2 cell, and line 92 connected to the L3 cell. A detailed illustration of the multiplexor 96 is shown in Figure 6.
The flag decoder 100 in Figure 3 has an input connected to the last cell 35 in the flag bit shift register 34, corresponding to the L1 cell in the dictionary word shift register 26. The flag decoder has four output lines: 102a indicating a probable simple substitution, 102b indicating a probable character splitting, 102c indicating a probable character pair concatenation, and 102d indicating a probable character pair crowding. The flag decoder output lines are collectively denoted as line 102.

The multiplex timing 108, shown in Figure 3, has an input line 102 connected to the flag decoder 100, and an output line 110 connected to multiplexors 94, 96 and 128. A detailed logic diagram of the multiplex timing 108 is shown in Figure 7.
The address register 116 is connected by input lines 112 and 114 to the multiplexor 94. The address register 122 is connected by input lines 118 and 120 to the multiplexor 96.
The conditional probability storage matrix 124 has a first and second input connected to the address registers 116 and 122, respectively. The conditional probability storage matrix stores a first type conditional probability P(Kn|Lm) that the OCR word character stored in cell Kn of said first shift register was misread by character substitution, given that the dictionary word character stored in cell Lm of said second shift register was actually scanned, for n=1, m=1; n=2, m=2; n=2, m=3; and n=3, m=2. The conditional probability storage matrix also stores a second type conditional probability P(K1K2|L1) that the OCR word characters stored in the cells K1 and K2 of the OCR word shift register 14 were misread by character splitting, given that the dictionary word character stored in cell L1 of said dictionary word shift register 26 was actually scanned. The conditional probability storage matrix 124 also stores a third type conditional probability P(K1|L1L2) that the OCR word character stored in cell K1 of the OCR word shift register 14 was misread by character concatenation, given that the dictionary word characters stored in cells L1 and L2 of said dictionary word shift register 26 were actually scanned. The conditional probability storage matrix 124 contains still a fourth type conditional probability P(K1K2|L1L2) that the OCR word characters stored in cells K1 and K2 of the OCR word shift register 14 were misread by character crowding, given that the dictionary word characters stored in cells L1 and L2 of the dictionary word shift register 26 were actually scanned.
Table IV shows some sample values for the probabilities stored in the probability storage 124 for the first, second, third and fourth type conditional probabilities. In theory, the combinations of upper and lower case letters considered could be exhausted, with conditional probabilities being stored for P(Ai|Aj), P(AiAj|Ak), P(Ai|AjAk) and P(AiAj|AkAl), with i, j, k and l taking all possible values. However, many of these conditional probabilities are found to be vanishingly small, and thus only those letter combinations having a probability greater than a selected threshold are stored in practical applications. This threshold is empirically determined and depends upon the particular OCR characterized.
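The practice of storing only those probabilities that exceed a threshold may be sketched as a sparse lookup. The names, the threshold and the sample estimates below are hypothetical; keys pair the observed OCR characters with the dictionary characters actually scanned, covering all four probability types with one table.

```python
def build_sparse_matrix(estimates, threshold):
    """Keep only conditional probabilities above the selected threshold."""
    return {key: p for key, p in estimates.items() if p > threshold}


def lookup(matrix, observed, actual):
    """Return P(observed | actual); combinations not stored are taken as 0."""
    return matrix.get((observed, actual), 0.0)


# Assumed estimates for illustration: (observed, actual) -> probability.
estimates = {
    ("m", "m"): 0.95,       # first type: substitution (here a correct read)
    ("rn", "m"): 0.03,      # second type: splitting
    ("m", "rn"): 0.02,      # third type: concatenation
    ("n*", "re"): 0.01,     # fourth type: crowding
    ("q", "m"): 0.0000001,  # vanishingly small, dropped
}
matrix = build_sparse_matrix(estimates, threshold=0.001)
print(len(matrix), lookup(matrix, "q", "m"))
```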

TABLE IV. Sample conditional probabilities stored in the probability storage 124. [Table values are not recoverable from the source scan.]
The aforesaid probabilities are accessed by the component OCR word characters and dictionary word characters which are selectively switched by the multiplexors 94 and 96, under the control of the flag decoder 100, into the address registers 116 and 122, respectively. The operation of the multiplex timing 108 is shown in Table V. The operation of the multiplex 94 and the multiplex 96 is shown in Table VI.

TABLE V. Multiplex Timing (108)

FLAG             NUMBER OF PULSES
Substitution     1
Splitting        4
Concatenation    4
Crowding         4

TABLE VIa. Multiplex Switching, Flag Bit = Character Substitution

Timing Pulse (108)   1            2            3            4
Multiplex (94)       K1           NONE         NONE         NONE
Multiplex (96)       L1           NONE         NONE         NONE
Multiplex (128)      line (164)

TABLE VIb. Multiplex Switching, Flag Bit = Character Splitting

Timing Pulse (108)   1            2            3            4
Multiplex (94)       K1           K2           K1,K2        K3
Multiplex (96)       L1           L2           L1           L2
Multiplex (128)      Reg. (130)   Reg. (132)   Reg. (134)   Reg. (136)

TABLE VIc. Multiplex Switching, Flag Bit = Character Concatenation

Timing Pulse (108)   1            2            3            4
Multiplex (94)       K1           K2           K1           K2
Multiplex (96)       L1           L2           L1,L2        L3
Multiplex (128)      Reg. (130)   Reg. (132)   Reg. (134)   Reg. (136)

TABLE VId. Multiplex Switching, Flag Bit = Character Crowding

Timing Pulse (108)   1            2            3            4
Multiplex (94)       K1           K2           K1,K2        OPEN
Multiplex (96)       L1           L2           L1,L2        OPEN
Multiplex (128)      Reg. (130)   Reg. (132)   Reg. (134)   Load 1 into Reg. (136)

The multiplex 128 has a data input connected to the probability output line 126 from the conditional probability storage matrix 124. The multiplex 128 operates under the control of the multiplex timing 108 and the flag decoder 100 to sequentially distribute conditional probabilities accessed from the storage matrix 124 into registers 130, 132, 134, 136 or onto line 164, as is described in the multiplex switching Table VI. A detailed illustration of the circuitry for the multiplex 128 is shown in Figure 8.
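The switching schedule of Tables V and VI can be read as a lookup table keyed by flag. The following sketch is a software analogy only; the names SWITCHING and fetch, the register labels and the sample probabilities are hypothetical and do not describe the disclosed circuitry.

```python
# One entry per timing pulse: (OCR cells, dictionary cells, destination).
# None for the cells means nothing is switched and the constant 1 is loaded.
SWITCHING = {
    "substitution": [(("K1",), ("L1",), "line164")],
    "splitting": [
        (("K1",), ("L1",), "reg130"), (("K2",), ("L2",), "reg132"),
        (("K1", "K2"), ("L1",), "reg134"), (("K3",), ("L2",), "reg136"),
    ],
    "concatenation": [
        (("K1",), ("L1",), "reg130"), (("K2",), ("L2",), "reg132"),
        (("K1",), ("L1", "L2"), "reg134"), (("K2",), ("L3",), "reg136"),
    ],
    "crowding": [
        (("K1",), ("L1",), "reg130"), (("K2",), ("L2",), "reg132"),
        (("K1", "K2"), ("L1", "L2"), "reg134"), (None, None, "reg136"),
    ],
}


def fetch(flag, cells, matrix):
    """Fill the destination registers for one flag, pulse by pulse."""
    registers = {}
    for k_cells, l_cells, dest in SWITCHING[flag]:
        if k_cells is None:
            registers[dest] = 1.0  # Table VId, fourth pulse
            continue
        observed = "".join(cells[c] for c in k_cells)
        actual = "".join(cells[c] for c in l_cells)
        registers[dest] = matrix.get((observed, actual), 0.0)
    return registers


# Assumed register contents and confusion data.
cells = {"K1": "r", "K2": "n", "K3": "a", "L1": "m", "L2": "a", "L3": "i"}
matrix = {("r", "m"): 0.001, ("n", "a"): 0.002, ("rn", "m"): 0.05, ("a", "a"): 0.9}
print(fetch("splitting", cells, matrix))
```

The number of entries per flag matches Table V: one pulse for substitution and four for each segmentation flag.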
A first multiplier 138, having inputs connected to the outputs of register 130 and register 132, multiplies a first received conditional probability by a second received conditional probability accessed from the conditional probability storage matrix 124. The multiplier 138 outputs the first probability product on line 142 to the comparator 146.
The multiplier 140 has inputs connected to registers 134 and 136, for multiplying a third received conditional probability by a fourth received conditional probability accessed from the conditional probability storage matrix 124. The second probability product calculated thereby is outputted on line 144 to the comparator 146.
The comparator 146 compares the relative magnitudes of the first probability product and the second probability product outputted on lines 142 and 144. If the first probability product is greater than the second probability product, a signal is outputted from the comparator 146 on output line 158 to the gate 148 and to the shift command 162. Alternately, if the second probability product is greater than the first probability product, the comparator 146 outputs on its output line 160 a signal to the gate 150 and to the shift command 162.
The shift command 162 has a control input line 102 connected to the flag decoder 100 and an output line 80 connected to the shift control 20. A detailed illustration of the shift command 162 is shown in Figure 9, and the operation of the shift command 162 and shift control 20 is illustrated in Table VII. The output line 80 is composed of four lines which direct the shift control 20 to shift to the left registers 14 and 26 as follows: line 80a, one cell each; line 80b, two cells each; line 80c, two cells and one cell respectively; and line 80d, one cell and two cells respectively.

TABLE VII. Shift Control

FLAG              LARGER PRODUCT   POSITIONS SHIFTED IN   POSITIONS SHIFTED IN
                  (MULTIPLIER)     OCR WORD S/R (14)      DICTIONARY S/R (34)
Substitution      (na)
Segmentation:
  Splitting       (138)
                  (140)
  Concatenation   (138)
                  (140)            1                      2
  Crowding        (138)
                  (140)            2                      2

If the flag decoder 100 determines that the error propensity flag for the character stored in the L1 cell of the dictionary word shift register 26 shows a propensity to simple character substitution, the operation is in accordance with Table VIa. The multiplexor 94, signalled over line 102a by the flag decoder 100 and receiving a single timing pulse over line 110, switches the contents of the K1 cell of the OCR word shift register 14 over line 114 to the address register 116. Simultaneously, the multiplexor 96, signalled over line 102a by the flag decoder 100 and receiving a single timing signal over line 110, switches the contents of the L1 cell in the dictionary word shift register 26 over the line 88 and over the line 120 to the address register 122. Then the conditional probability P(K1|L1) is accessed from the conditional probability storage matrix 124 and outputted over the probability output line 126 to the multiplexor 128. The multiplexor 128, signalled over line 102a by flag decoder 100 and receiving a single timing signal over line 110, switches the first type conditional probability input over line 126 to the output line 164, bypassing the comparator 146.
If, instead, the error propensity flag stored in association with the character stored in the L1 cell of the dictionary word shift register 26 indicates a propensity to character splitting, then the operation is in accord with Table VIb. The multiplexor 94, signalled over line 102b by the flag decoder 100 and receiving four timing signals over line 110 from multiplex timing 108, will switch the sequence of four characters or character combinations from the cells K1, K2 and K3 of the OCR word shift register 14 to the address register 116. Simultaneously, the multiplexor 96 will switch a sequence of four characters or character combinations from the cells L1, L2 and L3 to the address register 122. In response to the first timing pulse from multiplex timing 108, the first character switched by the multiplexor 94 is the contents of the K1 cell and the first character switched by the multiplexor 96 is the contents of the L1 cell. The conditional probability P(K1|L1) is accessed from the conditional probability storage matrix 124 and outputted over probability line 126 to the multiplexor 128. The multiplexor 128, under the control of the flag decoder 100, has a signal on the input line 102b indicating a splitting propensity and, in response to the first timing pulse from multiplex timing 108, switches the conditional probability input on line 126 to the register 130, for eventual processing by the comparator 146. In response to the second timing pulse from multiplex timing 108, the second character switched by the multiplexor 94 is the contents of the K2 cell and the second character switched by the multiplexor 96 is the contents of the L2 cell. The conditional probability P(K2|L2) is accessed from the conditional probability storage matrix 124 and outputted over the probability line 126 to the multiplexor 128. The multiplexor 128, under the control of the flag decoder 100, has a signal on the input line 102b indicating a splitting propensity and, in response to the second timing pulse from multiplex timing 108, switches the second conditional probability input on line 126 to the register 132, for eventual processing by the comparator 146. In response to the third timing pulse from the multiplex timing 108, the third character combination switched by the multiplexor 94 is the contents of the K1 cell and the K2 cell and the third character switched by the multiplexor 96 is the contents of the L1 cell. The conditional probability P(K1K2|L1) is accessed from the conditional probability storage matrix 124 and outputted over probability line 126 to the multiplexor 128. The multiplexor 128, under control of the flag decoder 100, has a signal on the input line 102b indicating a splitting propensity and, in response to the third timing pulse from multiplex timing 108, switches the third conditional probability input on line 126 to the register 134, for eventual processing by the comparator 146. In response to the fourth timing pulse from multiplex timing 108, the fourth character switched by the multiplexor 94 is the contents of the K3 cell and the fourth character switched by the multiplexor 96 is the contents of the L2 cell. The conditional probability P(K3|L2) is accessed from the conditional probability storage matrix 124 and outputted over probability line 126 to the multiplexor 128. The multiplexor 128, under the control of the flag decoder 100, has a signal on the input line 102b indicating a splitting propensity and, in response to the fourth timing pulse from multiplex timing 108, switches the fourth conditional probability input on line 126 to the register 136, for eventual processing by the comparator 146.
If, instead, the error propensity flag stored in association with the character stored in the L1 cell of the dictionary word shift register 26 indicates a propensity to character concatenation, then the operation is in accord with Table VIc. The multiplexor 94, signalled over line 102c by the flag decoder 100 and receiving four timing signals over line 110 from the multiplex timing 108, will switch the sequence of four characters or character combinations from the cells K1, K2 and K3 of the OCR word shift register 14 to the address register 116. Simultaneously, the multiplexor 96, signalled over line 102c by the flag decoder 100 and receiving four timing signals over line 110 from the multiplex timing 108, will switch the sequence of four characters or character combinations from the cells L1, L2 and L3 to the address register 122. In response to the first timing pulse from the multiplex timing 108, the first character switched by the multiplexor 94 is the contents of the K1 cell and the first character switched by the multiplexor 96 is the contents of the L1 cell. The conditional probability P(K1|L1) is accessed from the conditional probability storage matrix 124 and outputted over probability line 126 to the multiplexor 128. The multiplexor 128, under the control of the flag decoder 100, has a signal on the input line 102c indicating a concatenation propensity and, in response to the first timing pulse from multiplex timing 108, switches the first conditional probability input on line 126 to the register 130, for eventual processing by the comparator 146. In response to the second timing pulse from the multiplex timing 108, the second character switched by the multiplexor 94 is the contents of the K2 cell and the second character switched by the multiplexor 96 is the contents of the L2 cell. The second conditional probability P(K2|L2) is accessed from the conditional probability storage matrix 124 and outputted over probability line 126 to the multiplexor 128. The multiplexor 128, under control of the flag decoder 100, has a signal on input line 102c indicating a concatenation propensity and, in response to the second timing pulse from multiplex timing 108, switches the second conditional probability input on line 126 to the register 132, for eventual processing by the comparator 146. In response to the third timing pulse from multiplex timing 108, the third character switched by the multiplexor 94 is the contents of the K1 cell and the third character combination switched by the multiplexor 96 is the contents of the L1 cell and the L2 cell. The conditional probability P(K1|L1L2) is accessed from the conditional probability storage matrix 124 and outputted over probability line 126 to the multiplexor 128. The multiplexor 128, under the control of the flag decoder 100, has a signal on input line 102c indicating a concatenation propensity and, in response to the third timing pulse from multiplex timing 108, switches the third conditional probability input on line 126 to the register 134, for eventual processing by the comparator 146. In response to the fourth timing pulse from the multiplex timing 108, the fourth character switched by the multiplexor 94 is the contents of the K2 cell and the fourth character switched by the multiplexor 96 is the contents of the L3 cell. The conditional probability P(K2|L3) is accessed from the conditional probability storage matrix 124 and outputted over probability line 126 to the multiplexor 128. The multiplexor 128, under the control of the flag decoder 100, has a signal on the input line 102c indicating a concatenation propensity and, in response to the fourth timing pulse from multiplex timing 108, switches the fourth conditional probability input on line 126 to the register 136, for eventual processing by the comparator 146.

If, instead, the error propensity flag stored in association with the character stored in the L1 cell of the dictionary word shift register 26 indicates a propensity to character crowding, then the operation is in accord with Table VId. The multiplexor 94, signalled over line 102d by the flag decoder 100 and receiving four timing signals over line 110 from the multiplex timing 108, will switch the sequence of four characters or character combinations from the cells K1 and K2 of the OCR shift register 14 to the address register 116. Simultaneously, the multiplexor 96, signalled over line 102d by the flag decoder 100 and receiving four timing signals over line 110 from multiplex timing 108, will switch a sequence of four characters or character combinations from the cells L1 and L2 to the address register 122. In response to the first timing pulse from multiplex timing 108, the first character switched by the multiplexor 94 is the contents of the K1 cell and the first character switched by the multiplexor 96 is the contents of the L1 cell. The conditional probability P(K1|L1) is accessed from the conditional probability storage matrix 124 and outputted over probability line 126 to the multiplexor 128. The multiplexor 128, under control of the flag decoder 100, has a signal on the input line 102d indicating a crowding propensity and, in response to the first timing pulse from multiplex timing 108, switches the conditional probability input on line 126 to the register 130, for eventual processing by the comparator 146. In response to the second timing pulse from the multiplex timing 108, the second character switched by the multiplexor 94 is the contents of the K2 cell and the second character switched by the multiplexor 96 is the contents of the L2 cell. The conditional probability P(K2|L2) is accessed from the conditional probability storage matrix 124 and outputted over probability line 126 to the multiplexor 128. The multiplexor 128, under the control of the flag decoder 100, has a signal on the input line 102d indicating a crowding propensity and, in response to the second timing pulse from the multiplex timing 108, switches the second conditional probability input on line 126 to the register 132, for eventual processing by the comparator 146. In response to the third timing pulse from the multiplex timing 108, the third character combination switched by the multiplexor 94 is the contents of the K1 cell and the K2 cell and the third character combination switched by the multiplexor 96 is the contents of the L1 cell and the contents of the L2 cell. The conditional probability P(K1K2|L1L2) is accessed from the conditional probability storage matrix 124 and outputted over probability line 126 to the multiplexor 128. The multiplexor 128, under the control of the flag decoder 100, has a signal on input line 102d indicating a crowding propensity and, in response to the third timing pulse from multiplex timing 108, switches the third conditional probability input on line 126 to the register 134, for eventual processing by the comparator 146. In response to the fourth timing pulse from multiplex timing 108, no characters are switched by the multiplexor 94 and no characters are switched by the multiplexor 96. The multiplexor 128, under the control of the flag decoder 100, has a signal on the input line 102d indicating a crowding propensity and, in response to the fourth timing pulse from multiplex timing 108, loads the value 1, which is stored in the storage register 324, into the register 136, for eventual processing by the comparator 146.
The contents of the register 130 i9 multiplied 6 times the contents of the register 132 by the multiplier 7 means 138 and the product is output on line 142 to the 8 comparator 146. The contents of the register 134 is 9 multiplied times the contents o the register 136 by the multiplier means 140 and the product i~ output on line 11 144 to the comparator 146. The product output by the 12 ~irst multiplier mean~ 138 is the Lirst probability product 13 and the product output by the second multiplier means 140 14 i8 the second probability product. The relative magnitudes of the first probability product and the second probability 16 product are compared in the comparator 146. If the first 17 probability product on line 142 is larger than the second 18 probability product on line 144, the comparator 146 outputs 19 the gating signal on line 158 enabling the gate 14~ 80 as to pass the contents of register 130 on line 149 to the 21 line 152 and then to line 156 . The comparator's signal 22 on line 158 is input to the 3hift command 16 2. If the 23 gecond probability product on line 144 i~ larger than the 24 first probability product on line 142, the comparator 146 outputs a gating signal on line 160 enabling the gate 26 150 so as to pass the contents of reglster 134 on line 151 27 to the line 156. The comparator's output signal on line 160 2B is input to the shift command 162. The first probability 29 product represents the probability that simple character substitutions will occur between the characters stored in W~9-73-006 -42-~06Z~10 1 the Ll cell and the Kl c~ll and the characters ~tored in the 2 L2 cell and the K2 cell. The second probability product 3 represents the probability that a 3egmentatiotl error 4 through character splitting, concatenation, or arowding has occurred between the characters in the dictionary word 6 shlft register 25 and ~he OCR word shift register 14. 
If the first probability product is larger than the second probability product, then the word stored in the OCR word shift register 14 and the word stored in the dictionary word shift register 26 can be simultaneously shifted by the same amount in processing the next set of letters therein. However, if the second probability product is larger than the first probability product, then some form of character segmentation has taken place which may require the differential shifting of the word stored in the OCR word shift register 14 with respect to that for the word stored in the dictionary word shift register 26, before further processing of subsequent letters in the word can be commenced. This differential shifting decision is accomplished by the shift command 162.
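The comparison performed by the multiplier means 138 and 140 and the comparator 146 can be sketched in software. The Python fragment below is only an illustration: the function and argument names are hypothetical, and the treatment of exactly equal products is an assumption, since the text describes only the two strict cases.

```python
def compare_products(r130, r132, r134, r136):
    """Mimic multiplier means 138/140 and comparator 146.

    r130..r136 stand for the contents of registers 130, 132, 134 and 136.
    Returns (more_likely_error, value_passed_to_line_156).
    """
    first = r130 * r132     # multiplier means 138, output on line 142
    second = r134 * r136    # multiplier means 140, output on line 144
    if first > second:
        # Line 158 active: gate 148 passes register 130 toward line 156.
        return "substitution", r130
    # Line 160 active: gate 150 passes register 134 to line 156.
    # Assumption: a tie is handled like the segmentation case.
    return "segmentation", r134
```

The returned label plays the role of the gating signal that is also fed to the shift command 162.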
This shift command 162, a detailed diagram of which is shown in Figure 9, receives control inputs over line 102 from the flag decoder 100. Line 102a indicates a substitution propensity, line 102b indicates a splitting propensity, line 102c indicates a concatenation propensity, and line 102d indicates a crowding propensity. The shift command 162 receives over line 158 a gating signal from the comparator 146 indicating that the first probability product is greater, or receives over the line 160 a gating signal from the comparator 146 indicating that the second probability product is greater. The shift command 162, employing the logic shown in Figure 9, outputs a shift command signal over line 80 to the shift control 20.
Shift control 20 controls the shifting of the contents of the OCR word shift register 14 over line 22 and controls the shifting of the contents of the dictionary word shift register 26 and the flag bit shift register 34 over line 24. When the flag decoder 100 indicates over line 102a that the character in cell L1 has a propensity for simple substitution, the shift command 162 outputs on line 80a a signal to the shift control 20 to shift both the OCR word shift register 14 and the dictionary word shift register 26 by one cell. When the flag decoder 100 indicates over line 102b that the character in cell L1 has a splitting propensity, the shift command 162 will output on line 80a a signal to the shift control 20 to shift both shift register 14 and shift register 26 by one cell when the first probability product is greater than the second probability product. When the flag decoder 100 indicates over line 102b that the character in the L1 cell has a splitting propensity, the shift command 162 will output over line 80c a signal to the shift control 20 to shift the OCR word shift register 14 by two cells and to shift the dictionary word shift register 26 by one cell if the second probability product is greater. When the flag decoder 100 indicates over line 102c that the character pair stored in cells L1 and L2 has a concatenation propensity, the shift command 162 will output a signal on line 80a to the shift control 20 to shift both the shift register 14 and the shift register 26 by one cell when the first probability product is greater than the second probability product. When flag decoder 100 signals over line 102c that the character pair stored in cells L1 and L2 has a concatenation propensity, the shift command 162 will output on line 80d a signal to the shift control 20 to shift the OCR word shift register 14 by one cell and the dictionary word shift register 26 by two cells when the second probability product is greater than the first probability product. When the flag decoder 100 indicates over line 102d that the character pair in cells L1 and L2 has a crowding propensity, the shift command 162 will output on line 80a to the shift control 20 a signal commanding the shift register 14 and the shift register 26 to both shift by one cell when the first probability product is greater than the second probability product.
When the flag decoder 100 signals on line 102d that the character pair stored in cells L1 and L2 has a crowding propensity, then the shift command 162 will output on line 80b a command to the shift control 20 to shift both the OCR word shift register 14 and the dictionary word shift register 26 by two cells when the second probability product is greater than the first probability product.
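The shift rules just described amount to a small decision table. The following Python sketch restates them under the assumption that the error propensity is given as one of four hypothetical string labels and that the comparator result is given as a boolean; neither representation comes from the apparatus itself.

```python
# (OCR word shift, dictionary word shift) in cells when the second
# probability product (segmentation) is the greater one.
SEGMENTATION_SHIFTS = {
    "splitting": (2, 1),       # signal on line 80c
    "concatenation": (1, 2),   # signal on line 80d
    "crowding": (2, 2),        # signal on line 80b
}


def shift_amounts(flag, second_product_greater):
    """Return (ocr_shift, dictionary_shift) for one comparison step."""
    if flag != "substitution" and second_product_greater:
        return SEGMENTATION_SHIFTS[flag]
    # Simple substitution, or first product greater: signal on line 80a.
    return (1, 1)
```

For example, `shift_amounts("splitting", True)` yields `(2, 1)`: two OCR characters are consumed for a single dictionary character.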
The conditional probability for a simple substitution outputted by the multiplexor 128 onto the line 164, and the conditional probabilities outputted by register 130 for a simple substitution and register 134 for segmentation, are directed along line 156 to a running product calculating means comprising the multiplier 58, the product register 56 and the clear and store 54.

One object of the apparatus shown in Figure 3 is to find that dictionary word stored in the dictionary storage 28 which has the highest running product when compared with the OCR word stored in the OCR word shift register 14. As each new dictionary word is outputted from the dictionary store over line 30 to the dictionary word shift register 26, the dictionary store transmits a signal over line 55 to the clear and store 54 to clear the contents of the product register (A)56 and store the value one therein.
As each conditional probability value is received over line 156 in the multiplier 58, that probability is multiplied by the contents of the product register (A)56 and the product thereof is stored in the product register (A)56. Thus, as the dictionary word stored in the dictionary word shift register 26 is compared with the OCR word stored in the OCR word shift register 14, a running product of the probabilities that the OCR word was misread, given that the dictionary word was actually scanned, is calculated and stored in the product register 56. After the dictionary word stored in the dictionary word shift register 26 has been completely processed, the product register 56 will contain the total probability product for the comparison of the dictionary word with the OCR word. If the total probability product stored in register 56 has a magnitude greater than those for total probability products previously calculated for dictionary words compared to the OCR word presently stored in shift register 14, the total probability product in product register 56 is transferred by means of gate 64 to the highest product register (B)66. The comparator 62 maintains a running comparison of the magnitude of the running product being calculated and stored in the product register (A)56 with the highest product register (B)66. The magnitude of the contents of the product register (A)56 starts with a value of one, which is inserted by the clear and store 54 at the beginning of the comparison for each dictionary word loaded into shift register 26.
The magnitude of each probability inputted over line 156 to the multiplier 58 is less than one, and thus the magnitude of the running product being calculated and stored in the product register (A)56 becomes monotonically smaller as the comparison of the dictionary word to the OCR word continues. If the comparator 62 determines that the magnitude of the running product stored in the product register (A)56 is smaller than the contents of the highest product register (B)66, the comparator 62 outputs on the word abort line 78 a signal to the dictionary store 28 to transmit the next dictionary word from the dictionary store over line 30 to the dictionary word shift register 26, thereby terminating the comparison of the existing dictionary word with the OCR word. The OCR word shift register 14 is simultaneously signalled to recirculate the OCR word to its initial position so that the next comparison may commence. As each dictionary word is outputted from the dictionary store 28 over line 30 to the dictionary word shift register 26, the dictionary word is also transmitted over line 40 to the word register 38.
The comparator 62 maintains a gating signal on line 70 so long as the running product stored in the product register (A)56 remains greater than the contents of the highest product register (B)66. When the last letter in the dictionary word stored in the dictionary word shift register 26 has been compared with the corresponding letter in the OCR word stored in the OCR word shift register 14, the shift control 20 outputs on line 74 an end of word signal which is transmitted to the gate 72. If the contents of the product register (A)56 are still greater than the contents of the highest product register (B)66, then the gate 72 is enabled and the gating signal from the comparator 62 on line 70 is transmitted to and enables the gate 64. Gate 64 then transmits the total probability product stored in the product register (A)56 to the highest product register (B)66. Simultaneously the gating pulse from gate 72 is transmitted to gate 42 over line 76 and enables gate 42, thereby transmitting the dictionary word stored in the word register 38 to the best word register 44. The end of word signal on line 74 from the shift control 20 is also transmitted to the dictionary store 28, thereby causing the dictionary store to transmit to the dictionary word shift register 26 the next dictionary word. After all the dictionary words stored in the dictionary store 28 have been compared to the OCR word stored in the OCR word shift register 14, an end of dictionary list signal is outputted on line 48 from the dictionary store 28 to the gate 50, enabling the transmission of the dictionary word stored in the best word register 44 out onto line 52 as the most likely alpha field. This alpha field is inputted to the alpha numeric output register 8 for outputting on the output line 10.
Thus, the dictionary word stored in dictionary store 28 which was most likely misread by the OCR as the OCR word stored in the OCR word shift register 14 is outputted on line 10.
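The running product, word abort and best word logic described above can be summarized as a short loop. The Python sketch below is a software analogy only: the per-step probabilities are supplied as ready-made lists (in the apparatus they arrive one at a time on line 156), the function name is hypothetical, and starting the highest product at zero before any word has been processed is an assumption.

```python
def best_dictionary_word(candidates):
    """candidates: iterable of (word, [probability, ...]) pairs.

    Returns (best_word, highest_product), mirroring the best word
    register 44 and the highest product register (B)66.
    """
    best_word, highest = None, 0.0   # assumption: (B)66 starts at zero
    for word, probabilities in candidates:
        running = 1.0                # clear and store 54 loads one into (A)56
        aborted = False
        for p in probabilities:      # values arriving on line 156
            running *= p             # multiplier 58 updates (A)56
            if running < highest:    # comparator 62 raises word abort line 78
                aborted = True
                break
        if not aborted and running > highest:
            # Gates 64 and 42 copy the product and the word at end of word.
            best_word, highest = word, running
    return best_word, highest
```

Because every factor is less than one, the running product can only shrink, so abandoning a word as soon as it falls below the current best never discards a winning candidate.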
The detailed figure of the shift control 20 shown in Figure 4 shows the modulo four counter 204 and the modulo four counter 214 receiving a signal over one of the lines 80a, 80b, 80c or 80d from the shift command 162, which places a limiting value of 1 or 2 on the counters 204 or 214 necessary to generate an output for the reset on the flip flop 206 or 216, respectively. In this manner, the counter 204 can be programmed to shift the OCR word shift register 14 by 1 or 2 cell positions and the counter 214 can be programmed to shift the dictionary word shift register 26 by 1 or 2 cell positions.
The clock oscillator 200a and 200b can be a single oscillator whose output waveform is counted by the counters 204 and 214. A signal on one of the lines 80 from the shift command 162 sets the limit on counter 204 and, through the OR gate 202, resets and starts the counter 204 counting the output waveform from the clock oscillator 200a.
At the same time the flip flop 206 is set and the A output from the flip flop turns on the AND gate 208, thereby transmitting the waveform from the clock oscillator to the decrementing counter 210 and over the output line 22 to the OCR word shift register 14. The decrementing counter 210 has been loaded over line 19 with the OCR character count from the character counter 18. As an example of operation, if the shift command 162 has outputted a signal over line 80c to the shift control 20, a limit value of 2 is set in the counter 204. Thus, 2 timing pulses from the clock oscillator 200a will be transmitted through the AND gate 208 to the decrementing counter 210 and over line 22 to the OCR word shift register 14. When the counter 204 reaches the limit of 2 and outputs a pulse to the reset of the flip flop 206, the AND gate 208 is turned off. In this manner the OCR word shift register has been shifted two positions and the decrementing counter has subtracted the value of two from the original character count for the OCR word.
A similar operation obtains for the counter 214, which shifts the dictionary word shift register 26 and the flag bit shift register 34. When the decrementing counters 210 or 220 have their contents reduced to zero, having fully processed either the OCR word or the dictionary word respectively, a zero output signal is placed on line 74 which signals the dictionary store 28 and the gate 72 as previously discussed.
The detailed figure of the multiplex timing 108 shown in Figure 7 depicts the modulo 4 counter 294, whose counting limit is selected by means of a signal over line 102a, 102b, 102c or 102d from the flag decoder 100.
When the dictionary word shift register 26 shifts a new character into the cell L1, the corresponding flag bit stored in cell 35 of the flag bit shift register 34 is outputted over line 98 to the flag decoder 100. If the flag bit indicates a simple substitution propensity, a signal is outputted from the flag decoder 100 over line 102a, setting a limit value of one into the counter 294. If the flag bit outputted to the flag decoder indicates a splitting, concatenation or crowding propensity, the flag decoder 100 outputs a signal over one of the lines 102b, 102c or 102d, respectively, setting a limit value of four in the counter 294. The counter 294 is reset and started by means of the OR gate 292 upon receipt of the limit value, and the counter commences to count the oscillator pulses issued from the oscillator 290. Simultaneously the signal from the OR gate 292 sets the flip flop 296 and thereby activates the AND gate 298, so that the oscillator pulses from the oscillator 290 are output on the line 110 as the multiplexed timing pulses to the multiplexors 94, 96 and 128. If a signal has been received over line 102b, a limit of four is set in the counter 294 and the AND gate 298 permits four multiplexed timing pulses to be outputted on line 110 before the counter 294 reaches the limit value of four and resets the flip flop 296, thereby deactivating the AND gate 298.
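The gating behavior of the multiplex timing 108 can be imitated in a few lines. This Python sketch is a loose software analogy, assuming the flag is given as a hypothetical string label and the oscillator is modeled as a bounded loop; it is not a description of the circuit in Figure 7.

```python
def multiplex_timing(flag, oscillator_pulses=10):
    """Return the numbered timing pulses that would appear on line 110."""
    limit = 1 if flag == "substitution" else 4   # limit set in counter 294
    count = 0
    gate_open = True                # flip flop 296 set, AND gate 298 active
    emitted = []
    for _ in range(oscillator_pulses):           # pulses from oscillator 290
        if not gate_open:
            break
        count += 1
        emitted.append(count)                    # pulse passed to line 110
        if count == limit:                       # counter 294 reaches limit
            gate_open = False                    # flip flop 296 is reset
    return emitted
```

A substitution flag thus yields one pulse (one table lookup), while any segmentation flag yields four pulses, one for each of the registers 130, 132, 134 and 136.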
The detailed figure of the multiplexor 94 in Figure 5 shows the modulo 4 counter 230 receiving the timing pulses over line 110 from the multiplexed timing 108. The counter 230 has four output lines. The first, labeled "1" and connected to the AND gate 232, is on when the counter counts the first timing pulse over line 110. The second output line from counter 230, labeled "2" and connected to the AND gate 236, is on only when the counter counts the second timing pulse over line 110. The third output line, labeled "3" and connected to the AND gates 238 and 240, is on only when the counter 230 counts the third timing pulse over line 110. The fourth output line, labeled "4" and connected to the AND gate 242, is on only when the counter 230 counts the fourth timing pulse input over line 110. If the error propensity flag corresponding to the character stored in the L1 cell of the dictionary word shift register 26 indicates a simple substitution, the flag decoder 100 will output on line 102a a substitution signal which is input to the AND gate 258 of the multiplexor 94 in Figure 5. Since a signal on line 102a is input to the multiplexed timing 108 shown in Figure 7 and sets the limit value of the counter 294 to one, only a single timing pulse will be incident over line 110 to the modulo 4 counter 230 of the multiplexor 94 in Figure 5. Counter 230 counts the timing pulse over line 110 and turns on the output line labeled "1" connected to the AND gate 232.
AND gate 232 thereby gates the contents of the K1 cell on line 82 from the OCR word shift register 14 onto line 114 to the address register 116. The output line labeled "1" from the counter 230 is also connected to the gate 258, and a signal thereon conditions gate 258, in combination with the substitution signal on line 102a, to transmit a signal through the delay 234 to reset the counter 230.
Simultaneously, in the detailed figure of the multiplexor 96 shown in Figure 6, the modulo 4 counter 260 receives the first timing pulse over line 110 and turns on the output line labeled "1" connected to the AND gate 262, which gates the contents of the L1 cell over line 88 from the dictionary word shift register 26 onto line 120 to the address register 122. The output line "1" from the counter 260 is also connected to the AND gate 263, which is thereby conditioned to pass the substitution signal on line 102a to the delay 264 to reset the counter 260. Thus, the contents of the K1 cell of the OCR word shift register 14 become the contents of the address register 116 and the contents of the L1 cell of the dictionary word shift register 26 become the contents of the address register 122 for the purpose of accessing the conditional probability for simple substitution P(K1|L1). The detailed figure of the multiplexor 128 shown in Figure 8 shows the modulo four counter 302, which receives the timing pulse over line 110 from the multiplex timing 108.
Counter 302 has four output lines labeled "1", "2", "3" and "4", each of which is on only when its respective value is counted by the counter 302. The multiplexor 128, upon receiving the substitution signal over line 102a, activates AND gate 308 and deactivates AND gate 306. With the receipt of the first timing pulse over line 110, the AND gate 304 is activated, thereby passing the conditional probability P(K1|L1), accessed from the conditional probability storage matrix 124 and input over line 126, through AND gate 304 and through AND gate 308 onto line 164, from which it is sent to the multiplier 58. The output from AND gate 308 serves as the reset signal, which is delayed by the delay 309 and resets the counter 302. The operation of the multiplexors 94, 96 and 128 when the error propensity flag associated with the character in the L1 cell indicates splitting, concatenation or crowding involves the sequential loading of registers 130, 132, 134 and 136 with conditional probabilities accessed from the storage matrix 124, and will be discussed later in connection with the section on operation.

It is recognized that, without departing from the spirit and scope of the invention as disclosed in Figure 3, the dictionary store 28 and the conditional probability storage matrix 124 can each be a part of the same storage means. It is further seen that, instead of employing the OCR word shift register 14 and the multiplexor 94 in conjunction with the dictionary word shift register 26 and the multiplexor 96 to differentially shift the characters in the respective words therein as has been disclosed, the OCR word and the dictionary word could be respectively stored in two stationary registers, each character position of which is connected to a switching means for switching the selected combination of characters discussed to the address registers 116 and 122. To implement such a switching means, the characters of the OCR word would be stored in a first stationary register with the characters arranged in the sequence of receipt from the OCR, with a first character at a given end of the OCR word defining a first position of an error word origin. This would correspond to the K1 cell in the OCR word shift register 14. The characters and error propensity indicia of the dictionary word would be stored in a second stationary register with the characters arranged in a sequence to correspond with the sequence of characters in the first stationary register. A first character in the dictionary word would be positioned to correspond with the first character in the OCR word, defining a first position for a reference word origin. This of course corresponds to the cell L1 in the dictionary word shift register 26.

When, for example, the error propensity indicium for the character located at the reference word origin in the reference word indicates a character splitting propensity, the following sequence of events would take place in the switching means. A first conditional probability of a simple substitution would be accessed, namely the probability that, given the character located at the reference word origin in the reference word was scanned, the OCR substituted the character located at the error word origin in the error word. This would correspond to the simple substitution probability P(K1|L1) as previously discussed. Then the switching means switches the character next to the character at the reference word origin in the reference word and the character next to the character located at the error word origin in the error word to access the second conditional probability, corresponding to P(K2|L2) as previously discussed. The switching means would then switch the character located at the reference word origin in the reference word and the character located at the error word origin and the character next to the character located at the error word origin in the error word in order to access a third conditional probability, corresponding to P(K1K2|L1) as previously discussed.
Finally, the switching means would switch the character next to the character located at the reference word origin in the reference word and the second next character to the character located at the error word origin in the error word so as to access a fourth conditional probability, corresponding to P(K3|L2) as previously discussed. Such a switching means would select subsequent sets of corresponding characters in the OCR word and the dictionary word for comparison in accordance with the shift commands from the shift command 162. The switching means, under the control of the shift command 162, would shift the location of both the error word origin and the reference word origin by a single character position when the comparator indicates that simple substitution is more probable than a splitting segmentation. The switching means would shift the error word origin by two character positions and shift the reference word origin by one character position when splitting segmentation appears more probable than simple substitution, in a manner analogous to that for the shift register operation previously discussed.
Where the error propensity indicium indicates the propensity to character concatenation, the switching means shifts the error word origin and the reference word origin by one character position when the probability of simple substitution is greater than that of concatenation segmentation. The switching means would shift the error word origin by one character position and the reference word origin by two character positions when the probability of character concatenation is greater than the probability of simple substitution.
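The stationary register alternative replaces physical shifting with two movable origins. The Python sketch below lists the four table lookups for the splitting case relative to those origins. It assumes both words are held as strings already arranged in processing order, uses hypothetical zero-based indexes `k` and `l` for the error word origin and reference word origin, and returns an empty string for positions past the end of a word, which is a convenience of the sketch rather than something the text specifies.

```python
def at(word, index):
    """Character at index, or an empty string past the end (sketch convention)."""
    return word[index] if index < len(word) else ""


def splitting_lookups(error_word, reference_word, k=0, l=0):
    """Return the (OCR characters, dictionary character) pairs to look up."""
    k1, k2, k3 = (at(error_word, k + i) for i in range(3))
    l1, l2 = (at(reference_word, l + i) for i in range(2))
    return [
        (k1, l1),        # first probability,  P(K1|L1)
        (k2, l2),        # second probability, P(K2|L2)
        (k1 + k2, l1),   # third probability,  P(K1K2|L1)
        (k3, l2),        # fourth probability, P(K3|L2)
    ]
```

As a made-up illustration (not taken from Table I), comparing the error word "rnat" with the reference word "mat" gives the pairs ("r", "m"), ("n", "a"), ("rn", "m") and ("a", "a"). If splitting proves more probable, the origins then advance to `k + 2` and `l + 1`.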
There are about half a dozen horizontal splitting segmentation prone characters, which are shown in Table I, and about a dozen concatenation segmentation prone character pairs, shown in Table II. By themselves they constitute only a small part of the alpha composition of the dictionary words stored in the dictionary storage 28. In the preferred embodiment, unless a flag is encountered associated with the character or character pair, the likelihood factor analysis proceeds as if simple character substitution was the only garbling factor to be considered. Only when a flag is encountered does the operation of the invention incorporate an analysis of possible segmentation occurrences. In the preferred embodiment, the characters and character pairs shown in Tables I and II are specially encoded in the form stored in the dictionary store 28 so that the flag code is a part of the alpha character code. Each alpha character is stored using only five of the eight bits usually used to store a character.
The other three bits are then available for designating character segmentation, character pair concatenation and character pair crowding propensities. An alternate embodiment can be employed, however, so that it is unnecessary to engage in a special coding of the segmentation prone characters stored in the dictionary storage 28. In this alternate embodiment, the flag bit shift register 34 shown in Figure 3 would be eliminated and in its place is substituted a writable-read only storage having a ROS address register with inputs connected to lines 88 and 90 for the L1 and L2 cells of the dictionary word shift register 26. The writable-read only storage would have line 98 as its output line inputting into the flag decoder 100. This writable-read only storage would contain the information on splitting segmentation shown in Table I and the information on concatenation segmentation shown in Table II. The dictionary words stored in the dictionary storage 28 could then be coded in conventional fashion. When a dictionary word is loaded into the dictionary word shift register 26, the writable-read only storage would have as its input the contents of the cells L1 and L2, such combination accessing an error propensity indicium indicating whether simple substitution, splitting, concatenation or crowding would be a possible mode for garbling the characters. The error propensity indicium would be outputted on line 98 in a form similar to that outputted by the cell 35 in the flag bit shift register 34 of the preferred embodiment shown in Figure 3. The ROS address register and the writable-read only storage constitute a means for generating an error propensity indicium.
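The five-plus-three bit encoding of the preferred embodiment can be illustrated as follows. The bit layout in this Python sketch is purely an assumption for illustration: the text states only that five bits hold the alpha character and three bits hold the flags, and does not say which bit carries which flag or how letters map to the five-bit code.

```python
# Hypothetical flag bit positions in the upper three bits of a byte.
SPLITTING = 0x20
CONCATENATION = 0x40
CROWDING = 0x80


def encode(letter, flags=0):
    """Pack a letter a-z (five bits) and error propensity flags (three bits)."""
    code = ord(letter.lower()) - ord("a")   # assumed five-bit alpha code
    if not 0 <= code < 26:
        raise ValueError("this sketch handles only the letters a-z")
    return code | flags


def decode(byte):
    """Split a stored byte back into (letter, flag bits)."""
    letter = chr((byte & 0x1F) + ord("a"))
    return letter, byte & 0xE0
```

With this layout, `decode(encode("k", CONCATENATION))` returns `("k", CONCATENATION)`, which plays the role of the character in cell L1 together with the flag presented to the flag decoder 100. The alternate embodiment would instead look the flag up from the pair of characters in cells L1 and L2.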
OPERATION:
The operation of the regional context maximum likelihood OCR error correction apparatus shown in Figure 3 will be illustrated by processing the word "Wreck", which the OCR misread as "IWn*c". Recall that the asterisk sign indicates a rejected or unrecognized character. Three dictionary words stored in the dictionary storage 28 will be compared with the OCR input word "IWn*c": "Break", "Wreck" and "Freak". The analysis for each of these words is shown in Tables VIII, IX and X respectively. The illustration of the operation begins after the apparatus of Figure 3 has compared the dictionary word "Break" with the OCR input word "IWn*c".
We shall assume that up to this point the dictionary word having the highest total conditional probability product of having been misread by the OCR as the OCR input word "IWn*c" is the word "Break", and that therefore the word "Break" is stored in the best word register 44 and its total conditional probability of 4.4 times 10 [exponent illegible], as is shown in Table VIII, is stored in the highest product register (B)66. The OCR input word is shifted by means of the shift control 20 so that the first letter to be tested, namely the letter "c", is positioned in the K1 cell. Simultaneously, the dictionary store 28 loads the word "Wreck" into the dictionary word shift register 26 such that the letter "k" is positioned in the L1 cell. Simultaneously, the flag associated with the letter "k" of the dictionary word stored in the cell L1 is loaded by the dictionary store 28 into the cell 35 of the flag bit shift register 34. Table IX shows the relative position of the OCR word, the dictionary word and the flag bits in this stage 1. Table IX shows flag bits for concatenation as the question mark "?", for crowding as the sharp symbol "#", and for splitting as the exclamation mark "!". It is seen in Table IX that the letter pair "ck" of the word "Wreck" is prone to concatenation error, as can be confirmed by reference to Table II. Table IX further shows that the flag bit for crowding is associated with the character pair "re" in the word "Wreck", as can be confirmed by reference to Figure 2. Table IX further shows that the letter "W" in the word "Wreck" is prone to splitting, as can be confirmed by a reference to Table I.

In the first stage shown in Table IX, the first letter "k" of the dictionary word "Wreck" has associated with it the error propensity flag "?" indicating a concatenation propensity. The concatenation propensity flag is located in cell 35 of the flag bit shift register 34 and is detected over line 98 by the flag bit decoder 100, which generates an output signal on line 102c indicating a concatenation propensity. The signal on line 102c causes the multiplex timer shown in Figure 7 to set a limit equal to four for the counter 294 and resets and starts the counter 294.
The signal on line 102c also sets the flip flop 296, thereby enabling the gate 298 and permitting the pulses issuing from the clock oscillator 290 to be output on line 110 to accomplish the multiplex timing. Since the limit of four has been set for counter 294, four successive timing pulses will be emitted by the multiplex timer 108 before the counter 294 outputs a reset pulse to the flip flop 296, thereby disabling the AND gate 298 and stopping the issuance of timing pulses on line 110.
Selected conditional probabilities are accessed from matrix 124 as follows. The signal on line 102c, in conjunction with the first timing pulse from the multiplex timer 108, causes the multiplexor 94 of Figure 5 to connect the K1 cell line 82 to line 114, thereby transferring the letter "c" from the K1 cell to the address register 116. The first timing pulse from the multiplex timer 108 causes the multiplexor 96 of Figure 6 to connect the L1 cell by means of line 88 to line 120, thereby transferring the contents of the L1 cell, which is the letter "k", to the address register 122. Thus, the conditional probability P(c|k), which equals 6.7 × 10^-3 as is shown in Table IX, is accessed from the conditional probability storage matrix 124 and outputted over line 126 to the multiplexor 128. The multiplexor 128 of Figure 8, in response to this first timing pulse over line 110, causes this first conditional probability to be transferred from line 126 through the AND gate 304 and the AND gate 306 and loaded into register 130.
In response to the second timing pulse from the multiplex timing 108, the multiplexor 94 connects the K2 cell via line 84 to line 114, thereby loading the reject symbol "*" from the OCR input word "IWn*c" into the address register 116. Simultaneously, the second timing pulse from the multiplex timing 108 causes the contents of the L2 cell to be connected via line 90 to line 120, thereby transferring the letter "c" from the dictionary word "Wreck" to the address register 122.
Then the conditional probability P(*|c), which equals 2.4 × 10^-2 as is shown in Table IX, is accessed from the conditional probability storage matrix 124 and outputted over line 126 to the multiplexor 128. The multiplexor 128, in response to the second timing pulse over line 110, causes the second conditional probability over line 126 to be transferred via the AND gate 312 to register 132. In response to the third timing pulse over line 110 and the concatenation signal on line 102c, the multiplexor 94 connects the contents of the K1 cell over line 82 via the AND gate 248 and the AND gate 238 to line 114, thereby loading the letter "c" from the OCR word into the address register 116. Simultaneously, the third timing pulse from the multiplex timing 108, in conjunction with the signal for concatenation over line 102c, causes the loading of the contents of the L1 cell via line 88 through the AND gate 276 and the AND gate 270 onto the line 118, thereby transferring the contents of the L1 cell, which is the letter "k", to the address register 122.
Simultaneously the contents of the L2 cell are transferred via line 90 through the AND gate 278 and the AND gate 268 onto line 120, thereby transferring the letter "c" from the dictionary word to the address register 122. Thus, the conditional probability P(c|ck), which equals 1.9 X 10^-2 as is shown in Table IX, is accessed from the conditional probability storage matrix 124 and output on line 126 to the multiplexor 128. The multiplexor 128, in response to the third timing pulse from the multiplex timing 108, transfers this third probability on line 126 via the AND gate 314 to load the register 134. In response to the fourth timing pulse from the multiplex timing 108 and the concatenation signal on line 102c, the multiplexor 94 transfers the contents of cell K2 via line 84 through AND gate 256 and AND gate 242 to line 114, thereby transferring the "*" from the OCR word "IWn*c" stored in the OCR shift register 14 to the address register 116.
Simultaneously, in response to the fourth timing pulse from the multiplex timing 108 and the concatenation signal over line 102c, the multiplexor 96 transfers the contents of the L3 cell over line 92 through AND gate 286 and AND gate 272 to the line 120, thereby transferring the letter "e" from the dictionary word "Wreck" to the address register 122. The conditional probability P(*|e), which equals 2.8 X 10^-2 as is shown in Table IX, is accessed from the conditional probability storage matrix 124 and transferred over line 126 to the multiplexor 128. In response to the fourth timing pulse from the multiplex timing 108, the multiplexor 128 transfers the fourth probability on line 126 by means of the AND gate 316 and the AND gate 320 and loads it in the register 136.
Probability products are now calculated from the conditional probabilities and their respective magnitudes compared. The multiplier 138 multiplies the contents of register 130 and register 132 to obtain a first probability product 1.6 X 10^-4. Simultaneously, the multiplier 140 multiplies the contents of the register 134 and the register 136 and obtains a second probability product 5.3 X 10^-4. The products generated by the multipliers 138 and 140 are compared by the comparator 146. The comparator determines that the probability product from the multiplier 140 is larger than that from the multiplier 138. This indicates that a character concatenation error was more likely than a simple substitution error for the characters occupying the K1 and K2 cells of the OCR word and the L1, L2 and L3 cells of the dictionary word.
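The comparison performed by the multipliers 138 and 140 and the comparator 146 can be illustrated with a short Python sketch. It is an illustration only, not the circuit itself: the function name `compare_products` and its return labels are hypothetical, and the inputs are the Table IX probabilities quoted in the description.

```python
def compare_products(p_sub_1, p_sub_2, p_seg_1, p_seg_2):
    """Compare the simple-substitution product with the segmentation product.

    p_sub_1, p_sub_2: the two character-for-character probabilities
                      (the values held in registers 130 and 132).
    p_seg_1, p_seg_2: the regional-context probability and its companion
                      (the values held in registers 134 and 136).
    Returns (label, substitution_product, segmentation_product).
    """
    substitution_product = p_sub_1 * p_sub_2  # role of multiplier 138
    segmentation_product = p_seg_1 * p_seg_2  # role of multiplier 140
    if segmentation_product > substitution_product:  # role of comparator 146
        label = "segmentation"
    else:
        label = "substitution"
    return label, substitution_product, segmentation_product


# Stage 1 of Table IX: P(c|k), P(*|c), P(c|ck), P(*|e)
label, sub, seg = compare_products(6.7e-3, 2.4e-2, 1.9e-2, 2.8e-2)
print(label, f"{sub:.1e}", f"{seg:.1e}")
```

The same function covers the crowding case when the fourth argument is the constant one that the description loads into register 136.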
The conditional probability contained in the larger probability product is now multiplied into the running product for the dictionary word. The comparator therefore activates line 150, enabling the gate 150 so that the contents P(c|ck) of the register 134 are transferred over lines 151 and 153 to line 156 and thus entered into the multiplier 58. This conditional probability P(c|ck) is multiplied times the contents of the product register (A) 56, which is one, and the running product, which is 5.3 X 10^-4, is stored in the product register 56. The comparator 62 compares the magnitude of the contents of product register A with the contents of the highest product register (B) 66 and determines that A is greater than B. Thus no signal is output on the word abort line 78. The line 70 is conditioned on, but the gate 72 is not enabled by the shift control 20 over the end line 74, and therefore the gate 64 is not enabled at this time.
A shift command is issued to differentially shift the contents of the OCR word S/R 14 and that of the dictionary word S/R 26 and flag bit S/R 34. The comparator 146 activating line 160 causes the shift command 162 of Figure 9 to enable the AND gate 338 shown in Figure 9, in conjunction with the concatenation signal on line 102c, to output a signal on line 80d, which is the shift command transferred to the shift control 20. Shift control 20 is instructed to shift the OCR shift register 14 by one position and the dictionary word shift register 26 and the flag bit shift register 34 by two positions. The result of this differential shifting is shown in Table IX, stage 2. Since four timing pulses have been issued by the multiplex timing 108, the modulo four counter 230 in multiplexor 94, the modulo four counter 260 in multiplexor 96 and the modulo four counter 302 of the multiplexor 128 are automatically reset to zero and are ready for the analysis of the next set of characters.
In the second stage of the analysis for comparing the dictionary word "Wreck" with the OCR word "IWn*c", the alignment of the characters is such that the dictionary word letter "e" in the L1 cell corresponds to the asterisk "*" of the OCR word in the K1 cell and the dictionary word character "r" in the L2 cell corresponds to the character "n" in the OCR word in the K2 cell. The dictionary word character pair "re" has a crowding flag associated therewith in the flag bit shift register cell 35. This crowding propensity flag is transferred via line 98 to the flag decoder 100, which issues a crowding signal over line 102d. In response to the crowding signal on line 102d, the multiplex timing 108 shown in Figure 7 sets a limit of four on the counter 294, resets and starts the counter 294, and sets the flip flop 296, thus enabling the AND gate 298 to transfer four clock pulses from the clock oscillator 290 over the line 110.
Selected conditional probabilities are accessed as follows. The first and second timing pulses issuing from the multiplex timing 108 cause the same sequence of events to occur in the multiplexor 94, multiplexor 96, and multiplexor 128 as was described for the concatenation operation next preceding. Thus, the conditional probability P(*|e), which equals 2.8 X 10^-2 as is shown in Table IX, is stored in register 130 and the conditional probability P(n|r), which equals 5.9 X 10^-3 as is shown in Table IX, is stored in register 132. In response to the third timing pulse issuing from multiplex timing 108 and the crowding signal on line 102d, the multiplexor 94 of Figure 5 transfers the contents of the K1 cell over line 82 by means of the AND gate 250 and the AND gate 240 to the line 112 to load the address register 116. Simultaneously the multiplexor 94 transfers the contents of the K2 cell over line 84 by means of AND gate 252 and AND gate 238 to line 114, thereby loading the letter "n" into the address register 116.
The character pair "n*" is now in the address register 116. In response to the third timing pulse from the multiplex timing 108 and in conjunction with the crowding signal on line 102d, the multiplexor 96 of Figure 6 transfers the contents of the L1 cell over line 88 by means of the AND gates 280 and 270 to the line 118 and transfers the contents of the L2 cell via line 90 by means of the AND gates 282 and 268 to the line 120, thereby transferring the characters "r" and "e" from cells L2 and L1 respectively to the address register 122. The conditional probability P(n*|re), which equals 5.6 X 10^-4 as is shown in Table IX, is accessed from the conditional probability storage matrix 124 and output on line 126 to the multiplexor 128. In response to the third timing pulse on line 110, the multiplexor 128 of Figure 8 transfers the third probability on line 126 by means of AND gate 314 to register 134. In response to the fourth timing pulse from the multiplex timing 108, the multiplexor 94 transfers nothing from the OCR word shift register and the multiplexor 96 transfers nothing from the dictionary word shift register 26. The multiplexor 128 of Figure 8, in response to the crowding signal on line 102d, causes the value one stored in the register 324 to be loaded by means of the AND gate 322 and the AND gate 320 into the register 136.
Probability products are now calculated and compared. The multiplier 138 now multiplies the contents of register 130 times the contents of register 132 and generates the product 1.6 X 10^-4. The multiplier 140 multiplies the contents of register 134 times the contents of register 136 and generates the product 5.6 X 10^-4. The comparator 146 compares the relative magnitudes of these products and determines that the output of multiplier 140 is larger than that of multiplier 138. This indicates that the probability that a character crowding error has occurred is greater than the probability of a simple substitution for the characters stored in the K1 and K2 cells and the L1 and L2 cells.
The conditional probability contained in the larger product is passed on. The comparator 146 activates line 160, thereby enabling the gate 150 so that the contents P(n*|re) of the register 134 are transferred over lines 151 and 153 to line 156 and thus to the multiplier 58. The multiplier 58 multiplies the conditional probability P(n*|re) times the contents of product register A 56 and stores the result in product register A 56. The new running product has a magnitude of 3.0 X 10^-7, which the comparator 62 determines is still larger than the contents of the highest product register 66, which is 4.4 X 10^-13, and therefore the word abort line 78 is not activated. Although line 70 is activated, the gate 72 remains disabled since the shift control has not yet come to the end of the dictionary word or the OCR word, as would be indicated on output line 74. Thus, gates 64 and 42 are not yet enabled.
A shift command is now issued. The comparator 146 activating line 160, in conjunction with the crowding signal on line 102d, causes the shift command 162 of Figure 9 to output a signal on line 80b. The signal on line 80b is the shift command to the shift control 20, causing the shift control to shift the OCR word shift register by two cells and the dictionary word shift register and flag bit shift register by two cells. Thus, as shown in Table IX, stage three has the letter "W" of the dictionary word matched with the letter "W" of the OCR word.
The third stage of the comparative analysis of the dictionary word "Wreck" and the OCR word "IWn*c" now commences. An error propensity flag, indicating a splitting propensity, is associated with the letter "W" stored in the L1 cell. The character splitting propensity flag is transferred from cell 35 of the flag bit shift register 34 over line 98 to the flag decoder 100, which issues an output signal over line 102b. A signal on line 102b, indicating a splitting propensity, sets a limit of four in the counter 294 of the multiplex timing 108 shown in Figure 7, resets and starts the counter 294, and sets the flip flop 296, thereby enabling the AND gate 298 to transfer four timing pulses from the clock oscillator 290 to the line 110.
Selected conditional probabilities are accessed as follows. The first and second timing pulses issuing from the multiplex timing 108 cause the consecutive transfer of the contents of cells K1 and L1 and of cells K2 and L2, as was described for the concatenation analysis previously discussed. Thus, the conditional probability P(W|W), which equals 0.90 as is shown in Table IX, is loaded into register 130 and the conditional probability P(I| ), which equals 0.15 X 10^-3, is loaded in register 132. In response to the third timing pulse issuing from multiplex timing 108 and the splitting signal on line 102b, the multiplexor 94 of Figure 5 transfers the contents of the K1 cell over line 82 by means of the AND gate 244 and the AND gate 240 to line 112 and transfers the contents of the K2 cell over line 84 by means of AND gate 246 and AND gate 238 to line 114. Thus, the characters "IW" are loaded into the address register 116. Simultaneously, in response to the third timing pulse issuing from multiplex timing 108 and the splitting signal on line 102b, the contents of the L1 cell are transferred by means of multiplexor 96 of Figure 6 over line 88 by means of AND gate 274 and AND gate 268 to line 120, thereby loading the character "W" into the address register 122. In response to the third timing pulse over line 110, the multiplexor 128 of Figure 8 transfers this third probability, P(IW|W), from line 126 by means of AND gate 314 to the register 134. In response to the fourth timing pulse and the splitting error signal over line 102b, the multiplexor 94 transfers the contents of the K3 cell (which is a blank) over line 86 by means of the AND gate 254 and the AND gate 242 to line 114. In response to the fourth timing pulse and the splitting error signal over line 102b, the multiplexor 96 transfers the contents of the L2 cell (which is blank) by means of AND gate 284 and AND gate 272 to line 120. These blanks are loaded in the address register 116 and the address register 122. Then the conditional probability P( | ), which equals 0.99 as is shown in Table IX, is accessed from the conditional probability storage matrix 124 and output on line 126 to the multiplexor 128. The multiplexor 128, in response to the fourth timing signal on line 110, transfers this fourth probability on line 126 by means of AND gates 316 and 320 to the register 136.
Probability products are now calculated and compared. The multiplier 138 multiplies the contents of the register 130 times the contents of register 132 and obtains a product 1.3 X 10^-3. The multiplier 140 multiplies the contents of the register 134 times the contents of the register 136 and obtains the product 3.5 X 10^-3. The comparator 146 determines that the output of the multiplier 140 is greater, which indicates that the probability of the character "W" splitting into the characters "I" and "W" is greater than the probability of the simple substitution of the letter "W" into the letter "W" and the blank into the "I".

The conditional probability contained in the larger product is passed on. Thus the comparator 146 activates line 160, thereby enabling gate 150 to transfer the contents P(IW|W) of the register 134 over lines 151 and 153 to line 156 and thus to the multiplier 58. Multiplier 58 multiplies the contents of the product register 56 (A) times the conditional probability P(IW|W) transferred from the register 134 and obtains a new running product having a magnitude of 1.1 X 10^-9. The comparator 62 determines that the contents of the product register 56 (A) are larger than the contents of the highest product register 66 (B), which is 4.4 X 10^-13, and thus no word abort signal is output on line 78. Line 70 is activated.
A shift command is now issued. The comparator 146, having activated line 160, in conjunction with the splitting error signal on line 102b causes the AND gate 336 of the shift command 162 to be enabled, thereby outputting a signal on line 80c. This commands the shift control 20 to differentially shift the OCR word shift register by two positions and the dictionary word shift register and the flag bit shift register 34 by one position respectively.
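The three differential shifts described for the "Wreck" comparison can be collected in a small lookup table. The sketch below is an illustration under stated assumptions: the names `SHIFTS` and `apply_shift` are hypothetical, and positions are counted in cells from the aligned end of both shift registers.

```python
# (OCR word shift, dictionary word and flag bit shift), in cells,
# keyed by the error type that won the comparison.
SHIFTS = {
    "concatenation": (1, 2),  # shift command on line 80d
    "crowding": (2, 2),       # shift command on line 80b
    "splitting": (2, 1),      # shift command on line 80c
}


def apply_shift(ocr_pos, dict_pos, error_type):
    """Advance both register positions by the amounts for error_type."""
    ocr_step, dict_step = SHIFTS[error_type]
    return ocr_pos + ocr_step, dict_pos + dict_step


# Walk the three stages of the "Wreck" / "IWn*c" comparison.
ocr_pos, dict_pos = 0, 0
for stage in ("concatenation", "crowding", "splitting"):
    ocr_pos, dict_pos = apply_shift(ocr_pos, dict_pos, stage)
print(ocr_pos, dict_pos)
```

After the third stage both positions equal five, so all five characters of each word have been consumed, which matches the end-of-word condition that follows.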
A new "best word" is recognized. Decrementing counter 220 of the shift control 20 shown in Figure 4 indicates that the last letter of the dictionary word has been reached, and therefore a signal is output on line 74 indicating the end of the word has been reached. This signal enables gate 72, which thereby enables gate 64, permitting the transfer of the contents of the product register (A) 56 to the highest product register (B) 66. Simultaneously, the gate 42 is activated, thereby transferring the contents of the word register, which is the word "Wreck", to the best word register 44.
The system has now determined a new best word "Wreck", which is stored in register 44 and which has a corresponding total conditional probability product 1.1 X 10^-9, which is stored in register 66. Since the shift control has determined that the end of the word has been reached, the OCR word "IWn*c" is now shifted back to its initial position so that the "c" occupies the K1 cell. Simultaneously the dictionary store loads the next word "Freak" into the dictionary word shift register 26 and the corresponding flag bits into the flag bit shift register 34, as is shown in Table X, stage 1.
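The running product for "Wreck" can be reproduced with a few lines of Python. This is a sketch only: it follows Table IX in taking the larger product of each stage as that stage's factor, and the variable names are hypothetical.

```python
from itertools import accumulate
from operator import mul

# Larger probability product at each stage of Table IX ("Wreck" vs. "IWn*c"):
# concatenation, crowding, splitting.
wreck_stage_products = [5.3e-4, 5.6e-4, 3.5e-3]

# Cumulative products, playing the role of product register A after each stage.
running = list(accumulate(wreck_stage_products, mul))

for stage, value in enumerate(running, start=1):
    print(f"stage {stage}: {value:.2e}")
```

Unrounded arithmetic gives a final value of about 1.04 X 10^-9. The description reports 1.1 X 10^-9 because it rounds the running product to two digits after each stage. Every intermediate value stays above the 4.4 X 10^-13 held in the highest product register, so no word abort occurs.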
The comparison of the word "Freak" with the OCR word "IWn*c" will be briefly described to illustrate the operation of the apparatus of Figure 3 for simple substitution. The flag bit for simple substitution, in this case no flag bit at all, is stored in cell 35 of the flag bit shift register 34. This indication of the simple substitution error causes the flag decoder 100 to issue a simple substitution signal on line 102a. The simple substitution signal on line 102a sets a limit of one in the counter 294 of the multiplex timing 108, and thus only a single timing pulse is issued over the line 110. The multiplexor 94, in response to this first timing pulse, connects cell K1 with line 114, loading the letter "c" into the address register 116. The counter 230 is then reset. Similarly, the multiplexor 96 connects the contents of the L1 cell with line 120, thereby loading a "k" into the address register 122, and then the counter 260 is reset. In response to the first timing pulse on the line 110 and the substitution signal on line 102a, the multiplexor 128 transfers the conditional probability P(c|k), which equals 6.7 X 10^-3 as is shown in Table X, from line 126 via AND gates 304 and 308 to line 164, which connects with line 156, thereby directly inputting the probability to the multiplier 58, bypassing the registers 130-136. The calculation of the running product commences as previously described. The shift command 162, upon receipt of the substitution signal 102a, issues a signal on line 80a causing the shift control 20 to shift the OCR word shift register 14 and the dictionary word shift register and flag bit shift registers by a single cell. The comparison of the dictionary word "Freak" with the OCR word "IWn*c" continues as is shown in Table X and results in a total probability product of 1.5 X 10^-12. This total product, when compared in the comparator 62 with the contents of the highest product register 66, which is 1.1 X 10^-9, causes a word abort signal to be output on line 78, stopping further processing of the dictionary word "Freak" and causing the resetting of the OCR word shift register 14 and the loading of the next word in the dictionary store 28 into the dictionary word shift register 26.
After all the words stored in the dictionary store 28 have been compared with the OCR word "IWn*c", the dictionary store 28 outputs on line 48 an end of dictionary list signal which enables gate 50, thereby connecting the contents of the best word register 44 with the output line 10. Since the dictionary word "Wreck" was stored as the best word in register 44, the system outputs the word "Wreck" as the best estimate of the word which was actually scanned by the OCR when it output the word "IWn*c".
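The dictionary search with word abort can be sketched as a loop over candidate words. The sketch below is illustrative and rests on several assumptions: `select_best_word` is a hypothetical name, each stage contributes the larger product listed in Tables IX and X, and the search starts from the state left by "Break" (best product 4.4 X 10^-13).

```python
def select_best_word(candidates, best_word, best_product):
    """Return the word with the largest total product.

    candidates: list of (word, [stage factors]) in dictionary order.
    best_word, best_product: contents of the best word register and the
    highest product register before the search continues.
    """
    for word, stage_factors in candidates:
        running = 1.0  # product register A starts at one
        aborted = False
        for factor in stage_factors:
            running *= factor
            if running < best_product:  # A has fallen below B: word abort
                aborted = True
                break
        if not aborted:  # end of word reached: A replaces B
            best_word, best_product = word, running
    return best_word, best_product


candidates = [
    ("Wreck", [5.3e-4, 5.6e-4, 3.5e-3]),        # Table IX
    ("Freak", [6.7e-3, 1.1e-2, 3e-5, 7.0e-4]),  # Table X
]
word, product = select_best_word(candidates, "Break", 4.4e-13)
print(word, f"{product:.2e}")
```

With these inputs "Freak" survives its first three stages and is aborted only at the fourth, when its running product drops to about 1.5 X 10^-12, so "Wreck" remains the best word.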
The regional context maximum likelihood error correction apparatus shown in Figure 3 can be applied to post-processing the phoneme-character recognition stream output from a speech analyzer. Speech analyzers, such as is disclosed in United States patent 3,646,576 to Griggs, analyze continuous human speech into component phoneme-character units. Researchers report that a problem in the recognition of continuous speech is the accurate segmentation of the speech signal into phoneme units. The subject regional context maximum likelihood error correction apparatus can be used to correct segmentation errors in the phoneme-character recognition stream output from a speech analyzer. In the system shown in Figure 3, input line 2 is the phoneme-character output line from a speech analyzer, carrying the phoneme-character recognition stream. Dictionary store 28 contains a vocabulary of valid spoken word expressions, each comprised of its component phoneme-characters. The segmentation errors which occur in conventional speech analyzers are similar to the segmentation errors in optical character recognition machines discussed above, namely splitting, concatenation and crowding. The spoken word expressions stored in dictionary store 28 have selected phoneme-characters which are flagged for segmentation, concatenation or crowding misread propensity of the speech analyzer. The conditional probability storage matrix 124 contains the conditional probabilities for phoneme-character combinations which have the propensity for splitting, concatenation or crowding segmentation errors. The propensities are a characteristic of the speech analyzer.
The operation of the regional context maximum likelihood error correction apparatus for post-processing the phoneme-character recognition stream from a speech analyzer is similar to that discussed above for application of the apparatus to optical character recognition. The spoken word expression in the output recognition stream from the speech analyzer is input over line 2 and loaded into shift register 14. A dictionary spoken word is loaded into the dictionary word shift register 26 from the dictionary store 28. The phoneme-characters of the input word and the dictionary word are aligned on one end. When a splitting propensity, for example, is flagged for a phoneme-character in the dictionary spoken word expression, conditional probability values are accessed from the conditional probability storage matrix 124.
A calculation is then performed of the probability that the first phoneme-character of the dictionary word was split by the speech analyzer into the first and second phoneme-characters of the spoken word expression in the output recognition stream. This regional context probability is compared with the probability of a simple substitution error for the phoneme-characters. If the probability of segmentation is larger, the phoneme-characters in shift register 14 are shifted one space with respect to the phoneme-characters in dictionary word shift register 26 so that subsequent phoneme-character pairs to be compared are properly matched. The greater calculated probability is combined in a running product in register 56. The spoken word expression in the dictionary storage 28 having the largest running product is output by the system over line 52 as the most likely correct form of the garbled word input from the speech analyzer.
While the invention has been particularly shown and described with reference to the preferred embodiments thereof, it will be understood by those skilled in the art that the foregoing and other changes in form and detail may be made therein without departing from the spirit and scope of the invention.

TABLE VIII. "Break"

(1) Simple Substitution
    OCR    I  W  n  *  c
    DICT   B  r  e  a  k
    FLAG       #
    P(c|k) = 6.7 X 10^-3

(2) Simple Substitution
    OCR    I  W  n  *
    DICT   B  r  e  a
    FLAG       #
    P(*|a) = 1.1 X 10^-2
    Running Product = 7.3 X 10^-5

(3) Crowding
    OCR    I  W  n
    DICT   B  r  e
    FLAG       #
    P(n|e) = 1.0 X 10^-3
    P(W|r) = 2.0 X 10^-4
        X = 2 X 10^-7
    P(Wn|re) = 3 X 10^-5

(4) Simple Substitution
    OCR    I
    DICT   B
    P(I|B) = 2.0 X 10^-4
    Total Product = Running Product = 4.4 X 10^-13

TABLE IX. "Wreck"
(1) Concatenation
    OCR    I  W  n  *  c
    DICT   W  r  e  c  k
    FLAG  "?"   #    "!"
    P(c|k) = 6.7 X 10^-3
    P(*|c) = 2.4 X 10^-2
        X = 1.6 X 10^-4
    P(c|ck) = 1.9 X 10^-2
    P(*|e) = 2.8 X 10^-2
        X = 5.3 X 10^-4
    Running Product = 5.3 X 10^-4

(2) Crowding
    OCR    I  W  n  *
    DICT   W  r  e
    FLAG  "?"   #
    P(*|e) = 2.8 X 10^-2
    P(n|r) = 5.9 X 10^-3
        X = 1.6 X 10^-4
    P(n*|re) = 5.6 X 10^-4
    Running Product = 3.0 X 10^-7

(3) Splitting
    OCR    I  W
    DICT   W
    FLAG  "?"
    P(W|W) = .90
    P(I| ) = .15 X 10^-3
        X = 1.3 X 10^-3
    P(IW|W) = 3.5 X 10^-3
    P( | ) = .99
        X = 3.5 X 10^-3
    Total Product = Running Product = 1.1 X 10^-9

TABLE X. "Freak"

(1) Simple Substitution
    OCR    I  W  n  *  c
    DICT   F  r  e  a  k
    FLAG       #
    P(c|k) = 6.7 X 10^-3

(2) Simple Substitution
    OCR    I  W  n  *
    DICT   F  r  e  a
    FLAG       #
    P(*|a) = 1.1 X 10^-2
    Running Product = 7.3 X 10^-5

(3) Crowding
    OCR    I  W  n
    DICT   F  r  e
    FLAG       #
    P(n|e) = 1.0 X 10^-3
    P(W|r) = 2.0 X 10^-4
        X = 2 X 10^-7
    P(Wn|re) = 3 X 10^-5
    Running Product = 2.2 X 10^-9

(4) Simple Substitution
    OCR    I
    DICT   F
    P(I|F) = 7.0 X 10^-4
    Total Product = Running Product = 1.5 X 10^-12

Claims (9)

    The embodiments of the invention on which an exclusive property or privilege is claimed are defined as follows:
    1. A data processing system for selecting the correct form of an input error word garbled by an OCR splitting error, the correct form of the error word being a member of a predetermined class of reference words, each comprising a plurality of characters, comprising:
    a storage means for storing said predetermined class of reference words, selected characters composing the reference words having stored in said storage means an error propensity indicium for indicating the propensity of the character to being misread through a splitting error, said storage means storing a first type conditional probability that a first character can be output by said OCR through character substitution, given that a second character was actually scanned, and a second type conditional probability that a pair of adjacent characters can be output by said OCR through character splitting, given that a third character was actually scanned;
    a first register means connected to an input line for storing the characters of said error word arranged in the sequence of receipt from said OCR, with a first character at a given end of said error word defining a first position for an error word origin;

    a second register means connected to said storage means for storing the characters of a first reference word from said predetermined class in said storage means, arranged in a sequence to correspond with said sequence of characters in said first register means, with a first character in said reference word corresponding to said first character in said error word, defining a first position for a reference word origin;
    decoding means connected to said second register for decoding the error propensity indicium corresponding to the character located at said reference word origin in said reference word;
    accessing means connected to said storage means for accessing from said storage means, when said decoded indicium indicates a character splitting propensity, a first one of said first type conditional probability that given the character located at said reference word origin in said reference word was scanned, that the OCR substituted the character located at said error word origin in said error word;
    said accessing means accessing from said storage means when said decoded indicium indicates a character splitting propensity, a second one of said first type conditional probability that given the character next to the character at said reference word origin in said reference word was scanned, that the OCR substituted the character next to the character located at said error word origin in said error word;

    multiplying means connected to said storage means for multiplying said first one and said second one of said first conditional probabilities, as a first product;
    said accessing means accessing from said storage means when said decoded indicium indicates a character splitting propensity, a first one of said second type conditional probability that given the character located at said reference word origin in said reference word was scanned, that the OCR split it into the character located at said error word origin and the character next to the character located at said error word origin in said error word;
    said accessing means accessing from said storage means when said decoded indicium indicates a character splitting propensity, a third one of said first type conditional probabilities that given the character next to the character located at said reference word origin in said reference word was scanned, that the OCR substituted the second next character to the character located at said error word origin in said error word;
    said multiplying means multiplying said first one of said second type conditional probability and said third one of said first type conditional probability as a second product;
    comparison means connected to said multiplying means for comparing the relative magnitudes of said first and said second product;

    a running product calculating means connected to said storage means for multiplying a running product times said first one of said first type conditional probabilities if said first product is greater than said second product or said first one of said second type conditional probabilities if said second product is greater than said first product;
    a shifting means connected to said comparison means for shifting the location of both said error word origin and said reference word origin by one character position when said first probability product is greater than said second probability product;
    said shifting means shifting said error word origin by two character positions and shifting said reference word origin by one character position when said second probability product is greater than said first probability product;
    whereby the reference word stored in said storage means having the highest conditional probability of having been misread as the error word stored in said first register, can be determined.
  2. The data processing system of Claim 1, which further comprises:
    said error propensity indicia being stored in said storage means in association with said selected characters composing said reference words.
  3. The data processing system of Claim 1, which further comprises:
    said error propensity indicia being stored in said storage means in association with selected characters in tabular form, separate from said reference words;
    said decoding means having a data connection with said storage means, for accessing an error propensity indicium corresponding to the character located at said reference word origin for the reference word stored in said second register means.
  4. A data processing system for selecting the correct form of an input error word garbled by an OCR crowding error, the correct form of the error word being a member of a predetermined class of reference words, each comprising a plurality of characters, comprising:
    a storage means for storing said predetermined class of reference words in a storage means, selected characters composing the reference word having stored in said storage means an error propensity indicium for indicating the propensity of the character to being misread through a crowding error;
    said storage means storing a first type conditional probability that a first character can be output by said OCR through character substitution, given that a second character was actually scanned, and a second type conditional probability that a first pair of adjacent characters can be output by said OCR through character crowding, given that a second pair of adjacent characters was actually scanned;
    a first register means connected to an input line for storing the characters of said error word arranged in the sequence of receipt from said OCR, with a first character at a given end of said error word defining a first position for an error word origin;
    a second register means connected to said storage means for storing the characters and error propensity indicium of a first reference word from said predetermined class in said storage means, arranged in a sequence to correspond to said sequence of characters in said first register means, with a first character in said reference word corresponding to said first character in said error word, defining a first position for a reference word origin;
    decoding means connected to said second register for decoding the error propensity indicium corresponding to the character stored at said reference word origin in said reference word;
    accessing means connected to said storage means for accessing from said storage means when said decoded indicium indicates a character pair crowding propensity, a first one of said first type conditional probability that given the character located at said reference word origin in said reference word was scanned, that the OCR substituted the character located at said error word origin in said error word;
    said accessing means accessing from said storage means when said decoded indicium indicates a character pair crowding propensity, a second one of said first type conditional probability that given the character next to the character at said reference word origin in said reference word was scanned, that the OCR substituted the character next to the character located at said error word origin in said error word;
    multiplying means connected to said storage means for multiplying said first one and said second one of said first type conditional probabilities as a first product;
    said accessing means accessing from said storage means when said decoded indicium indicates a character pair crowding propensity, a first one of said second type conditional probability that the character located at said reference word origin and the character located next to the character located at said reference word origin in said reference word was scanned, that the OCR executed a crowding error and output the character located at said error word origin and the character next to the character located at said error word origin in said error word;
    comparison means connected to said multiplying means for comparing the relative magnitudes of said first product and said second type conditional probability accessed from said storage means;
    a running product calculating means connected to said storage means, for multiplying a running product times said first one of said first type conditional probabilities if said first product is greater than said second type conditional probability accessed or said first one of said second type conditional probability if said second type conditional probability is greater than said first product;
    a shifting means connected to said comparison means for shifting the location of both said error word origin and said reference word origin by one character position when said first probability product is greater than said second type conditional probability;
    a shifting means shifting both said error word origin and said reference word origin by two character positions when said second type conditional probability is greater than said first probability product;
    whereby the reference word stored in said storage means having the highest conditional probability of having been misread as the error word stored in said first register, can be determined.
  5. The data processing system of Claim 4, which further comprises:
    said error propensity indicia being stored in said storage means in association with said selected characters composing said reference words.
  6. The data processing system of Claim 4, which further comprises:
    said error propensity indicia being stored in said storage means in association with selected characters in tabular form, separate from said reference words;
    said decoding means having a data connection with said storage means, for accessing an error propensity indicium corresponding to the character located at said reference word origin for the reference word stored in said second register means.
  7. In a system for recognizing speech, a data processing system for selecting the correct form of an input error word garbled by a speech analyzer concatenation error, the correct form of the error word being a member of the predetermined class of reference words, each comprising a plurality of phoneme-characters, comprising:
    a storage means for storing said predetermined class of reference words in a storage means, selected phoneme-characters composing the words in said class having stored in said storage means an error propensity indicium for indicating the propensity of the phoneme-character to be misread through a concatenation error;
    said storage means storing a first type conditional probability that a first phoneme-character can be output by said speech analyzer through phoneme-character substitution, given that a second phoneme-character was actually spoken, and a second type conditional probability that a first phoneme-character can be output by said speech analyzer through phoneme-character concatenation, given that a pair of adjacent phoneme-characters were actually spoken;
    a first register means connected to an input line from said speech analyzer for storing the phoneme-characters of said error word in a first register, arranged in a sequence of receipt from said speech analyzer, with a first phoneme-character at a given end of said error word defining a first position for an error word origin;
    a second register means connected to said storage means for storing the phoneme-characters of a first reference word from said predetermined class in said storage means, arranged in a sequence to correspond with said sequence of phoneme-characters in said first register means, with a first phoneme-character in said reference word corresponding to said first phoneme-character in said error word, defining a first position for a reference word origin;
    decoding means connected to said second register for decoding the error propensity indicium corresponding to the phoneme-character located at said reference word origin in said first reference word;
    accessing means connected to said storage means for accessing from said storage means when said decoded indicium indicates a phoneme-character concatenation propensity, a first one of said first type conditional probability that given the phoneme-character located at said first reference word origin of said reference word was spoken, that the speech analyzer substituted the phoneme-character located at said error word origin in said error word;
    said accessing means accessing from said storage means when said decoded indicium indicates a phoneme-character concatenation propensity, a second one of said first type conditional probability that given the phoneme-character next to the phoneme-character located at said reference word origin in said reference word was spoken, that the speech analyzer substituted the phoneme-character next to the phoneme-character located at said error word origin in said error word;
    multiplying means connected to said storage means for multiplying said first one and said second one of said first type conditional probabilities as a first product;
    said accessing means accessing from said storage means when said decoded indicium indicates a phoneme-character concatenation propensity, a first one of said second type conditional probability that given the phoneme-character located at said reference word origin and the phoneme-character next to the phoneme-character located at said reference word origin in said reference word were spoken, that the speech analyzer concatenated them into the phoneme-character located at said error word origin in said error word;
    said accessing means accessing from said storage means when said decoded indicium indicates a phoneme-character concatenation propensity, a third one of said first type conditional probabilities that given the phoneme-character second next to the phoneme-character located at said reference word origin in said reference word was spoken, that the speech analyzer substituted the phoneme-character next to the phoneme-character located at said error word origin in said error word;
    said multiplying means multiplying said first one of said second type conditional probability and said third one of said first type conditional probability as a second product;
    comparison means connected to said multiplying means for comparing the relative magnitude of said first and said second product;
    a running product calculating means connected to said storage means for multiplying a running product times said first one of said first type conditional probabilities if said first product is greater than said second product or said first one of said second type conditional probabilities if said second product is greater than said first product;
    a shifting means connected to said comparison means for shifting said error word origin and said reference word origin by one phoneme-character position when said first probability product is greater than said second probability product, said shifting means shifting the error word origin by one phoneme-character position and the reference word origin by two phoneme-character positions when said second probability product is greater than said first probability product;
    whereby the reference word having the greatest total conditional probability that the error word was output by the speech analyzer given that the reference word was spoken, can be determined.
  8. In a system for recognizing speech, the data processing system of Claim 7, which further comprises:
    said error propensity indicia being stored in said storage means in association with said selected characters composing said reference words and being loaded into said second register means in conjunction with the characters of the reference word loaded therein.
  9. In a system for recognizing speech, the data processing system of Claim 7, which further comprises:
    said error propensity indicia being stored in said storage means in association with selected characters in tabular form, separate from said reference words;
    said decoding means having a data connection with said storage means, for accessing an error propensity indicium corresponding to the character located at said reference word origin for the reference word stored in said second register means.
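
The compare, multiply, and shift steps recited in Claims 4 and 7 can be illustrated with a short software sketch. This is not the claimed apparatus, only one possible reading of it under stated assumptions. The names `score_crowding`, `score_concatenation`, `_sub`, the dictionary-based probability tables, and the `prone` set (standing in for the error propensity indicia) are all hypothetical. The claims do not say what happens on a tie, at the end of a word, or when the indicium shows no propensity, so the sketch assumes plain substitution with a one-position shift in those cases and a score of zero when the two words cannot be fully aligned. The probability values are invented for illustration.

```python
from typing import Dict, Set, Tuple

# Hypothetical table layout: (output, actual) -> P(output | actual).
Table = Dict[Tuple[str, str], float]


def _sub(p_sub: Table, err: str, i: int, ref: str, j: int) -> float:
    """First type probability P(err[i] | ref[j]); end-of-word rules are assumed."""
    if i == len(err) and j == len(ref):
        return 1.0  # both words used up: neutral factor
    if i >= len(err) or j >= len(ref):
        return 0.0  # only one word used up: no valid alignment
    return p_sub.get((err[i], ref[j]), 0.0)


def score_crowding(err: str, ref: str, p_sub: Table, p_crowd: Table,
                   prone: Set[str]) -> float:
    """Claim 4 reading: a scanned pair is output as a (different) pair."""
    if len(err) != len(ref):
        return 0.0  # both origins always shift together, so lengths must match
    running, k = 1.0, 0
    while k < len(ref):
        p1 = _sub(p_sub, err, k, ref, k)
        if ref[k] in prone and k + 1 < len(ref):
            first_product = p1 * _sub(p_sub, err, k + 1, ref, k + 1)
            pc = p_crowd.get((err[k:k + 2], ref[k:k + 2]), 0.0)
            if pc > first_product:
                running *= pc  # crowding wins: shift both origins by two
                k += 2
                continue
        running *= p1  # substitution wins (or tie, assumed): shift by one
        k += 1
    return running


def score_concatenation(err: str, ref: str, p_sub: Table, p_cat: Table,
                        prone: Set[str]) -> float:
    """Claim 7 reading: two spoken phoneme-characters are output as one."""
    running, i, j = 1.0, 0, 0
    while i < len(err) and j < len(ref):
        p1 = _sub(p_sub, err, i, ref, j)
        if ref[j] in prone and j + 1 < len(ref):
            first_product = p1 * _sub(p_sub, err, i + 1, ref, j + 1)
            pc = p_cat.get((err[i], ref[j:j + 2]), 0.0)
            second_product = pc * _sub(p_sub, err, i + 1, ref, j + 2)
            if second_product > first_product:
                running *= pc  # concatenation wins: error +1, reference +2
                i, j = i + 1, j + 2
                continue
        running *= p1  # substitution wins (or tie, assumed): both +1
        i, j = i + 1, j + 1
    # Assumed rule: leftover characters in either word mean no match.
    return running if (i == len(err) and j == len(ref)) else 0.0


# Illustrative values only; letters stand in for phoneme-characters.
p_sub = {(c, c): 0.9 for c in "abcemnor"}
p_sub[("m", "r")] = 0.05
p_cat = {("m", "rn"): 0.3}
words = ["corn", "come"]
best = max(words, key=lambda w: score_concatenation("com", w, p_sub, p_cat, {"r"}))
print(best)  # corn
```

Taking the maximum over the stored reference words corresponds to the "whereby" clauses: the reference word with the greatest running product is selected as the most likely correct form of the error word.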
CA221,755A 1974-04-10 1975-03-10 Regional context maximum likelihood ocr error correction apparatus Expired CA1062810A (en)

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
US45982074A 1974-04-10 1974-04-10

Publications (1)

Publication Number Publication Date
CA1062810A (en) 1979-09-18

Family

ID=23826267

Family Applications (1)

Application Number Title Priority Date Filing Date
CA221,755A Expired CA1062810A (en) 1974-04-10 1975-03-10 Regional context maximum likelihood ocr error correction apparatus

Country Status (8)

Country Link
JP (1) JPS5521384B2 (en)
BE (1) BE824366A (en)
CA (1) CA1062810A (en)
DE (1) DE2460757C2 (en)
FR (1) FR2267590B1 (en)
GB (1) GB1454148A (en)
IT (1) IT1033223B (en)
NL (1) NL7503946A (en)

Families Citing this family (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US4201881A (en) * 1979-03-28 1980-05-06 Wisconsin Alumni Research Foundation 24,24-Difluoro-1α,25-dihydroxycholecalciferol
JPS6055866B2 (en) * 1983-05-09 1985-12-06 株式会社日立製作所 character recognition device
JPS6274181A (en) * 1985-09-27 1987-04-04 Sony Corp Character recognizing device
JPS6297081A (en) * 1986-10-08 1987-05-06 Hitachi Ltd Character recognizer
JPH01167469U (en) * 1988-05-14 1989-11-24
JPH01176171U (en) * 1988-06-03 1989-12-15
GB2289969A (en) * 1994-05-24 1995-12-06 Ibm Character segmentation

Also Published As

Publication number Publication date
JPS50137037A (en) 1975-10-30
JPS5521384B2 (en) 1980-06-09
FR2267590B1 (en) 1977-05-20
DE2460757C2 (en) 1983-08-18
IT1033223B (en) 1979-07-10
DE2460757A1 (en) 1975-10-23
GB1454148A (en) 1976-10-27
BE824366A (en) 1975-05-02
NL7503946A (en) 1975-10-14
FR2267590A1 (en) 1975-11-07

Similar Documents

Publication Publication Date Title
US3969700A (en) Regional context maximum likelihood error correction for OCR, keyboard, and the like
US3988715A (en) Multi-channel recognition discriminator
US3995254A (en) Digital reference matrix for word verification
US5617488A (en) Relaxation word recognizer
TWI435276B (en) A method and apparatus for recognition of handwritten symbols
US4379282A (en) Apparatus and method for separation of optical character recognition data
JP2001043310A (en) Device and method for correcting document picture
CA1062810A (en) Regional context maximum likelihood ocr error correction apparatus
US5335289A (en) Recognition of characters in cursive script
Souibgui et al. A few-shot learning approach for historical ciphered manuscript recognition
US3259883A (en) Reading system with dictionary look-up
US8260054B2 (en) Methods for matching image-based texual information with regular expressions
US4138662A (en) Character reader
CN112651402A (en) Character recognition method and device
Parwej An empirical evaluation of off-line Arabic handwriting and printed characters recognition system
US3271739A (en) Multi-level test system for specimen identification
US3344399A (en) Segmentation method and apparatus
US5361204A (en) Searching for key bit-mapped image patterns
Lupinski et al. On the use of attention mechanism in a Seq2Seq based approach for off-line handwritten digit string recognition
KR100332752B1 (en) Method for recognizing character
Seni et al. Diacritical processing for unconstrained online handwriting recognition using a forward search
Aldarrab et al. Segmenting numerical substitution ciphers
JPH0634253B2 (en) Misreading character correction processor
JP2939945B2 (en) Roman character address recognition device
JPS629958B2 (en)