JPH05174195A - English character recognizing device - Google Patents

English character recognizing device

Info

Publication number
JPH05174195A
JPH05174195A JP3007640A JP764091A JPH05174195A JP H05174195 A JPH05174195 A JP H05174195A JP 3007640 A JP3007640 A JP 3007640A JP 764091 A JP764091 A JP 764091A JP H05174195 A JPH05174195 A JP H05174195A
Authority
JP
Japan
Prior art keywords
character
word
candidate
area
adjacency
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
JP3007640A
Other languages
Japanese (ja)
Inventor
Ryoichi Yushimo
良一 湯下
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Panasonic Holdings Corp
Original Assignee
Matsushita Electric Industrial Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Matsushita Electric Industrial Co Ltd
Priority to JP3007640A
Publication of JPH05174195A
Legal status: Pending

Landscapes

  • Character Discrimination (AREA)
  • Machine Translation (AREA)
  • Document Processing Apparatus (AREA)

Abstract

PURPOSE: To shorten the processing time of downstream processing such as spell checking by ranking the candidate character strings effectively before the spell check is run. CONSTITUTION: The device has a character recognition unit 5 that outputs multiple candidate characters based on the graphical features of a character area. The candidate characters output by the character recognition unit are stored for one word, and for each combination of them the accuracy as a word is calculated by adding, to the product of the recognition accuracies of the individual characters, a correction value calculated from statistically obtained adjacency frequencies for each character position. A spell check unit 7 then checks the resulting word candidates in descending order of rank.

Description

Detailed Description of the Invention

[0001]

[Field of Industrial Application] The present invention relates to an English character recognition device that recognizes English characters.

[0002]

[Prior Art] In recent years there has been growing demand for using character recognition devices as input devices for computers and the like, and a character recognition device that can efficiently obtain stable recognition results has become indispensable for improving the performance of such systems. In a conventional recognition device, when multiple candidate characters are obtained as the recognition result for one input, the correct character is chosen as follows: several character strings are generated from the recognition results of the word formed with the preceding and following characters, and a technique such as spell checking is used to pick the correctly spelled string from among them as the recognition result.

[0003]

[Problems to Be Solved by the Invention] However, in the conventional character recognition described above, the candidate character strings generated as word candidates were ranked based only on the recognition accuracy. As a result, when several characters within the same word each had many candidate characters, the ranking of the candidate character strings was not necessarily appropriate, the number of spell-check and similar operations increased, and processing time grew. The present invention solves this conventional problem, and its object is to provide an English character recognition device that needs fewer spell-check and similar operations and has a short processing time.

[0004]

[Means for Solving the Problems] To achieve the above object, the English character recognition device of the present invention obtains, for each combination forming a word candidate character string, the accuracy as a word by adding, to the product of the recognition accuracies of the individual characters, a correction value calculated from statistically obtained adjacency frequencies for each character position. By ranking the candidate character strings effectively before the spell check, it reduces the number of spell-check and similar operations and shortens the processing time.

[0005]

[Operation] According to the present invention, therefore, the candidate character strings to be spell-checked are ranked based on the recognition accuracy of each character and on statistically obtained adjacency frequencies for each character position. This yields an appropriate ranking, reduces the number of spell-check and similar operations, and shortens the processing time.

[0006]

[Embodiment] FIG. 1 shows the configuration of an English character recognition device in one embodiment of the present invention. In FIG. 1, 1 is an image input unit that inputs the document to be recognized as a document image; 2 is a text area cutout unit that finds groups of character strings in the input document image and outputs text areas; 3 is a word cutout unit that finds word boundaries in a text area and outputs the extent of each word as a word area; 4 is a character cutout unit that finds character boundaries in a word area and outputs the extent of each character as a character area; 5 is a character recognition unit that outputs a plurality of matching candidate characters based on graphical features extracted from a character area; 6 is a word candidate generation unit that stores the candidate characters output by the character recognition unit for one word and, for each character string obtained by combining them, obtains the accuracy as a word by adding, to the product of the recognition accuracies of the individual characters, a correction value calculated from statistically obtained adjacency frequencies for each character position, thereby generating word candidates ranked as candidates; 7 is a spell check unit that checks the spelling of the word candidates in descending order of rank and outputs the correct character string as the recognition result; 8 is a character adjacency frequency list that stores the per-position adjacency frequencies mentioned above; 9 is an internal bus connecting units 1 to 7; and 10 is an internal bus connecting the word candidate generation unit and the character adjacency frequency list. FIG. 2 is a flowchart of the overall character recognition processing.

[0007] Next, the operation of the above embodiment is described with reference to FIGS. 1 and 2. The text to be recognized is input as a text image by the image input unit 1 (S11). The input text image is sent to the text area cutout unit 2, which computes histograms of black pixels in the vertical and horizontal directions of the text image, finds the text areas from them, and stores the position information of the text areas as internal data (S12). Next, the position information of the text areas is sent to the word area cutout unit 3, which performs word area cutout processing within each text area. Exploiting the fact that the space before and after a word is wider than the space between characters within a word, the word area cutout unit 3 cuts out as a word area each character string bounded by spaces of at least a certain width. The position information of all word areas within the text areas found by the text area cutout unit 2 is obtained and stored as internal data (S13). Next, the position information of the word areas is sent to the character area cutout unit 4, which performs character area cutout processing on each word area. The character area cutout unit 4 looks at how the black-pixel histogram varies within the word area, treats the parts where the histogram is at or below a certain value as boundaries between characters, and thereby separates the run of characters in the word area into individual characters, each cut out as a character area. The position information of all character areas within the word areas found by the word area cutout unit 3 is obtained and stored as internal data (S14). Based on the position data of the text areas, word areas, and character areas obtained in S12 to S14, the character recognition unit 5 performs recognition one character at a time and stores the results, namely the recognition candidate characters and their recognition accuracies, as internal data for one word (S15 to S17). The following description assumes that the recognition candidate characters and recognition accuracies for one word were obtained as shown in Table 1.
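The two cutout rules in this paragraph (word boundaries at wide gaps, character boundaries where the black-pixel histogram falls to a low value) can be illustrated with a small projection-profile sketch. This is only an illustration under assumptions: the binary-image representation, the function names `column_histogram` and `split_runs`, and the `threshold` and `min_gap` parameters are hypothetical and do not come from the patent.

```python
from typing import List, Tuple


def column_histogram(image: List[List[int]]) -> List[int]:
    """Count black pixels (value 1) in each column of a binary image."""
    if not image:
        return []
    return [sum(column) for column in zip(*image)]


def split_runs(hist: List[int], threshold: int = 0, min_gap: int = 1) -> List[Tuple[int, int]]:
    """Return (start, end) column ranges, end exclusive, whose counts exceed
    `threshold`. Only gaps of at least `min_gap` columns act as separators."""
    runs = []
    start = None  # first column of the run being built, if any
    gap = 0       # consecutive low columns seen since the last ink column
    for x, count in enumerate(hist):
        if count > threshold:
            if start is None:
                start = x
            gap = 0
        elif start is not None:
            gap += 1
            if gap >= min_gap:
                runs.append((start, x - gap + 1))
                start = None
                gap = 0
    if start is not None:
        runs.append((start, len(hist) - gap))
    return runs


# One-row toy image: two narrow strokes, a wide gap, then another stroke.
row = [[1, 1, 0, 1, 0, 0, 0, 1, 1]]
hist = column_histogram(row)
word_areas = split_runs(hist, min_gap=3)       # wide gaps separate words
character_areas = split_runs(hist, min_gap=1)  # any low column separates characters
```

With `min_gap=3` the toy row splits into two word ranges, and with `min_gap=1` it splits into three character ranges. A real device would work on two-dimensional areas and tuned thresholds.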

[0008]

[Table 1]

[0009] Word candidates are obtained by combining the recognition candidate characters (S18). In this example the first, second, and third characters each have one candidate character, while the fourth and fifth characters each have two, so the following four combinations are possible word candidates.

[0010]

[External 1]
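Enumerating the combinations amounts to a Cartesian product over the candidate lists of each character position. The sketch below assumes the per-character accuracies implied by the worked example later in the description (0.7 and 0.3 for the fourth character, 0.6 and 0.4 for the fifth). The second candidate for the fourth character appears only in Table 1, which is not reproduced in the text, so it is written as the placeholder `?`. The function name `word_candidates` is hypothetical.

```python
from itertools import product
from math import prod

# (character, recognition accuracy) candidates per position, best first.
# "?" is a placeholder for a character that is only given in Table 1.
candidates = [
    [("a", 1.0)],
    [("p", 1.0)],
    [("p", 1.0)],
    [("i", 0.7), ("?", 0.3)],
    [("c", 0.6), ("e", 0.4)],
]


def word_candidates(per_position):
    """Yield (string, product of recognition accuracies) for every combination."""
    for combo in product(*per_position):
        text = "".join(ch for ch, _ in combo)
        yield text, prod(acc for _, acc in combo)


words = list(word_candidates(candidates))
for text, accuracy in words:
    print(f"{text}: {accuracy:.2f}")
```

The product of accuracies computed here is the quantity the description later uses as the recognition accuracy of the whole word.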

[0011] Next, the accuracy as a word, that is, the word accuracy, is obtained for every word candidate (S18 to S21). The word accuracy is the sum of the product of the recognition accuracies of the characters that make up the word and a correction value calculated from the adjacency frequencies for each character position. The recognition accuracies are those obtained by the character recognition processing described above. The per-position adjacency frequencies are obtained as shown in Table 2 below: let n be the number of characters in the English word including the spaces at both ends, and number the characters first, second, ..., n-th from the left end. The adjacency frequency of characters is determined statistically for every gap between characters (between the first and second, between the second and third, ..., between the (n-1)-th and n-th), and the resulting (n-1) sets of adjacency frequency data are applied to the corresponding gaps. The character adjacency frequency data is stored in the character adjacency frequency list.
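The position numbering can be made concrete with a short sketch: pad the word with one space at each end, then list the (n-1) adjacent pairs together with the index of the gap they occupy. The helper name `positional_pairs` is hypothetical.

```python
def positional_pairs(word: str):
    """Return (gap_index, left_char, right_char) for each adjacent pair,
    counting the spaces added at both ends of the word as characters."""
    padded = f" {word} "
    return [(i + 1, padded[i], padded[i + 1]) for i in range(len(padded) - 1)]


pairs = positional_pairs("appic")
for gap_index, left, right in pairs:
    print(gap_index, repr(left), repr(right))
```

For a five-letter word, n is 7 and there are six gaps. Under this numbering the last gap (between the sixth and seventh characters) pairs the final letter with the trailing space.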

[0012]

[Table 2]

[0013] First, the product of the recognition accuracies is obtained for each word candidate (S19). The closer a recognition accuracy is to 1, the more likely the recognition result is correct, and the recognition accuracy of a whole word is the product of the accuracies of all its constituent characters. The results are:

Word candidate 1: 1.0 × 1.0 × 1.0 × 0.7 × 0.6 = 0.42
Word candidate 2: 1.0 × 1.0 × 1.0 × 0.7 × 0.4 = 0.28
Word candidate 3: 1.0 × 1.0 × 1.0 × 0.3 × 0.6 = 0.18
Word candidate 4: 1.0 × 1.0 × 1.0 × 0.3 × 0.4 = 0.12

Next, the correction value based on adjacency frequency is obtained (S20). The adjacency frequency between two characters is looked up in the adjacency frequency data selected by the positions of the two characters (which character positions they occupy). For each character that has multiple candidates within the word candidate (in this example the fifth and sixth characters, counting the leading space as the first character), the product of its adjacency frequency with the preceding character and its adjacency frequency with the following character is called the character adjacency value. The product of the character adjacency values of all such characters is called the word adjacency value. The correction value of a word candidate is its word adjacency value divided by the sum of the word adjacency values of all word candidates. Part of the adjacency frequency data needed for this example is shown below, followed by the word adjacency value obtained for each word candidate.

(1) Adjacency frequency data:

[0014]

[External 2]

[0015] (iii) Adjacency frequencies of c, e, and (blank) between the sixth and seventh characters
  • The adjacency frequency of c and blank is 7.
  • The adjacency frequency of e and blank is 446.

(2) Word candidate 1 (appic):
(i) Character adjacency value of the fifth character: the adjacency frequency of p and i is 8 and that of i and c is 55, so the character adjacency value is 440.
(ii) Character adjacency value of the sixth character: the adjacency frequency of i and c is 55 and that of c and blank is 7, so the character adjacency value is 385.
(iii) Word adjacency value: the product of the character adjacency values of the fifth and sixth characters, that is, 169400.

(3) Word candidate 2 (appie):
(i) Character adjacency value of the fifth character: the adjacency frequency of p and i is 8 and that of i and e is 37, so the character adjacency value is 296.
(ii) Character adjacency value of the sixth character: the adjacency frequency of i and e is 37 and that of e and blank is 446, so the character adjacency value is 16502.
(iii) Word adjacency value: the product of the character adjacency values of the fifth and sixth characters, that is, 4884592.
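The arithmetic for word candidates 1 and 2 can be reproduced with a short sketch. It uses only the adjacency frequencies quoted in the text. Keying the lookup by (gap index, left character, right character) is an assumption made for illustration, as are the names `ADJACENCY`, `character_adjacency`, and `word_adjacency`.

```python
# Adjacency frequencies quoted in the worked example, keyed by
# (gap index, left character, right character). Gap k lies between
# characters k and k+1 of the space-padded word.
ADJACENCY = {
    (4, "p", "i"): 8,
    (5, "i", "c"): 55,
    (5, "i", "e"): 37,
    (6, "c", " "): 7,
    (6, "e", " "): 446,
}


def character_adjacency(padded: str, pos: int) -> int:
    """Adjacency with the previous character times adjacency with the next.
    `pos` is the 1-based position in the space-padded word."""
    i = pos - 1
    left = ADJACENCY[(pos - 1, padded[i - 1], padded[i])]
    right = ADJACENCY[(pos, padded[i], padded[i + 1])]
    return left * right


def word_adjacency(word: str, ambiguous_positions) -> int:
    """Product of the character adjacency values of the ambiguous characters."""
    padded = f" {word} "
    value = 1
    for pos in ambiguous_positions:
        value *= character_adjacency(padded, pos)
    return value


# The fifth and sixth characters (counting the leading space) are ambiguous.
appic = word_adjacency("appic", (5, 6))
appie = word_adjacency("appie", (5, 6))
print(appic, appie)
```

Note that the frequency of the pair shared by the two ambiguous characters (for example i and c) enters the word adjacency value twice, once through each character adjacency value, exactly as in the worked figures.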

[0016]

[External 3]

[0017] (iii) Word adjacency value: the product of the character adjacency values of the fifth and sixth characters, that is, 12992.

[0018]

[External 4]

[0019] (iii) Word adjacency value: the product of the character adjacency values of the fifth and sixth characters, that is, 67049856.

Next, the correction values obtained from the word adjacency values are shown. The correction value is obtained by dividing the word adjacency value of each word candidate by the sum of the word adjacency values of all word candidates. The sum is 72116840, so the correction values are:

Correction value of word candidate 1 = 169400 / 72116840 = 0.0023
Correction value of word candidate 2 = 4884592 / 72116840 = 0.0677
Correction value of word candidate 3 = 12992 / 72116840 = 0.0002
Correction value of word candidate 4 = 67049856 / 72116840 = 0.9297

Next, the word accuracy of each word candidate is obtained. The word accuracy is the recognition accuracy plus the correction value, so:

Word accuracy of word candidate 1: 0.42 + 0.0023 = 0.4223
Word accuracy of word candidate 2: 0.28 + 0.0677 = 0.3477
Word accuracy of word candidate 3: 0.18 + 0.0002 = 0.1802
Word accuracy of word candidate 4: 0.12 + 0.9297 = 1.0497

Next, the spell check unit 7 performs spell check processing on the word candidates in descending order of the word accuracy obtained above. The spell check determines whether the character string of a word candidate is a correct English word. If it is, that word candidate is output as the recognition result; if not, the next word candidate is checked. If every word candidate fails, the word candidate with the highest recognition accuracy is output as the recognition result (S22 to S26). The order in which the spell check is run, from the highest word accuracy down, is therefore:

[0020]

[External 5]
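The correction, ranking, and spell-check steps (S20 to S26) can be sketched from the figures quoted above. The candidates are keyed by number because the strings of candidates 3 and 4 are given only in figures. The function names `word_accuracies` and `recognize`, and the `is_word` callback standing in for the spell checker, are hypothetical.

```python
# Values quoted in the worked example above.
recognition = {
    "candidate 1": 0.42,
    "candidate 2": 0.28,
    "candidate 3": 0.18,
    "candidate 4": 0.12,
}
word_adjacency = {
    "candidate 1": 169400,
    "candidate 2": 4884592,
    "candidate 3": 12992,
    "candidate 4": 67049856,
}


def word_accuracies(recognition, word_adjacency):
    """Recognition accuracy plus the normalised word adjacency value."""
    total = sum(word_adjacency.values())
    return {name: recognition[name] + word_adjacency[name] / total
            for name in recognition}


def recognize(recognition, word_adjacency, is_word):
    """Spell-check candidates in descending word accuracy.
    Returns (chosen candidate, number of spell checks performed)."""
    scores = word_accuracies(recognition, word_adjacency)
    ranked = sorted(scores, key=scores.get, reverse=True)
    for checks, name in enumerate(ranked, start=1):
        if is_word(name):
            return name, checks
    # Every candidate failed: fall back to the highest recognition accuracy.
    return max(recognition, key=recognition.get), len(ranked)


scores = word_accuracies(recognition, word_adjacency)
ranked = sorted(scores, key=scores.get, reverse=True)
print(ranked)
```

If, purely for illustration, the spell checker accepted only candidate 4, `recognize(recognition, word_adjacency, lambda name: name == "candidate 4")` would stop after one check, whereas ranking by recognition accuracy alone would reach that candidate last.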

[0021] Steps S15 to S26 described above are repeated for every word area, which completes character recognition of the given text image.

[0022]

[Effect of the Invention] As is clear from the above embodiment, the present invention ranks the candidate character strings for spell checking appropriately, based on the recognition accuracy of each character and on statistically obtained adjacency frequencies for each character position. This reduces the number of spell-check and similar operations and shortens the processing time.

[Brief Description of the Drawings]

[FIG. 1] Configuration diagram of the English character recognition device in one embodiment of the present invention.

[FIG. 2] Flowchart of the overall character recognition processing in the embodiment of the present invention.

[Explanation of Reference Numerals]

1: image input unit; 2: document area cutout unit; 3: word area cutout unit; 4: character area cutout unit; 5: character recognition unit; 6: word candidate generation unit; 7: spell check unit; 8: character adjacency frequency list; 9, 10: internal buses.

Claims (1)

[Claims]

[Claim 1] An English character recognition device comprising: an image input unit that inputs a document to be recognized; a text area cutout unit that outputs text areas from the input document image; a word area cutout unit that outputs word areas from a text area; a character area cutout unit that outputs character areas from a word area; a character recognition unit that outputs a plurality of matching candidate characters based on the graphical features of a character area; a word candidate generation unit that stores the candidate characters output by the character recognition unit for one word and, for each character string obtained by combining them, obtains the accuracy as a word by adding, to the product of the recognition accuracies of the individual characters, a correction value calculated from statistically obtained adjacency frequencies for each character position, thereby generating word candidates ranked as candidates; and a spell check unit that checks the spelling of the word candidates in descending order of rank and outputs the correct character string as the recognition result.
JP3007640A 1991-01-25 1991-01-25 English character recognizing device Pending JPH05174195A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
JP3007640A JPH05174195A (en) 1991-01-25 1991-01-25 English character recognizing device

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
JP3007640A JPH05174195A (en) 1991-01-25 1991-01-25 English character recognizing device

Publications (1)

Publication Number Publication Date
JPH05174195A true JPH05174195A (en) 1993-07-13

Family

ID=11671429

Family Applications (1)

Application Number Title Priority Date Filing Date
JP3007640A Pending JPH05174195A (en) 1991-01-25 1991-01-25 English character recognizing device

Country Status (1)

Country Link
JP (1) JPH05174195A (en)

Cited By (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2008044393A1 (en) * 2006-10-13 2008-04-17 Kabushiki Kaisha Toshiba Word recognizing method and word recognizing program
JP2008097452A (en) * 2006-10-13 2008-04-24 Toshiba Corp Word recognition method and word recognition program
CN109858011A (en) * 2018-11-30 2019-06-07 平安科技(深圳)有限公司 Standard dictionary segmenting method, device, equipment and computer readable storage medium

Similar Documents

Publication Publication Date Title
US10019436B2 (en) Input method and system
US9785867B2 (en) Character recognition device, image display device, image retrieval device, character recognition method, and computer program product
JP2020511726A (en) Data extraction from electronic documents
US20020041713A1 (en) Document search and retrieval apparatus, recording medium and program
US20090317003A1 (en) Correcting segmentation errors in ocr
TWI567569B (en) Natural language processing systems, natural language processing methods, and natural language processing programs
US7406201B2 (en) Correcting segmentation errors in OCR
CN111832264A (en) PDF file based signature position determination method, device and equipment
US20120014612A1 (en) Document processing apparatus and computer readable medium
US20070016567A1 (en) Searching device and program product
JPH05174195A (en) English character recognizing device
US11972208B2 (en) Information processing device and information processing method
JP2936426B2 (en) English character recognition device
JP3398729B2 (en) Automatic keyword extraction device and automatic keyword extraction method
CN111860513A (en) Optical character recognition support system
JPH0619962A (en) Text dividing device
JPH08180066A (en) Index preparation method, document retrieval method and document retrieval device
JP6303508B2 (en) Document analysis apparatus, document analysis system, document analysis method, and program
JP2006163656A (en) Character recognition system
JPH0528324A (en) English character recognition device
WO2022070422A1 (en) Computer system and character recognition method
US20210192317A1 (en) Information processing device, information processing method, and program
JP4047894B2 (en) Document proofing apparatus and program storage medium
JP3071745B2 (en) Post-processing method of character recognition result
JP2006294069A (en) Document corrector and program storage medium