JPH04120679A - Alphabetical character recognizing device - Google Patents

Alphabetical character recognizing device

Info

Publication number
JPH04120679A
Authority
JP
Japan
Prior art keywords
character
word
area
characters
recognition
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
JP2239998A
Other languages
Japanese (ja)
Other versions
JP2936426B2 (en)
Inventor
Ryoichi Yushimo
良一 湯下
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Panasonic Holdings Corp
Original Assignee
Matsushita Electric Industrial Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Matsushita Electric Industrial Co Ltd
Priority to JP2239998A
Publication of JPH04120679A
Application granted
Publication of JP2936426B2
Anticipated expiration
Expired - Fee Related

Abstract

PURPOSE: To reduce the processing time of the device by obtaining the certainty of each candidate word from the product of the recognition accuracies of its characters plus a correction value based on statistically obtained neighboring frequencies of characters, and using that certainty to prioritize the candidate character strings for spell checking. CONSTITUTION: A picture inputting section 1 inputs the document to be recognized as a document picture. A sentence segmenting section 2 finds the histograms of black picture elements of the document picture in the vertical and horizontal directions and locates the sentence areas. A word area segmenting section 3 segments the characters lying between spaces wider than a certain value as a word area. A character area segmenting section 4 regards the parts where the histogram of black picture elements in the word area is lower than a certain value as boundaries between characters, divides the train of characters in the word area into individual characters, and segments them as character areas. A character recognizing section 5 performs character recognition on each character. The certainty of a word is found by adding a correction value, based on the neighboring frequency 8 of each character, to the product of the recognition accuracies of the characters. Spell checking 7 is then performed on the candidate words in descending order of certainty.

Description

DETAILED DESCRIPTION OF THE INVENTION (Field of Industrial Application) The present invention relates to an English character recognition device that recognizes English characters.

(Prior Art) In recent years, demand has grown for using character recognition devices as input devices for computers and similar systems, and character recognition devices that can efficiently obtain stable recognition results have become essential for improving the performance of such systems.

In a conventional recognition device, when multiple candidate characters are obtained as the recognition result for a single input, the correct character is determined as follows: several character strings are generated from the recognition results of the word formed by the characters before and after it, and a technique such as spell checking is used to select, from among those strings, the one with the correct spelling as the recognition result.

(Problem to be Solved by the Invention) However, when generating the character strings that serve as word candidates, the conventional character recognition device described above ranks the candidate character strings based only on the accuracy of the recognition results. Consequently, when several characters within the same word each have many candidate characters, the ranking of the candidate character strings is not necessarily appropriate, so the number of spell-check operations increases and the processing time grows.

(Means for Solving the Problems) To solve the above problem, the present invention obtains, for each combination forming a word candidate character string, an accuracy as a word by adding a correction value, calculated from statistically obtained character adjacency frequencies, to the product of the recognition accuracies of the individual characters. By ranking the candidate character strings effectively for spell checking in this way, it reduces the number of spell-check operations and shortens the processing time.

(Function) According to the present invention, therefore, the candidate character strings are ranked for spell checking on the basis of the recognition accuracy of each character and the statistically obtained character adjacency frequencies. This yields an appropriate ranking, reduces the number of spell-check operations, and shortens the processing time.

(Embodiment) FIG. 1 shows the block configuration of an English character recognition device according to one embodiment of the present invention. In FIG. 1, 1 is an image input unit that inputs the document to be recognized as a document image; 2 is a text area extraction unit that finds groups of character strings in the input document image and outputs text areas; 3 is a word area extraction unit that finds word boundaries within a text area and outputs the extent of each word as a word area; 4 is a character area extraction unit that finds character boundaries within a word area and outputs the extent of each character as a character area; 5 is a character recognition unit that outputs a plurality of candidate characters based on graphical features extracted from a character area; 6 is a word candidate generation unit that stores one word's worth of the candidate characters output by the character recognition unit and, for each character string obtained by combining them, obtains an accuracy as a word by adding a correction value, calculated from statistically determined character adjacency frequencies, to the product of the recognition accuracies of the characters, thereby generating ranked word candidates; 7 is a spell check unit that checks the spelling of the word candidates in descending order of rank and outputs the correct character string as the recognition result; 8 is a character adjacency frequency list that stores the character adjacency frequencies mentioned above; 9 is an internal bus connecting units 1 through 7; and 10 is an internal bus connecting the word candidate generation unit and the character adjacency frequency list.

For the English character recognition device of this embodiment configured as described above, FIG. 2 shows a flowchart of the overall processing, and FIG. 3 shows a frequency table of adjacent two-character pairs in which the character adjacency frequencies have been determined statistically. The operation of the embodiment is described next.

The document to be recognized is input as a document image by the image input unit 1 (S11). The input document image is sent to the text area extraction unit 2, which computes histograms of black pixels in the vertical and horizontal directions of the document image, finds the text areas from these histograms, and stores the position information of the text areas as internal data (S12). The position information of the text areas is sent to the word area extraction unit 3, which performs word area extraction on each text area. Noting that the spaces before and after a word are wider than the spaces between characters within a word, the word area extraction unit 3 extracts the characters lying between spaces of at least a certain width as a word area. The position information of all word areas within the text areas found by the text area extraction unit 2 is obtained and stored as internal data (S13). The position information of the word areas is sent to the character area extraction unit 4, which performs character area extraction on each word area. Focusing on the variation of the black-pixel histogram within a word area, the character area extraction unit 4 treats the portions where the histogram is at or below a certain value as boundaries between characters, separates the run of characters in the word area into individual characters, and extracts them as character areas. The position information of all character areas within the word areas found by the word area extraction unit 3 is obtained and stored as internal data (S14).
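The word and character segmentation of S13 and S14 can be illustrated with a small sketch. It assumes a binary image of a single text line given as rows of 0/1 values (1 = black pixel); the function names, the `threshold` and `min_gap` parameters, and the toy image are hypothetical and are not taken from the patent, which does not specify concrete values.

```python
from typing import List, Tuple

Span = Tuple[int, int]  # half-open column range [start, end)

def column_histogram(image: List[List[int]]) -> List[int]:
    """Count black pixels (value 1) in each column of a binary image."""
    if not image:
        return []
    return [sum(row[x] for row in image) for x in range(len(image[0]))]

def split_spans(hist: List[int], lo: int, hi: int,
                threshold: int, min_gap: int) -> List[Span]:
    """Split columns [lo, hi) into spans separated by runs of at least
    min_gap consecutive columns whose count is at or below threshold."""
    spans: List[Span] = []
    start = None  # first column of the span being built
    gap = 0       # length of the current run of low columns
    for x in range(lo, hi):
        if hist[x] > threshold:
            if start is None:
                start = x
            gap = 0
        elif start is not None:
            gap += 1
            if gap >= min_gap:
                spans.append((start, x - gap + 1))
                start, gap = None, 0
    if start is not None:
        spans.append((start, hi - gap))
    return spans

# Toy text line: two words, the first containing two characters.
image = [
    [0, 1, 1, 0, 1, 1, 0, 0, 0, 1, 1, 0],
    [0, 1, 1, 0, 0, 1, 0, 0, 0, 1, 1, 0],
]
hist = column_histogram(image)
# Word areas: separated by blank runs at least 3 columns wide (assumed width).
words = split_spans(hist, 0, len(hist), threshold=0, min_gap=3)
# Character areas: any low column inside a word area is a boundary.
chars = [split_spans(hist, lo, hi, threshold=0, min_gap=1) for lo, hi in words]
```

Using one routine for both steps reflects that the description applies the same idea twice: a wide gap separates words, and any sufficiently low histogram column separates characters. The row-direction histogram used to find text areas in S12 is not covered by this sketch.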

Based on the position data of the text areas, word areas, and character areas obtained in S12 through S14 above, the character recognition unit 5 performs recognition processing one character at a time and stores the results, namely the recognition candidate characters and their recognition accuracies, as internal data for one word (S15 to S17).

The following explanation assumes that the recognition candidate characters and recognition accuracies for one word have been obtained as shown in the following table.

Word candidates are obtained by combining the recognition candidate characters (S18).

In this example, the first, second, and third characters each have a single candidate, while the fourth and fifth characters each have two, so the following four combinations are possible as word candidates.

Word candidate 1: appic
Word candidate 2: appie
Word candidate 3: applc
Word candidate 4: apple

Next, the accuracy as a word, that is, the word accuracy, is obtained for every word candidate (S18 to S21). The word accuracy is obtained by adding a correction value, calculated from character adjacency frequencies, to the product of the recognition accuracies of the characters making up the word. The recognition accuracies are those obtained in the character recognition processing described above, and the character adjacency frequencies are based on a frequency table of adjacent two-character pairs, such as that shown in FIG. 3, compiled from statistics on character-level adjacency in English words (quoted from Cornew, R. W.: A Statistical Method of Spelling Correction, Inf. Control, Vol. 12, No. 2, pp. 79-93). The character adjacency frequency data is stored in the character adjacency frequency list 8.
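As a rough illustration of how the word candidates arise from the per-character candidates, the sketch below enumerates every combination. The candidate characters and accuracies are inferred from the worked arithmetic in this description (the alternative for the fourth character is taken to be the lowercase letter l); the data layout and the function name `word_candidates` are hypothetical.

```python
from itertools import product
from typing import List, Tuple

# Per-position candidates as (character, recognition accuracy).
# Values are inferred from the worked example, not read from a source table.
candidates: List[List[Tuple[str, float]]] = [
    [("a", 1.0)],
    [("p", 1.0)],
    [("p", 1.0)],
    [("i", 0.7), ("l", 0.3)],
    [("c", 0.6), ("e", 0.4)],
]

def word_candidates(per_position: List[List[Tuple[str, float]]]) -> List[str]:
    """Enumerate every combination of the per-position candidate characters."""
    return ["".join(ch for ch, _ in combo) for combo in product(*per_position)]

words = word_candidates(candidates)
```

The number of combinations is the product of the candidate counts per position (here 1 x 1 x 1 x 2 x 2 = 4), which is why a good ranking matters when several characters are ambiguous.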

First, the product of the recognition accuracies is calculated for each word candidate (S19).

The closer a recognition accuracy is to 1, the higher the probability that the recognition result is correct, and the recognition accuracy of a whole word is the product of the accuracies of all of its constituent characters. The results are shown below.

Word candidate 1: 1.0 × 1.0 × 1.0 × 0.7 × 0.6 = 0.42
Word candidate 2: 1.0 × 1.0 × 1.0 × 0.7 × 0.4 = 0.28
Word candidate 3: 1.0 × 1.0 × 1.0 × 0.3 × 0.6 = 0.18
Word candidate 4: 1.0 × 1.0 × 1.0 × 0.3 × 0.4 = 0.12

Next, the correction value based on the adjacency frequencies is calculated (S20).
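The recognition-accuracy products of S19 can be reproduced with a few lines. This is only a sketch: the per-character accuracies are inferred from the arithmetic in the description, and the function name `recognition_product` is hypothetical.

```python
from math import prod
from typing import Dict, List

# Per-character recognition accuracies for each word candidate,
# inferred from the worked example in the description.
accuracies: Dict[str, List[float]] = {
    "appic": [1.0, 1.0, 1.0, 0.7, 0.6],
    "appie": [1.0, 1.0, 1.0, 0.7, 0.4],
    "applc": [1.0, 1.0, 1.0, 0.3, 0.6],
    "apple": [1.0, 1.0, 1.0, 0.3, 0.4],
}

def recognition_product(char_accuracies: List[float]) -> float:
    """Recognition accuracy of a whole word: product of its character accuracies."""
    return prod(char_accuracies)

products = {word: recognition_product(acc) for word, acc in accuracies.items()}
```

Ranked by these products alone, the misspelled "appic" comes first and "apple" last, which is the situation the correction value is meant to address.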

Let the character adjacency value of a character that has multiple candidates within a word candidate (in this example, the fourth or fifth character) be the product of its adjacency frequency with the preceding character and its adjacency frequency with the following character, and let the word adjacency value be the product of the character adjacency values of all such characters in the word candidate. The correction value is then the word adjacency value of each word candidate divided by the sum of the word adjacency values of all word candidates.

The word adjacency values obtained for each word candidate are shown below.

Word candidate 1:
- Character adjacency value of the fourth character
  Adjacency frequency of p and i = 8
  Adjacency frequency of i and c = 55
  That is, character adjacency value = 440
- Character adjacency value of the fifth character
  Adjacency frequency of i and c = 55
  Adjacency frequency of c and space = 7
  That is, character adjacency value = 385
- Word adjacency value
  (character adjacency value of the fourth character) × (character adjacency value of the fifth character) = 169400

Word candidate 2:
- Character adjacency value of the fourth character
  Adjacency frequency of p and i = 8
  Adjacency frequency of i and e = 37
  That is, character adjacency value = 296
- Character adjacency value of the fifth character
  Adjacency frequency of i and e = 37
  Adjacency frequency of e and space = 446
  That is, character adjacency value = 16502
- Word adjacency value
  (character adjacency value of the fourth character) × (character adjacency value of the fifth character) = 4884592

Word candidate 3:
- Character adjacency value of the fourth character
  Adjacency frequency of p and l = 29
  Adjacency frequency of l and c = 8
  That is, character adjacency value = 232
- Character adjacency value of the fifth character
  Adjacency frequency of l and c = 8
  Adjacency frequency of c and space = 7
  That is, character adjacency value = 56
- Word adjacency value
  (character adjacency value of the fourth character) × (character adjacency value of the fifth character) = 12992

Word candidate 4:
- Character adjacency value of the fourth character
  Adjacency frequency of p and l = 29
  Adjacency frequency of l and e = 72
  That is, character adjacency value = 2088
- Character adjacency value of the fifth character
  Adjacency frequency of l and e = 72
  Adjacency frequency of e and space = 446
  That is, character adjacency value = 32112
- Word adjacency value
  (character adjacency value of the fourth character) × (character adjacency value of the fifth character) = 67049856

Next, the correction values obtained from the word adjacency values are shown.

The correction value is given by

(word adjacency value of the word candidate) / (sum of the word adjacency values of all word candidates)

Since the sum of the word adjacency values of all word candidates is 72116840, the correction value of each word candidate is:

Correction value of word candidate 1 = 169400 / 72116840 = 0.0023
Correction value of word candidate 2 = 4884592 / 72116840 = 0.0677
Correction value of word candidate 3 = 12992 / 72116840 = 0.0002
Correction value of word candidate 4 = 67049856 / 72116840 = 0.9297
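The adjacency-based correction of S20 can be sketched as follows. The bigram frequencies are only the eight values quoted in the worked example, not the full table of FIG. 3; the names `FREQ`, `char_adjacency`, and `word_adjacency` are hypothetical, and treating the position before the first letter as a space is an assumption the description does not address.

```python
from math import prod
from typing import Dict, List, Tuple

# Adjacent two-character frequencies quoted in the worked example.
# " " stands for the space that follows the last letter of a word.
FREQ: Dict[Tuple[str, str], int] = {
    ("p", "i"): 8, ("i", "c"): 55, ("i", "e"): 37,
    ("p", "l"): 29, ("l", "c"): 8, ("l", "e"): 72,
    ("c", " "): 7, ("e", " "): 446,
}

def char_adjacency(word: str, i: int) -> int:
    """Frequency with the preceding character times frequency with the next one."""
    padded = " " + word + " "   # assumption: word boundaries count as spaces
    j = i + 1                   # index of word[i] inside padded
    return FREQ[(padded[j - 1], padded[j])] * FREQ[(padded[j], padded[j + 1])]

def word_adjacency(word: str, ambiguous: List[int]) -> int:
    """Product of the character adjacency values of all ambiguous positions."""
    return prod(char_adjacency(word, i) for i in ambiguous)

words = ["appic", "appie", "applc", "apple"]
ambiguous = [3, 4]  # zero-based positions of the fourth and fifth characters
adjacency = {w: word_adjacency(w, ambiguous) for w in words}
total = sum(adjacency.values())
correction = {w: adjacency[w] / total for w in words}
```

Because each correction value is a share of the total, the correction values of all candidates sum to 1, so a candidate built from common letter pairs absorbs most of the correction.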

The word accuracy of each word candidate is then obtained.

Since the word accuracy is the recognition accuracy plus the correction value:

Word accuracy of word candidate 1 = 0.42 + 0.0023 = 0.4223
Word accuracy of word candidate 2 = 0.28 + 0.0677 = 0.3477
Word accuracy of word candidate 3 = 0.18 + 0.0002 = 0.1802
Word accuracy of word candidate 4 = 0.12 + 0.9297 = 1.0497
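A minimal sketch of the final scoring and ranking step, using the rounded figures from the worked example as inputs. The dictionary names are hypothetical; only the addition and the descending sort come from the description.

```python
from typing import Dict, List

# Rounded values taken from the worked example.
recognition_product: Dict[str, float] = {
    "appic": 0.42, "appie": 0.28, "applc": 0.18, "apple": 0.12,
}
correction: Dict[str, float] = {
    "appic": 0.0023, "appie": 0.0677, "applc": 0.0002, "apple": 0.9297,
}

# Word accuracy = product of recognition accuracies + correction value.
word_accuracy: Dict[str, float] = {
    w: recognition_product[w] + correction[w] for w in recognition_product
}

# Candidates in the order they would be passed to the spell checker.
ranked: List[str] = sorted(word_accuracy, key=word_accuracy.get, reverse=True)
```

Note that the resulting value can exceed 1 (1.0497 for the fourth candidate), so it behaves as a ranking score rather than a probability.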

Next, spell check processing is performed on the word candidates in descending order of the word accuracy obtained by the above processing.

The spell check processing determines whether the character string of a word candidate is a correct English word. If the check finds it correct, that word candidate is output as the recognition result; if not, the next word candidate is checked. If all of the word candidates are incorrect, the word candidate with the highest recognition accuracy is output as the recognition result (S22 to S26).
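The spell-check loop and its fallback can be sketched as below. The set-based `dictionary` is a hypothetical stand-in for whatever spell checker the device uses, and `pick_word` is an illustrative name; the lookup count is returned only to show how the ranking affects the number of checks.

```python
from typing import Dict, List, Set, Tuple

def pick_word(ranked: List[str],
              recognition_product: Dict[str, float],
              dictionary: Set[str]) -> Tuple[str, int]:
    """Return the chosen word and the number of spell-check lookups performed."""
    for checks, word in enumerate(ranked, start=1):
        if word in dictionary:
            return word, checks
    # No candidate is a valid word: fall back to the highest recognition accuracy.
    return max(ranked, key=lambda w: recognition_product[w]), len(ranked)

recognition_product = {"appic": 0.42, "appie": 0.28, "applc": 0.18, "apple": 0.12}
dictionary = {"apple", "apply"}  # hypothetical word list

# Order produced by the word accuracy of the worked example.
by_word_accuracy = ["apple", "appic", "appie", "applc"]
# Order a conventional device would use (recognition accuracy only).
by_recognition_only = sorted(recognition_product,
                             key=recognition_product.get, reverse=True)

result_new = pick_word(by_word_accuracy, recognition_product, dictionary)
result_old = pick_word(by_recognition_only, recognition_product, dictionary)
```

In this example both orderings reach the same word, but the word-accuracy ordering finds it on the first lookup while the recognition-only ordering needs four, which is the reduction in spell-check operations the description aims for.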

Accordingly, the order of spell checking, in descending order of word accuracy, is:

(1) Word candidate 4 (apple)
(2) Word candidate 1 (appic)
(3) Word candidate 2 (appie)
(4) Word candidate 3 (applc)

Since word candidate 4 (apple), ranked (1), is spelled correctly, "apple" is output as the recognition result.

By repeating S15 through S26 described above for every word area, character recognition processing is performed on the given document image.

(Effects of the Invention) As described above, this invention can reduce the number of spell-check operations and thereby shorten the processing time.

[Brief Description of the Drawings]

FIG. 1 is a block diagram of an English character recognition device according to one embodiment of the present invention, FIG. 2 is a flowchart of the overall character recognition processing, and FIG. 3 is a character adjacency frequency table showing statistical adjacency frequencies of characters.

1: image input unit, 2: text area extraction unit, 3: word area extraction unit, 4: character area extraction unit, 5: character recognition unit, 6: word candidate generation unit, 7: spell check unit, 8: character adjacency frequency list, 9, 10: internal buses.

Patent applicant: Matsushita Electric Industrial Co., Ltd.

Claims (1)

[Claims] An English character recognition device comprising: an image input unit that inputs a document to be recognized; a text area extraction unit that outputs text areas from the input document image; a word area extraction unit that outputs word areas from a text area; a character area extraction unit that outputs character areas from a word area; a character recognition unit that outputs a plurality of candidate characters based on the graphical features of a character area; a word candidate generation unit that stores one word's worth of the candidate characters output from the character recognition unit and, for each character string obtained by combining them, obtains an accuracy as a word by adding a correction value, calculated from statistically determined character adjacency frequencies, to the product of the recognition accuracies of the characters, thereby generating ranked word candidates; and a spell check unit that checks the spelling of the word candidates in descending order of rank and outputs the correct character string as the recognition result.
JP2239998A 1990-09-12 1990-09-12 English character recognition device Expired - Fee Related JP2936426B2 (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
JP2239998A JP2936426B2 (en) 1990-09-12 1990-09-12 English character recognition device

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
JP2239998A JP2936426B2 (en) 1990-09-12 1990-09-12 English character recognition device

Publications (2)

Publication Number Publication Date
JPH04120679A true JPH04120679A (en) 1992-04-21
JP2936426B2 JP2936426B2 (en) 1999-08-23

Family

ID=17052949

Family Applications (1)

Application Number Title Priority Date Filing Date
JP2239998A Expired - Fee Related JP2936426B2 (en) 1990-09-12 1990-09-12 English character recognition device

Country Status (1)

Country Link
JP (1) JP2936426B2 (en)

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JPH0896085A (en) * 1994-09-22 1996-04-12 Ibm Japan Ltd Character recognition and character complementing method and computer system
JP2016071382A (en) * 2014-09-26 2016-05-09 株式会社東芝 Electronic apparatus, method and program

Cited By (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JPH0896085A (en) * 1994-09-22 1996-04-12 Ibm Japan Ltd Character recognition and character complementing method and computer system
US5835635A (en) * 1994-09-22 1998-11-10 Interntional Business Machines Corporation Method for the recognition and completion of characters in handwriting, and computer system
JP2016071382A (en) * 2014-09-26 2016-05-09 株式会社東芝 Electronic apparatus, method and program

Also Published As

Publication number Publication date
JP2936426B2 (en) 1999-08-23

Similar Documents

Publication Publication Date Title
US7293229B2 (en) Ensuring proper rendering order of bidirectionally rendered text
US8510099B2 (en) Method and system of selecting word sequence for text written in language without word boundary markers
JPH0634256B2 (en) Contact character cutting method
US10515148B2 (en) Arabic spell checking error model
JPH04120679A (en) Alphabetical character recognizing device
JP4302918B2 (en) Hangul character generation method and dictionary lookup method
JPH05174195A (en) English character recognizing device
JP4047895B2 (en) Document proofing apparatus and program storage medium
JP6303508B2 (en) Document analysis apparatus, document analysis system, document analysis method, and program
JPH0244459A (en) Japanese text correction candidate extracting device
JP4047894B2 (en) Document proofing apparatus and program storage medium
JP2004240643A (en) Character recognition system, method for recognizing character and program
JPH0410671B2 (en)
JPH04289989A (en) Roman letter recognizing device
JP2746345B2 (en) Post-processing method for character recognition
Singh Post-Processing Techniques To Enhance The Performance Of Gurmukhi Text Recognition System
JP3656315B2 (en) English summary device
KR19980020385A (en) How to search similar words using score
JP2000099635A (en) Device and method for predicting character string
JPH06111076A (en) Character recognizing device
JP2639314B2 (en) Character recognition method
JPH0468483A (en) Character recognizing method
JP2006085739A (en) Document proofreading device and program storage medium
JPH0573716A (en) English letter recognition device
JPH04306786A (en) Character recognizing device

Legal Events

Date Code Title Description
LAPS Cancellation because of no payment of annual fees