JPH0368091A

JPH0368091A - Character recognizing device

Info

Publication number: JPH0368091A
Application number: JP1204984A
Authority: JP
Inventors: Mikio Aoki; 三喜男青木
Original assignee: Seiko Epson Corp
Current assignee: Seiko Epson Corp
Priority date: 1989-08-08
Filing date: 1989-08-08
Publication date: 1991-03-25

Abstract

PURPOSE: To shorten the time required for character recognition by storing data of the total number of test bits in the character pattern data dictionary. CONSTITUTION: The character patterns 401, 403 and 405 and the total numbers of comparison picture elements 402, 404 and 406 are written in pairs. The character pattern data dictionary is written in part 401, the total number of comparison picture elements in that dictionary entry is written in part 402, the character pattern data dictionary of the next character is written in part 403, the total number of comparison picture elements of that character is written in part 404, and so on. Consequently, only the number of picture elements that coincide between the character pattern data 401, 403 and 405 and an extracted character is counted. When all picture elements have been compared, the stored total number of comparison picture elements 402, 404 or 406 is read in and used to calculate the degree of coincidence. As a result, the counting instruction previously executed at each picture-element comparison is omitted.

Description

[Detailed description of the invention]

[Industrial Field of Application] The present invention relates to a character recognition device that inputs characters written on a sheet of paper as an image, extracts character areas from the document image, and converts them into code numbers.

[Conventional technology]

In recent years, rapid advances in character recognition devices have made it possible to automatically extract character areas from a variety of document images, segment and recognize the individual characters, and automatically create document files, and various character recognition methods have been devised. The most basic of these is the pattern matching method. In this method, the recognition dictionary holds the character patterns of all target characters. At recognition time, the extracted character pattern is superimposed on each character pattern in the dictionary, the degree of matching is examined, and the matching character is taken as the recognized character. Although this method is very simple, it recognizes characters accurately at a fairly high rate when the font of the characters to be recognized matches that of the characters in the dictionary. The extracted characters are normally compared as they are, but they are sometimes normalized first to absorb the effect of character size. Furthermore, to absorb discrepancies caused by differences in fonts, the character patterns are sometimes given a certain degree of freedom.

[Problem to be Solved by the Invention]

However, although the above method is very simple and has the advantage of fairly accurate recognition, a very large number of pixels must be compared between the extracted character pattern and the character patterns in the dictionary, so recognition takes a long time. This is a serious problem for a character recognition device.

The present invention solves the above problem, and its object is to provide a character recognition device that recognizes characters faster in a character recognition method based on pattern matching.

[Means for Solving the Problems]

The character recognition device of the present invention comprises: an optical image input means that inputs a document image by photoelectrically converting light reflected from a sheet of paper or the like; a means that detects the positions of character lines and words in the input image and extracts the words one by one; and a character recognition means that extracts the characters of each extracted word one by one, compares each with character pattern data held in advance, and converts it into the character code with the highest degree of matching. The device is characterized in that data of the total number of test bits is recorded in the character pattern data dictionary.

[Embodiment]

The present invention will be described in detail below based on an embodiment. FIG. 1 shows a block diagram of the character recognition device of the present invention. The character recognition device consists of a CPU 101 that executes processing according to a program, an image input device 102 that inputs character images into a storage device, a character display means 103 that displays the character recognition results, a ROM 104 that stores the character pattern data dictionary, and a RAM 105, a storage device that stores character images.

The character recognition method of the device will now be described in detail based on the flowchart shown in FIG. 6, with reference to FIGS. 2, 3, 4 and 5.
First, the image input device 102 optically captures the characters written on a sheet of paper or the like and inputs them as image data into the RAM 105, which serves as the storage device. Next, word areas are extracted from the input character image. To extract word areas, the marginal distribution of the input character image along the line direction is first computed. This marginal distribution (not shown) is large where character lines exist and small between character lines, so the positions of the character lines can easily be estimated from its values. Once the character line positions have been estimated, the marginal distribution of each estimated line in the direction perpendicular to the line direction is computed. Where this marginal distribution (not shown) is large, characters are present; where it is small, no characters are present. Therefore, by examining the areas where no characters are present, the sizes of the word spacing and character spacing can be estimated and the word areas extracted.

Once the word areas have been extracted, the next step is to recognize the extracted words.
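The line and word extraction described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the device's implementation: it assumes a binarized image held as a list of rows (1 = ink, 0 = blank) with horizontal text lines, computes each marginal distribution as a projection profile (ink count per row, then ink count per column within each line), and replaces the estimate of word spacing versus character spacing with a hypothetical fixed gap threshold `min_word_gap`. The names `runs` and `extract_words` are likewise illustrative.

```python
from typing import List, Tuple

Image = List[List[int]]  # binarized image: 1 = ink pixel, 0 = blank pixel


def runs(profile: List[int], threshold: int = 0) -> List[Tuple[int, int]]:
    """Return half-open [start, end) ranges where the profile exceeds threshold."""
    spans: List[Tuple[int, int]] = []
    start = None
    for i, value in enumerate(profile):
        if value > threshold and start is None:
            start = i                      # a run of ink begins
        elif value <= threshold and start is not None:
            spans.append((start, i))       # the run ends at a blank position
            start = None
    if start is not None:
        spans.append((start, len(profile)))
    return spans


def extract_words(img: Image, min_word_gap: int = 2) -> List[Tuple[int, int, int, int]]:
    """Return word boxes as (top, bottom, left, right), half-open on bottom/right.

    min_word_gap is a hypothetical fixed threshold standing in for the
    estimated boundary between character spacing and word spacing.
    """
    # Marginal distribution along the line direction: ink count per row.
    # Large values mark character lines, small values the gaps between them.
    row_profile = [sum(row) for row in img]
    boxes = []
    for top, bottom in runs(row_profile):
        # Marginal distribution perpendicular to the line: ink count per column.
        col_profile = [sum(img[y][x] for y in range(top, bottom))
                       for x in range(len(img[0]))]
        chars = runs(col_profile)          # ink runs = characters (or parts of them)
        left, right = chars[0]
        for start, end in chars[1:]:
            if start - right >= min_word_gap:
                # Blank gap wide enough to count as word spacing: close the word.
                boxes.append((top, bottom, left, right))
                left = start
            right = end                    # the current word now ends at this run
        boxes.append((top, bottom, left, right))
    return boxes
```

A threshold derived from the observed gap sizes, as the text describes, could replace the fixed `min_word_gap` without changing the rest of the procedure.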
Words are recognized by extracting their characters one by one and comparing each with the character pattern data dictionary stored in the ROM 104. Conventional character recognition by pattern matching holds pattern data of the characters themselves, as shown in FIGS. 2 and 3, and narrows down the candidate characters by examining the degree of matching between these data and the extracted character. The dictionary takes one of two forms. In the form shown in FIG. 2, used when the extracted characters are normalized before comparison, each pattern is formed of three areas so as to absorb slight positional deviations and font differences: a part 203 that is a character part for every character, a part 201 that is a blank part for every character, and a part 202 that is excluded from comparison because it becomes a character part for some characters and a blank part for others. In the form shown in FIGS. 3(a) and 3(b), intended only to recognize a target font, the patterns of that font are simply represented as they are. In either form, the number of pixels compared differs from character to character. Therefore, when characters are recognized using such a character pattern data dictionary, the dictionary pattern and the extracted character data must be compared pixel by pixel while counting the total number of matching pixels and, at the same time, the total number of compared pixels; only then can the degree of matching be computed and the candidate characters narrowed down. As a result, recognition takes a considerable amount of time.

In the present invention, therefore, a character pattern data dictionary as shown in FIGS. 4 and 5 is provided. In FIG. 4, the character pattern data and the total number of comparison pixels are written in pairs: a part 401 holding the character pattern data dictionary of FIG. 2, a part 402 holding the total number of comparison pixels in that pattern, then the character pattern data 403 of the next character and its total number of comparison pixels 404, and so on. FIG. 5 is similarly composed of the character pattern data 501, 503 and 505 of FIG. 3 and the total numbers of comparison pixels 502, 504 and 506. Consequently, when the extracted character is compared with the character pattern data, only the number of matching pixels needs to be counted; once all pixels have been compared, the stored total number of comparison pixels is read and the degree of matching is calculated. The counting instruction previously executed at every pixel comparison can thus be omitted. Because pattern matching compares every pixel of the extracted character pattern with the dictionary, and each character has a very large number of pixels (usually more than 1000), this reduction in instructions saves a considerable amount of time. Moreover, recognizing a word compares each of its several characters with many dictionary characters, so the speed difference becomes large enough to be noticeable to a human. Using the recognition method of the present invention therefore makes it possible to greatly shorten the character recognition time of the pattern matching method.

The words recognized as described above are output on the recognition result display device 103.
This completes the recognition operation.

[Effects of the Invention]

As described above, in a character recognition method based on pattern matching, the present invention stores the total number of comparison pixels in the character pattern data dictionary, which eliminates the need to count the total number of comparison pixels during the pixel-by-pixel comparison. As a result, the time required for recognition can be greatly reduced, which significantly improves the performance of the character recognition device.
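The difference between the conventional comparison loop and the dictionary layout of FIGS. 4 and 5 can be sketched in Python. Everything here is an illustrative assumption rather than the patent's data format: templates are strings in which `#` marks a pixel that must be ink (like part 203), `.` a pixel that must be blank (like part 201) and `?` a pixel excluded from comparison (like part 202); the glyph is assumed to be already normalized to the template size; and the degree of matching is assumed to be matching pixels divided by total comparison pixels. The names `Entry`, `make_entry`, `match_conventional`, `match_with_stored_total`, `recognize` and `DICTIONARY` are hypothetical.

```python
from dataclasses import dataclass
from typing import List

Glyph = List[List[int]]  # normalized extracted character: 1 = ink, 0 = blank


def match_conventional(pattern: List[str], glyph: Glyph) -> float:
    """Conventional loop: counts matches AND compared pixels at every pixel."""
    matches = compared = 0
    for trow, grow in zip(pattern, glyph):
        for t, g in zip(trow, grow):
            if t == "?":
                continue                   # excluded pixel (like part 202)
            compared += 1                  # extra counting instruction per pixel
            if (t == "#") == (g == 1):
                matches += 1
    return matches / compared if compared else 0.0


@dataclass
class Entry:
    code: str            # character code returned on recognition
    pattern: List[str]   # character pattern data (like 401, 403, ...)
    total: int           # total comparison pixels, stored once (like 402, 404, ...)


def make_entry(code: str, pattern: List[str]) -> Entry:
    # The total is computed when the dictionary is built, not at recognition time.
    total = sum(row.count("#") + row.count(".") for row in pattern)
    return Entry(code, pattern, total)


def match_with_stored_total(entry: Entry, glyph: Glyph) -> float:
    """Count matching pixels only, then read the stored total once."""
    matches = 0
    for trow, grow in zip(entry.pattern, glyph):
        for t, g in zip(trow, grow):
            if t != "?" and (t == "#") == (g == 1):
                matches += 1
    return matches / entry.total if entry.total else 0.0


def recognize(dictionary: List[Entry], glyph: Glyph) -> str:
    """Return the code of the entry with the highest degree of matching."""
    return max(dictionary, key=lambda e: match_with_stored_total(e, glyph)).code


# Usage: a two-entry dictionary. "I" compares 3 pixels, "L" compares all 9,
# so the per-character totals differ and must be stored with each pattern.
DICTIONARY = [
    make_entry("I", ["?#?", "?#?", "?#?"]),
    make_entry("L", ["#..", "#..", "###"]),
]
```

For a template with N compared pixels of which M match, the conventional loop performs N + M counter increments, while the stored-total version performs M increments plus one read of the stored value. Python does not expose that instruction-level saving directly; the sketch only shows where the per-pixel counter disappears.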

[Brief explanation of drawings]

FIG. 1 is a block diagram of the character recognition device of the present invention. FIGS. 2, 3, 4 and 5 are diagrams outlining the present invention. FIG. 6 is a flowchart of the character recognition device of the present invention.

101: CPU
102: image input device
103: recognition result display device
104: ROM
105: RAM
201: common blank part
202: non-comparison part
203: common character part
301: character part
302: blank part
401, 403, 501, 503, 505: character pattern data
402, 404, 502, 504, 506: total number of comparison pixels

Claims

[Claims]

In a character recognition device comprising an optical image input means for inputting a document image by photoelectrically converting light reflected from a paper surface, a means for detecting the positions of character lines and words in the input image and extracting each word, and a character recognition means for extracting each character from the extracted words, comparing it with character pattern data held in advance, and converting it into the character code with the highest degree of matching, a character recognition device characterized in that data of the total number of test bits is recorded in the character pattern data dictionary.