JPH11161739A

JPH11161739A - Character recognizing device

Info

Publication number: JPH11161739A
Application number: JP9323440A
Authority: JP
Inventors: Masaru Sugioka; 賢杉岡; Koji Ito; 晃治伊東
Original assignee: Oki Electric Industry Co Ltd
Current assignee: Oki Electric Industry Co Ltd
Priority date: 1997-11-25
Filing date: 1997-11-25
Publication date: 1999-06-18

Abstract

PROBLEM TO BE SOLVED: To provide a character recognizing device that prevents misreading of void (outline) characters. SOLUTION: A picture storing part 1 stores a binary picture image ST of a medium T, with the background as white picture elements and the characters as black picture elements. Once the binary picture image ST is stored in the picture storing part 1, a character segmenting part 2 segments character areas one character at a time from the stored binary picture image S1 and transmits each segmented character area S2 to a character judging part 3 and a character recognizing part 4. The character judging part 3 judges whether the character area S2 contains a normal character or a void character and transmits the judged result S3 to the character recognizing part 4. The character recognizing part 4 receives the judged result S3 together with the character area S2 from the character segmenting part 2. When the judged result S3 indicates a normal character, the character recognizing part 4 performs recognition processing and outputs a recognized result S4. When the judged result S3 indicates a void character, the character recognizing part 4 outputs reject (unreadable) as the recognized result S4 without performing any recognition processing.

Description

DETAILED DESCRIPTION OF THE INVENTION

[0001]

TECHNICAL FIELD OF THE INVENTION
The present invention relates to a character recognition device (hereinafter, "OCR") that extracts and recognizes characters from a binary image of a document or the like, and in particular to one that incorporates measures for handling outline characters.

[0002]

DESCRIPTION OF THE RELATED ART
A conventional OCR optically reads a document or the like using, for example, a CCD (Charge Coupled Device) sensor, obtains an image quantized into the two values white and black, extracts character areas from this image, and performs recognition processing. In this recognition processing, a recognition dictionary built from features extracted from predetermined standard character patterns (hereinafter, "standard patterns") is compared with the features of the character pattern to be recognized, and the entry whose features are most similar is selected and output as the recognition result.

[0003]

PROBLEMS TO BE SOLVED BY THE INVENTION
However, the conventional OCR has the following problems. FIG. 2 shows an example of a normal character. The standard patterns in the recognition dictionary are created from characters that consist solely of black pixels, as shown in FIG. 2 (hereinafter, "normal characters"), because normal characters are the most frequently used in ordinary documents. With today's word processors, however, added functionality and larger disk capacities make it easy to create documents that use outline characters, a variety of fonts, decorative characters, and so on. FIG. 3 shows an example of an outline character. A document may be written with characters in which only the edges consist of black pixels, as in FIG. 3 (that is, outline characters), and such characters are sometimes used to set apart a word that deserves particular emphasis. When a conventional OCR recognizes these outline characters with a recognition dictionary built for normal characters, their features differ markedly from those of normal characters, so misreading is likely and recognition accuracy drops. Conversely, adding the features of outline characters to the normal character dictionary in order to make outline characters recognizable lowers the recognition accuracy for normal characters. An OCR is therefore desired whose recognition accuracy does not drop even when the documents to be recognized contain outline characters.

[0004]

MEANS FOR SOLVING THE PROBLEMS
To solve the above problems, the first invention provides the following means in an OCR comprising an image storage unit that receives and stores a binary image of a medium on which characters are written, a character extraction unit that extracts a character area from the binary image stored in the image storage unit, and a character recognition unit that recognizes the character in the character area. A character determination unit is provided that determines whether the character in the extracted character area is a normal character, whose character lines consist solely of black pixels, or an outline character, in which only the edges of the character lines consist of black pixels. In addition, the character recognition unit is configured not to recognize the character when the determination result of the character determination unit indicates an outline character. The second invention is an OCR comprising an image storage unit, a character extraction unit, and a character determination unit like those of the first invention; a normal character dictionary unit storing a recognition dictionary suited to recognizing normal characters; an outline character dictionary unit storing a recognition dictionary suited to recognizing outline characters; a dictionary selection unit that selects the normal character dictionary unit or the outline character dictionary unit according to the determination result of the character determination unit; and a character recognition unit that recognizes the character in the character area using the recognition dictionary selected by the dictionary selection unit.

[0005]

In the third invention, the character determination unit of the first or second invention detects a normal contour point count, obtained by counting the pixels that form the outer contour of the black pixel regions of the character in the extracted character area, and a loop contour point count, obtained by counting the pixels that form the inner contour of those regions. It compares the ratio of the normal contour point count to the loop contour point count with a predetermined threshold and, from the result of this comparison, determines whether the character is a normal character or an outline character. According to the first and third inventions, with the OCR configured as described above, the binary image of the medium is stored in the image storage unit, and the character extraction unit extracts a character area from the stored binary image. The character determination unit determines whether the character in the extracted character area is a normal character or an outline character. The character recognition unit recognizes the character when the determination result indicates a normal character and does not recognize it when the determination result indicates an outline character. According to the second and third inventions, the dictionary selection unit selects the normal character dictionary unit or the outline character dictionary unit according to the determination result of the character determination unit. The character recognition unit recognizes the character in the character area extracted by the character extraction unit on the basis of the recognition dictionary of the selected dictionary unit.

[0006]

The fourth invention is an OCR comprising: an image storage unit that receives and stores a binary image of a medium on which characters are written; a character extraction unit that extracts a character area from the binary image stored in the image storage unit; a character determination unit that determines whether the character in the extracted character area is a normal character, whose character lines consist solely of black pixels, or an outline character, in which only the edges of the character lines consist of black pixels; an image correction unit that outputs the character in the extracted character area unchanged when the determination result of the character determination unit indicates a normal character, and corrects the character to a normal character before outputting it when the determination result indicates an outline character; and a character recognition unit that recognizes the character output from the image correction unit. In the fifth invention, the character determination unit of the fourth invention detects normal contour data, which indicates a normal contour point count obtained by counting the pixels that form the outer contour of the black pixel regions of the character in the extracted character area together with the coordinates of those pixels, and a loop contour point count, obtained by counting the pixels that form the inner contour of those regions. It compares the ratio of the normal contour point count to the loop contour point count with a predetermined threshold, determines from the result whether the character is a normal character or an outline character, and additionally outputs the normal contour data. The sixth invention is an OCR comprising: an image storage unit that receives and stores a binary image of a medium on which characters are written; a character extraction unit that extracts a character area from the binary image stored in the image storage unit; the character determination unit of the fifth invention; an image correction unit that outputs the character in the extracted character area unchanged when the determination result of the character determination unit indicates a normal character, and corrects the character to a normal character before outputting it when the determination result indicates an outline character; and a character recognition unit that recognizes the character output from the image correction unit.

[0007]

The image correction unit comprises a switching unit, a contour rectangle detection unit, and a corrected image creation unit. The switching unit receives the character area from the character extraction unit and the determination result and the normal contour data from the character determination unit of the fifth invention. When the determination result indicates a normal character, it sends the character area to the character recognition unit; when the determination result indicates an outline character, it sends the normal contour data and the character area to the contour rectangle detection unit. The contour rectangle detection unit calculates, from the normal contour data, the minimum rectangle enclosing each contour and compares the coordinate values that specify the position and size of each calculated rectangle in an XY coordinate system of orthogonal X and Y axes. When one rectangle completely encloses another, it ranks the normal contour data forming the enclosing rectangle higher and the normal contour data forming the enclosed rectangle lower, and in this way assigns a rank to each item of contour data. The corrected image creation unit receives the ranked normal contour data and the character area from the contour rectangle detection unit. Working from the highest rank down, it converts the pixels of each region enclosed by an item of normal contour data to black pixels when the rank is odd and to white pixels when the rank is even, thereby correcting the character in the character area to a normal character, and sends the result to the character recognition unit. According to the fourth, fifth, and sixth inventions, when the determination result of the character determination unit indicates a normal character, the image correction unit outputs the character in the character area extracted by the character extraction unit unchanged. When the determination result indicates an outline character, the image correction unit corrects the character in the character area to a normal character, so that the character recognition unit can recognize it using a recognition dictionary suited to normal characters, and outputs the corrected character. The character output from the image correction unit is recognized by the character recognition unit. The above problems can therefore be solved.
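The ranking step of the contour rectangle detection unit can be sketched as follows. This is only an illustration under stated assumptions, not the patented implementation: contours are assumed to be lists of (x, y) points, the rank of a contour is taken to be 1 plus the number of rectangles that strictly enclose its own rectangle (the embodiment's contour table gives rank M an initial value of 1), and all function names are hypothetical. Filling the region inside each contour is left out.

```python
from typing import List, Tuple

Point = Tuple[int, int]
Rect = Tuple[int, int, int, int]  # (left, top, right, bottom)


def bounding_rect(contour: List[Point]) -> Rect:
    """Minimum axis-aligned rectangle enclosing one contour."""
    xs = [x for x, _ in contour]
    ys = [y for _, y in contour]
    return (min(xs), min(ys), max(xs), max(ys))


def encloses(outer: Rect, inner: Rect) -> bool:
    """True when `outer` completely surrounds `inner` (strict on all sides, an assumption)."""
    return (outer[0] < inner[0] and outer[1] < inner[1]
            and outer[2] > inner[2] and outer[3] > inner[3])


def rank_contours(contours: List[List[Point]]) -> List[int]:
    """Rank 1 for an outermost contour, plus 1 for every rectangle that encloses it."""
    rects = [bounding_rect(c) for c in contours]
    return [1 + sum(encloses(other, r) for other in rects) for r in rects]


def fill_value(rank: int) -> int:
    """Odd ranks are filled with black (1), even ranks with white (0)."""
    return 1 if rank % 2 == 1 else 0
```

Under these assumptions, three nested contours receive ranks 1, 2, and 3 and would be filled black, white, and black in that order, which is why the fill has to proceed from the highest-ranked (outermost) contour inward.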

[0008]

DETAILED DESCRIPTION OF THE PREFERRED EMBODIMENTS
First Embodiment
FIG. 1 is a block diagram of an OCR according to a first embodiment of the present invention. This OCR has an image storage unit 1 that receives and stores a binary image ST of a medium T on which characters are written. Connected to the output side of the image storage unit 1 is a character extraction unit 2 that extracts character areas from the binary image S1 stored in the image storage unit 1. Connected to the output side of the character extraction unit 2 is a character determination unit 3 that determines whether the character in an extracted character area is a normal character or an outline character. Also connected to the output side of the character extraction unit 2 is a character recognition unit 4, which performs recognition processing on the character in the character area when the determination result of the character determination unit 3 is a normal character and outputs a reject (unreadable) when the determination result is an outline character. The character recognition unit 4 outputs either the result of the recognition processing or the reject as a recognition result S4. Next, the processing in the OCR of FIG. 1 will be described. The image storage unit 1 stores the binary image ST of the medium T with, for example, the background as white pixels and the characters as black pixels. Once the binary image ST is stored in the image storage unit 1, the character extraction unit 2 extracts character areas from the binary image S1 one character at a time and sends each extracted character area S2 to the character determination unit 3 and the character recognition unit 4. The character determination unit 3 determines whether the character area S2 contains a normal character or an outline character.

[0009]

The steps (1) and (2) of this determination method are described below.

(1) Detection of the character contour
First, the character area S2 is scanned to detect the contours of the character. The contour tracing method described below is used for this detection. FIGS. 4(a) and 4(b) illustrate the contour tracing method. In this method, as shown in FIG. 4(a), the upper left of the character area S2 is taken as the scan start point s, and the entire character area S2 is scanned sequentially. When a black pixel in contact with a white pixel is detected during the scan, the contour is detected by tracing black pixels counterclockwise, as shown in FIG. 4(b). A black pixel in contact with a white pixel is a black pixel that has a white pixel immediately above, below, to the left of, or to the right of it. Specifically, the first black pixel in contact with a white pixel that is detected during the scan becomes the start point of contour tracing and is called black pixel b0. Next, black pixel b1 is detected by searching counterclockwise from black pixel b0. Once black pixel b1 is detected, black pixel b2 is detected by searching counterclockwise around black pixel b1, starting from black pixel b0. Detection of black pixels is repeated in this way until the trace returns to the starting black pixel b0. The number of black pixels traced is then written, as the contour point count, to a table in the character determination unit 3. Care is taken not to trace the same contour again.
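The border-pixel test and the raster scan that finds the start point of a trace can be sketched as follows. This is a minimal sketch rather than the full counterclockwise trace: it assumes the character area is a list of rows with 1 for black and 0 for white, it treats positions outside the character area as white, and the function names are hypothetical.

```python
from typing import List, Optional, Tuple

Image = List[List[int]]  # assumed encoding: 1 = black pixel, 0 = white pixel


def is_white(img: Image, x: int, y: int) -> bool:
    """White pixel test; positions outside the character area count as white (assumption)."""
    if y < 0 or y >= len(img) or x < 0 or x >= len(img[0]):
        return True
    return img[y][x] == 0


def touches_white(img: Image, x: int, y: int) -> bool:
    """True for a black pixel with a white pixel above, below, left, or right of it."""
    if img[y][x] != 1:
        return False
    return any(is_white(img, x + dx, y + dy)
               for dx, dy in ((0, -1), (0, 1), (-1, 0), (1, 0)))


def find_trace_start(img: Image) -> Optional[Tuple[int, int]]:
    """Scan from the upper left, row by row, and return the first contour pixel (b0)."""
    for y, row in enumerate(img):
        for x in range(len(row)):
            if touches_white(img, x, y):
                return (x, y)
    return None  # no black pixel in contact with a white pixel
```

A complete tracer would continue from the returned pixel, follow the contour counterclockwise until it returns to b0, and mark the visited pixels so that the same contour is not traced twice.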

[0010]

(2) Determination of the character type
FIGS. 5(a) and 5(b) show the contour data of a normal character, and FIGS. 6(a) and 6(b) show the contour data of an outline character. Contour data is of two kinds: normal contour data, which surrounds a region of black pixels, and loop contour data, which forms the inside of a region of black pixels. To distinguish the two, a contour is treated as normal contour data when the pixel immediately above the start point of the trace is white, and as loop contour data when that pixel is black. A contour is also treated as normal contour data when the pixel immediately above the start point lies outside the character area. During contour tracing, the character determination unit 3 counts the normal contour points and the loop contour points. In general, a normal character has far fewer loop contour points than normal contour points, whereas in an outline character the two counts differ little. Therefore, with the normal contour point count denoted A and the loop contour point count denoted B, the contour data ratio C is obtained from equation (1).

C = A / B ... (1)

The contour data ratio C is large for a normal character and small for an outline character. The character determination unit 3 therefore compares C with a predetermined threshold THL, determines the character to be a normal character if inequality (2) is satisfied, and determines it to be an outline character otherwise.

C > THL ... (2)

For the normal character contour data of FIGS. 5(a) and 5(b), the normal contour point count in FIG. 5(a) is A = 275 and the loop contour point count in FIG. 5(b) is B = 82, giving a contour data ratio of C = 3.35. For the outline character contour data of FIGS. 6(a) and 6(b), the normal contour point count in FIG. 6(a) is A = 374 and the loop contour point count in FIG. 6(b) is B = 342, giving C = 1.09. Accordingly, setting the threshold THL in advance to, for example, 2 makes it possible to distinguish normal characters from outline characters. When the loop contour point count B is 0, the character is determined to be a normal character without using equation (1).
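The contour-type rule and the decision based on equations (1) and (2) can be sketched as follows, using the counts quoted for FIGS. 5 and 6 as a check. This is a sketch under assumptions: the image encoding (1 = black, 0 = white), the string labels, and the function names are hypothetical, and the default threshold of 2 is the example value given in the text.

```python
from typing import List, Tuple

Image = List[List[int]]  # assumed encoding: 1 = black pixel, 0 = white pixel


def contour_kind(img: Image, start: Tuple[int, int]) -> str:
    """Classify a contour by the pixel immediately above the start point of its trace."""
    x, y = start
    if y - 1 < 0:
        return "normal"  # the pixel above lies outside the character area
    return "loop" if img[y - 1][x] == 1 else "normal"


def classify_character(normal_points: int, loop_points: int,
                       threshold: float = 2.0) -> str:
    """Apply C = A / B and C > THL; B = 0 is treated as a normal character."""
    if loop_points == 0:
        return "normal"  # equation (1) is not used in this case
    ratio = normal_points / loop_points
    return "normal" if ratio > threshold else "outline"


# Counts quoted for FIGS. 5 and 6.
print(round(275 / 82, 2), classify_character(275, 82))    # 3.35 normal
print(round(374 / 342, 2), classify_character(374, 342))  # 1.09 outline
```

Handling B = 0 before the division also avoids a division by zero for characters without any inner contour.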

[0011]

After determining the character type, the character determination unit 3 sends the determination result S3 to the character recognition unit 4. The character recognition unit 4 receives the determination result S3 together with the character area S2 from the character extraction unit 2. When the determination result S3 indicates a normal character, it performs recognition processing and outputs the recognition result S4. When the determination result S3 indicates an outline character, the character recognition unit 4 performs no recognition processing and outputs a reject (unreadable) as the recognition result S4. As described above, in the first embodiment, even when the document to be recognized contains outline characters, the character determination unit 3 determines whether each character is a normal character or an outline character, and outline characters, which are likely to be misread, are treated as unreadable, so recognition accuracy does not drop. Furthermore, because a reject is output as the recognition result S4, an operator can easily check and correct the recognition results during subsequent processing at a correction terminal.
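The control flow of the first embodiment, recognizing normal characters and rejecting outline characters without attempting recognition, can be sketched as follows. The reject marker, the string labels, and the `recognize` callable are hypothetical placeholders; the text does not specify how the recognition result S4 encodes a reject.

```python
from typing import Callable, List

REJECT = "<reject>"  # hypothetical marker for an unreadable character


def recognize_or_reject(kind: str, area: List[List[int]],
                        recognize: Callable[[List[List[int]]], str]) -> str:
    """Return the recognition result S4 for one character area S2."""
    if kind == "outline":
        return REJECT  # no recognition processing is performed
    return recognize(area)


# Usage with a stand-in recognizer that always answers "A".
calls = []
def fake_recognizer(area):
    calls.append(area)
    return "A"

print(recognize_or_reject("normal", [[1]], fake_recognizer))   # A
print(recognize_or_reject("outline", [[1]], fake_recognizer))  # <reject>
```

Because the recognizer is never called for outline characters, a downstream correction terminal sees an explicit reject rather than a plausible but wrong character.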

[0012]

Second Embodiment
FIG. 7 is a block diagram of an OCR according to a second embodiment of the present invention. Elements common to FIG. 1, which shows the first embodiment, carry the same reference numerals. In this OCR, a character recognition unit 4A of a different configuration is connected to the output side of the character determination unit 3 in place of the character recognition unit 4 of FIG. 1. A dictionary selection unit 5 is also connected to the output side of the character determination unit 3. Connected to the dictionary selection unit 5 are a normal character dictionary unit 6, which stores a normal character dictionary created by extracting features from normal characters, and an outline character dictionary unit 7, which stores an outline character dictionary created by extracting features from outline characters. The dictionary selection unit 5 selects the normal character dictionary unit 6 when the determination result S3 of the character determination unit 3 is a normal character and selects the outline character dictionary unit 7 when the determination result S3 is an outline character. The character recognition unit 4A is connected to the output side of the dictionary selection unit 5. The character recognition unit 4A recognizes the character in the character area S2 on the basis of the recognition dictionary of the normal character dictionary unit 6 or the outline character dictionary unit 7, whichever the dictionary selection unit 5 has selected. The rest of the configuration is the same as in FIG. 1.

[0013]

Next, the processing in the OCR of FIG. 7 will be described. The image storage unit 1, the character extraction unit 2, and the character determination unit 3 perform the same processing as in the first embodiment. The dictionary selection unit 5 receives the determination result S3 for the character area S2 from the character determination unit 3. When the determination result S3 indicates a normal character, it connects to the normal character dictionary unit 6 and sends the recognition dictionary stored there to the character recognition unit 4A. When the determination result S3 indicates an outline character, the dictionary selection unit 5 connects to the outline character dictionary unit 7 and sends the recognition dictionary stored there to the character recognition unit 4A. The character recognition unit 4A receives the character area S2 from the character extraction unit 2 and the dictionary from the dictionary selection unit 5, performs recognition processing on the character in the character area S2, and outputs a recognition result S4A. As described above, in the second embodiment, the normal character dictionary unit 6 and the outline character dictionary unit 7 are prepared in advance. Even when the document to be recognized contains outline characters, the character determination unit 3 determines whether each character is a normal character or an outline character, and the dictionary selection unit 5 selects the dictionary best suited to the determination result S3 and sends it to the character recognition unit 4A. Normal characters are therefore recognized as before, and outline characters can also be recognized correctly.
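The dictionary selection of the second embodiment can be sketched as follows. The text says only that features are compared with a recognition dictionary and the most similar entry is chosen; the feature vectors, the squared Euclidean distance used as the similarity measure, and all names below are assumptions made for illustration.

```python
from typing import Dict, Sequence

FeatureDict = Dict[str, Sequence[float]]  # character label -> feature vector (assumed form)


def select_dictionary(kind: str, normal_dict: FeatureDict,
                      outline_dict: FeatureDict) -> FeatureDict:
    """Pick the dictionary that matches the determination result S3."""
    return outline_dict if kind == "outline" else normal_dict


def recognize(features: Sequence[float], dictionary: FeatureDict) -> str:
    """Return the label whose features are closest (squared Euclidean distance, assumed)."""
    def distance(label: str) -> float:
        return sum((a - b) ** 2 for a, b in zip(features, dictionary[label]))
    return min(dictionary, key=distance)


# Toy dictionaries with two-dimensional features.
normal_dict = {"A": [1.0, 0.0], "B": [0.0, 1.0]}
outline_dict = {"A": [0.2, 0.1], "B": [0.1, 0.2]}

chosen = select_dictionary("outline", normal_dict, outline_dict)
print(recognize([0.18, 0.1], chosen))  # A
```

Keeping the two dictionaries separate means the normal character dictionary is not diluted with outline character features, which is the accuracy problem described for the conventional approach.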

[0014]

Third Embodiment
FIG. 8 is a block diagram of an OCR according to a third embodiment of the present invention. Elements common to FIG. 1, which shows the first embodiment, carry the same reference numerals. In this OCR, a character determination unit 3A of a different configuration is connected to the output side of the character extraction unit 2 in place of the character determination unit 3 of FIG. 1. The character determination unit 3A receives the character area S2 from the character extraction unit 2, determines the character type in the same way as the character determination unit 3 to produce a determination result S3Aa, and, while tracing the contours, creates a contour table S3Ab that stores the contour data. An image correction unit 8 is connected to the output side of the character extraction unit 2 and the output side of the character determination unit 3A. When the determination result S3Aa of the character determination unit 3A indicates an outline character, the image correction unit 8 corrects the character area S2 so that it becomes equivalent to a normal character. The character recognition unit 4 is connected to the output side of the image correction unit 8. The rest of the configuration is the same as in FIG. 1.

[0015] FIG. 9 is a configuration diagram of the image correction unit 8 in FIG. 8. The image correction unit 8 has a switching unit 8a. The switching unit 8a receives the character area S2 from the character extraction unit 2 and receives the discrimination result S3Aa and the contour table S3Ab from the character discrimination unit 3A. When the discrimination result S3Aa indicates a normal character, it sends the character area S2 unchanged to the character recognition unit 4; when the discrimination result S3Aa indicates an outline character, it sends the contour table S3Ab and the character image S2 to the contour rectangle detection unit 8b. The contour rectangle detection unit 8b scans the contour table S3Ab and obtains, as the contour rectangle, the minimum rectangle enclosing each item of contour data. Connected to the output side of the contour rectangle detection unit 8b is a corrected image creation unit 8c, which, working in order from the highest-ranked contour data, converts the pixels of the character area S2 that lie in the region enclosed by that contour data into black pixels or white pixels.
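The routing performed by the switching unit 8a can be sketched as below. This is only an illustration under stated assumptions: the image is modelled as a list of rows with 1 for black and 0 for white, and the function name `image_correction_unit` and the `correct` callable (standing in for units 8b and 8c) are hypothetical.

```python
from typing import Callable, List, Sequence

Image = List[List[int]]  # assumed encoding: 1 = black pixel, 0 = white pixel


def image_correction_unit(
    char_area: Image,
    is_outline: bool,
    contour_table: Sequence,
    correct: Callable[[Image, Sequence], Image],
) -> Image:
    """Switching unit 8a: pass normal characters through unchanged and
    route outline characters to the correction path (units 8b and 8c)."""
    if not is_outline:
        # Discrimination result S3Aa indicates a normal character.
        return char_area
    # Discrimination result S3Aa indicates an outline character.
    return correct(char_area, contour_table)
```

Whatever this function returns is what the character recognition unit 4 receives, so the recognizer never needs to know which path was taken.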

[0016] Next, processing (1) to (3) in the OCR of FIG. 8 will be described.
(1) The image storage unit 1 and the character extraction unit 2 perform the same processing as in the first embodiment.
(2) Processing in the character discrimination unit 3A
FIG. 10 is a diagram showing the contour table S3Ab. The contour table S3Ab consists of the number N of each item of detected normal contour data (indicating the order of detection; 1, 2, 3 in FIG. 1), the coordinate data D of that normal contour data with the origin at the upper left of the character image, and a rank M described later. The initial value of the rank M is 1. The character discrimination unit 3A receives the character area S2 from the character extraction unit 2, discriminates the character in the same manner as in the first embodiment, and sends the discrimination result S3Aa and the contour table S3Ab shown in FIG. 10 to the image correction unit 8.
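One possible in-memory shape for the contour table S3Ab is sketched below. The field names and the helper `build_contour_table` are assumptions chosen for illustration; the source only states that each entry holds the number N, the coordinate data D, and the rank M with an initial value of 1.

```python
from dataclasses import dataclass
from typing import List, Tuple

Point = Tuple[int, int]  # (x, y), origin at the upper left of the character image


@dataclass
class ContourEntry:
    number: int          # N: detection order, starting at 1
    points: List[Point]  # D: coordinate data of the normal contour
    rank: int = 1        # M: initial value 1, adjusted later by ranking


def build_contour_table(contours: List[List[Point]]) -> List[ContourEntry]:
    """Create one table entry per detected normal contour, in detection order."""
    return [
        ContourEntry(number=i, points=list(pts))
        for i, pts in enumerate(contours, start=1)
    ]
```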

[0017] (3) Processing in the image correction unit 8
The image correction unit 8 receives the character area S2 from the character extraction unit 2 and receives the discrimination result S3Aa and the contour table S3Ab from the character discrimination unit 3A. When the discrimination result S3Aa indicates a normal character, it sends the character area S2 unchanged to the character recognition unit 4. When the discrimination result S3Aa indicates an outline character, the image correction unit 8 corrects the character in the character area S2 so that it becomes equivalent to a normal character. The correction method is as follows. The contour rectangle detection unit 8b in the image correction unit 8 obtains four elements for each item of contour data: the minimum x coordinate Xmin, the maximum x coordinate Xmax, the minimum y coordinate Ymin, and the maximum y coordinate Ymax.
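The four elements of a contour rectangle follow directly from the coordinate data of one contour. A minimal sketch, assuming the contour is given as a list of (x, y) pixel coordinates; the function name `contour_rectangle` is hypothetical.

```python
from typing import List, Tuple

Point = Tuple[int, int]  # (x, y), origin at the upper left of the character image


def contour_rectangle(points: List[Point]) -> Tuple[int, int, int, int]:
    """Return (Xmin, Xmax, Ymin, Ymax) of the minimum rectangle enclosing a contour."""
    if not points:
        raise ValueError("contour data must contain at least one point")
    xs = [x for x, _ in points]
    ys = [y for _, y in points]
    return min(xs), max(xs), min(ys), max(ys)
```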

[0018] FIG. 11 is a diagram showing an example of the contour rectangles of the contour data of FIG. 6(a). The contour rectangles in FIG. 11 are compared with one another to rank them. In this ranking, one contour rectangle is compared with another, and when one rectangle completely encloses the other, the contour data of the enclosing rectangle is ranked higher and the contour data of the enclosed rectangle is ranked lower; the rank M in the contour table S3Ab of the lower-ranked contour data is lowered by one (that is, M increases by 1, so a rank of 1 becomes 2). Whether one rectangle encloses the other is determined by comparing the four elements described above. For example, when the first contour data d1 and the second contour data d2 in FIG. 11 are compared, if relation (3) below holds, the second contour data d2 is determined to be the lower one, and its rank M in the contour table S3Ab is lowered by one.

Xmin(d1) < Xmin(d2)
Xmax(d1) > Xmax(d2)
Ymin(d1) < Ymin(d2)
Ymax(d1) > Ymax(d2) ... (3)

The contour rectangle detection unit 8b performs this ranking for all pairs of contour data to determine the ranks, and sends the contour table S3Ab and the character area S2 to the corrected image creation unit 8c. The corrected image creation unit 8c converts the pixels of the character area S2 that lie in the region enclosed by each item of contour data, working in order from the highest-ranked contour data. As shown in FIG. 12, the pixels are converted to black for contour data of odd rank and to white for contour data of even rank. FIGS. 12(a), (b), and (c) show the image correction process. For the character image shown in FIG. 12(a), the region of the character image enclosed by the first-ranked contour data d1 in FIG. 11 is first converted to black pixels, giving the image shown in FIG. 12(b). Next, the regions enclosed by the second-ranked contour data d2 and d3 in FIG. 11 are converted to white pixels, producing the corrected image S8c shown in FIG. 12(c). The corrected image S8c is sent to the character recognition unit 4 for recognition processing.

As described above, in the third embodiment, even when the document to be recognized contains outline characters, the character discrimination unit 3A automatically discriminates whether a character is a normal character or an outline character, and when the discrimination result S3Aa indicates an outline character, the character is corrected to resemble a normal character whose character lines consist only of black pixels before recognition processing is performed. The character recognition unit 4 can therefore recognize normal characters as before using only the conventional recognition dictionary for normal characters, and can also recognize outline characters correctly.
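The ranking and the odd/even conversion described in this paragraph can be sketched as follows. Several points are assumptions rather than details from the source: the image is a list of rows with 1 for black and 0 for white, each contour is a closed, 8-connected list of (x, y) pixels, the "region enclosed by the contour data" is taken to include the contour pixels themselves, and the enclosed region is found with a flood fill from outside the contour rectangle. All function and class names are hypothetical.

```python
from dataclasses import dataclass
from typing import List, Tuple

Point = Tuple[int, int]          # (x, y), origin at the upper left
Rect = Tuple[int, int, int, int]  # (Xmin, Xmax, Ymin, Ymax)
Image = List[List[int]]          # assumed encoding: 1 = black, 0 = white


@dataclass
class Contour:
    points: List[Point]  # coordinate data D
    rank: int = 1        # rank M, initial value 1


def bounding_rect(points: List[Point]) -> Rect:
    xs = [x for x, _ in points]
    ys = [y for _, y in points]
    return min(xs), max(xs), min(ys), max(ys)


def encloses(a: Rect, b: Rect) -> bool:
    """Relation (3): rectangle a completely encloses rectangle b."""
    return a[0] < b[0] and a[1] > b[1] and a[2] < b[2] and a[3] > b[3]


def assign_ranks(contours: List[Contour]) -> None:
    """Compare every pair; each enclosing rectangle lowers the rank by one."""
    rects = [bounding_rect(c.points) for c in contours]
    for i, c in enumerate(contours):
        c.rank = 1 + sum(encloses(r, rects[i]) for j, r in enumerate(rects) if j != i)


def fill_enclosed(image: Image, contour: Contour, value: int) -> None:
    """Set the contour pixels and everything inside them to `value`."""
    x0, x1, y0, y1 = bounding_rect(contour.points)
    w, h = x1 - x0 + 3, y1 - y0 + 3  # local grid with a one-pixel margin
    wall = {(x - x0 + 1, y - y0 + 1) for x, y in contour.points}
    outside, stack = {(0, 0)}, [(0, 0)]
    while stack:  # 4-connected flood fill cannot leak through an 8-connected contour
        x, y = stack.pop()
        for n in ((x + 1, y), (x - 1, y), (x, y + 1), (x, y - 1)):
            if 0 <= n[0] < w and 0 <= n[1] < h and n not in wall and n not in outside:
                outside.add(n)
                stack.append(n)
    for ly in range(1, h - 1):
        for lx in range(1, w - 1):
            if (lx, ly) not in outside:
                image[ly + y0 - 1][lx + x0 - 1] = value


def correct_outline(image: Image, contours: List[Contour]) -> Image:
    """Fill from the highest rank down: odd ranks black, even ranks white."""
    assign_ranks(contours)
    out = [row[:] for row in image]
    for c in sorted(contours, key=lambda c: c.rank):
        fill_enclosed(out, c, 1 if c.rank % 2 == 1 else 0)
    return out
```

For an outline "O" made of an outer black ring and an inner black ring, the outer contour receives rank 1 and is filled black, and the inner contour receives rank 2 and is filled white, which leaves a solid ring comparable to a normal character.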

[0019] The present invention is not limited to the above embodiments, and various modifications are possible. Examples of such modifications include the following.
(a) In the embodiments, the contour tracing method is used to discriminate characters and to create contour data, but any other method may be used as long as it can detect the normal contour data, the loop contour data, and the number of each.
(b) In the embodiments, the threshold THL used for character discrimination is set to 2, but any other value may be used as long as it can distinguish normal characters from outline characters.
(c) The binary image ST that is the input signal of the image storage unit 1 may be, for example, a signal obtained by binarizing the character reading signal of an image reading device such as a scanner or a facsimile machine.

[0020]

[Effects of the Invention] As described in detail above, according to the first and second inventions, even when the document to be recognized contains outline characters, the character discrimination unit discriminates whether a character is a normal character or an outline character, and outline characters, which are likely to be misread, are rejected as unreadable, so recognition accuracy does not decrease. Furthermore, because the recognition result is output as a reject (unreadable), an operator can easily check and correct recognition results during processing at the downstream correction terminal. According to the third invention, a normal character dictionary unit and an outline character dictionary unit are prepared in advance, and even when the document to be recognized contains outline characters, the character discrimination unit discriminates whether a character is a normal character or an outline character, and the dictionary selection unit selects the optimal dictionary according to the discrimination result of the character discrimination unit and sends it to the character recognition unit. Outline characters can therefore be recognized correctly without lowering the recognition accuracy for normal characters. According to the fourth and fifth inventions, even when the document to be recognized contains outline characters, the character discrimination unit automatically discriminates whether a character is a normal character or an outline character, and when the discrimination result indicates an outline character, the character is corrected to resemble a normal character whose character lines consist only of black pixels before recognition processing is performed. The character recognition unit can therefore recognize normal characters as before using only the conventional recognition dictionary for normal characters, and can also recognize outline characters correctly.

[Brief description of the drawings]

FIG. 1 is a configuration diagram of an OCR according to a first embodiment of the present invention.

FIG. 2 is a diagram showing an example of a normal character.

FIG. 3 is a diagram showing an example of an outline character.

FIG. 4 is a diagram explaining the contour tracing method.

FIG. 5 is a diagram showing the contour data of a normal character.

FIG. 6 is a diagram showing the contour data of an outline character.

FIG. 7 is a configuration diagram of an OCR according to a second embodiment of the present invention.

FIG. 8 is a configuration diagram of an OCR according to a third embodiment of the present invention.

FIG. 9 is a configuration diagram of the image correction unit 8 in FIG. 8.

FIG. 10 is a diagram showing the contour table S3Ab.

FIG. 11 is a diagram showing contour rectangles.

FIG. 12 is a diagram showing the image correction process.

[Explanation of symbols]

T Medium
1 Image storage unit
2 Character extraction unit
3 Character discrimination unit
4, 4A Character recognition unit
5 Dictionary selection unit
6 Normal character dictionary unit
7 Outline character dictionary unit
8 Image correction unit
8a Switching unit
8b Contour rectangle detection unit
8c Corrected image creation unit

Claims

[Claims]

1. A character recognition device comprising: an image storage unit that inputs and stores a binary image of a medium on which characters are written; a character extraction unit that extracts a character area from the binary image stored in the image storage unit; and a character recognition unit that recognizes a character in the character area, wherein a character discrimination unit is provided that discriminates whether the character in the extracted character area is a normal character whose character lines are composed only of black pixels or an outline character in which only the edges of the character lines are composed of black pixels, and the character is not recognized when the discrimination result of the character discrimination unit indicates the outline character.

2. A character recognition device comprising: an image storage unit that inputs and stores a binary image of a medium on which characters are written; a character extraction unit that extracts a character area from the binary image stored in the image storage unit; a character discrimination unit that discriminates whether the character in the extracted character area is a normal character whose character lines are composed only of black pixels or an outline character in which only the edges of the character lines are composed of black pixels; a normal character dictionary unit storing a recognition dictionary suited to recognizing the normal character; an outline character dictionary unit storing a recognition dictionary suited to recognizing the outline character; a dictionary selection unit that selects the normal character dictionary unit or the outline character dictionary unit based on the discrimination result of the character discrimination unit; and a character recognition unit that recognizes the character from the character area using the recognition dictionary selected by the dictionary selection unit.

3. A character recognition device according to the description, wherein the character discrimination unit is configured to detect a number of normal contour points, obtained by counting the pixels constituting the outer contour of the black pixel region of the character in the extracted character area, and a number of loop contour points, obtained by counting the pixels constituting the inner contour of the black pixel region of the character, to compare the ratio between the number of normal contour points and the number of loop contour points with a predetermined threshold, and to discriminate, based on the comparison result, whether the character is the normal character or the outline character.

4. A character recognition device comprising: an image storage unit that inputs and stores a binary image of a medium on which characters are written; a character extraction unit that extracts a character area from the binary image stored in the image storage unit; a character discrimination unit that discriminates whether the character in the extracted character area is a normal character whose character lines are composed only of black pixels or an outline character in which only the edges of the character lines are composed of black pixels; an image correction unit that outputs the character in the extracted character area unchanged when the discrimination result of the character discrimination unit indicates the normal character, and corrects the character in the extracted character area to the normal character and outputs it when the discrimination result indicates the outline character; and a character recognition unit that recognizes the character output from the image correction unit.

5. The character recognition device according to claim 4, wherein the character discrimination unit detects a number of normal contour points obtained by counting the pixels constituting the outer contour of the black pixel region of the character in the extracted character area, normal contour data indicating the coordinates of those pixels, and a number of loop contour points obtained by counting the pixels constituting the inner contour of the black pixel region of the character, compares the ratio between the number of normal contour points and the number of loop contour points with a predetermined threshold, discriminates based on the comparison result whether the character is the normal character or the outline character, and outputs the normal contour data.

6. A character recognition device comprising: an image storage unit that inputs and stores a binary image of a medium on which characters are written; a character extraction unit that extracts a character area from the binary image stored in the image storage unit; the character discrimination unit according to claim 5; an image correction unit that outputs the character in the extracted character area unchanged when the discrimination result of the character discrimination unit indicates the normal character, and corrects the character in the extracted character area to the normal character and outputs it when the discrimination result indicates the outline character; and a character recognition unit that recognizes the character output from the image correction unit, wherein the image correction unit comprises: a switching unit that receives the character area from the character extraction unit and the discrimination result and the normal contour data from the character discrimination unit, sends the character area to the character recognition unit when the discrimination result indicates the normal character, and sends the normal contour data and the character area to a contour rectangle detection unit when the discrimination result indicates the outline character; the contour rectangle detection unit, which obtains from the normal contour data the minimum rectangle enclosing each contour, compares the coordinate values specifying the position and size of each rectangle in an XY coordinate system having orthogonal X and Y axes, and, when one rectangle completely encloses another, ranks the normal contour data by placing the normal contour data forming the enclosing rectangle higher and the normal contour data forming the enclosed rectangle lower; and a corrected image creation unit that receives the normal contour data ranked by the contour rectangle detection unit and the character area, converts the pixels in each region of the character area enclosed by the respective normal contour data to black pixels when the rank is odd and to white pixels when the rank is even, performing this conversion in order from the highest rank, and sends the character in the character area, thereby corrected to the normal character, to the character recognition unit.