JPH06231306A - Character recognition device - Google Patents

Character recognition device

Info

Publication number
JPH06231306A
JPH06231306A JP5017245A JP1724593A JPH06231306A JP H06231306 A JPH06231306 A JP H06231306A JP 5017245 A JP5017245 A JP 5017245A JP 1724593 A JP1724593 A JP 1724593A JP H06231306 A JPH06231306 A JP H06231306A
Authority
JP
Japan
Prior art keywords
character
area
primary
recognition
determination unit
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
JP5017245A
Other languages
Japanese (ja)
Inventor
Noboru Nakamura
昇 中村
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Panasonic Holdings Corp
Original Assignee
Matsushita Electric Industrial Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Matsushita Electric Industrial Co Ltd
Priority to JP5017245A
Publication of JPH06231306A

Abstract

PURPOSE: To provide a character recognition device with excellent operability that can easily recognize a document containing halftone dot characters. CONSTITUTION: The device comprises: a primary character area determination unit 1 that obtains the circumscribed rectangles of connected figures from a primary binary image, obtained by reading the document with a scanner, and determines character areas from the sizes of those rectangles; a halftone dot area extraction unit 2 that extracts halftone dot areas from the non-character areas according to the ratio of black/white change points to the overall size; a secondary character area determination unit 3 that determines character areas, in the same way as the primary character area determination unit 1, from a secondary binary image obtained by reading the halftone dot areas with the scanner at a lighter density; a character cutout unit 5 that cuts out character patterns from the primary and secondary character areas; a character feature extraction unit 6 that extracts features from the character patterns; a recognition accuracy calculation unit 8 that compares the character features with a character feature dictionary 7, which stores the features of all characters, and obtains a recognition accuracy; and a recognized character determination unit 9 that determines the recognized character from the recognition accuracy.

Description

[Detailed Description of the Invention]

[0001]

[Field of Industrial Application] The present invention relates to a character recognition device that reads a document or the like and converts its characters into the corresponding character codes, and in particular to a character recognition device capable of recognizing documents that contain characters overlaid with halftone dots (hereinafter referred to as halftone dot characters).

[0002]

[Prior Art] In recent years, with the spread of word processors and the like, halftone dot characters have come to be used frequently, for example to emphasize characters in a document, and character recognition devices capable of recognizing documents containing such characters are being developed.

[0003] A conventional character recognition device is described below. FIG. 3 shows a recognition target document containing halftone dot characters, FIG. 4 shows the binary image obtained when the recognition target document of FIG. 3 is read by a scanner at normal density, and FIG. 5 shows the binary image obtained when the same document is read by the scanner at a light density.

[0004] When a conventional character recognition device is used to recognize a document such as that shown in FIG. 3, reading the document from the scanner at normal density yields a binary image like that in FIG. 4, in which the normal characters are recognizable but the halftone dot characters are not. Conversely, reading it at a light density yields a binary image like that in FIG. 5, in which the halftone dot characters are recognizable but the normal characters are not.

[0005] Therefore, to recognize a document containing halftone dot characters such as that in FIG. 3 with a conventional device, the document is first read from the scanner at normal density and the normal characters are recognized. The user then adjusts the scanner to a lighter density and reads the document again so that the halftone dot characters are recognized. Finally, the user merges the two recognition results.

[0006]

[Problems to Be Solved by the Invention] However, with the conventional configuration described above, recognizing a document that contains halftone dot characters requires the user to read the document from the scanner at normal density, adjust the scanner to a lighter density, read the document again, and then merge the two recognition results. This is cumbersome and time-consuming, and workability is poor.

[0007] The present invention solves the above conventional problem, and its object is to provide a character recognition device with excellent workability that can easily recognize a recognition target document containing halftone dot characters.

[0008]

[Means for Solving the Problems] To achieve this object, the character recognition device of the present invention comprises: a scanner that can adjust the reading density when it reads a recognition target document and outputs a binary image, and/or can read the recognition target document as multi-level data; a primary character area determination unit that obtains the circumscribed rectangles of connected figures from a primary binary image, which is output either by reading the recognition target document from the scanner at normal density or by reading it as multi-level data and binarizing it with a normal threshold, and determines primary character areas from the sizes of those circumscribed rectangles; a halftone dot area extraction unit that extracts halftone dot areas from the non-character areas determined by the primary character area determination unit, according to the ratio of black/white change points to the overall size; a secondary character area determination unit that determines secondary character areas, in the same manner as the primary character area determination unit, from a secondary binary image, which is output either by reading the halftone dot area portions of the recognition target document again from the scanner at a density lighter than normal or by binarizing the already-read multi-level data of the halftone dot area portions with a threshold set lighter than the normal threshold; a character cutout unit that cuts out character patterns, according to the sizes and positions of the circumscribed rectangles, from the primary character areas determined by the primary character area determination unit and the secondary character areas determined by the secondary character area determination unit; a character feature extraction unit that extracts character features from the character patterns cut out by the character cutout unit; a character feature dictionary that stores in advance the character features of all characters; a recognition accuracy calculation unit that compares the character features extracted by the character feature extraction unit with the character feature dictionary to obtain recognition accuracies such as character candidates and similarities; and a recognized character determination unit that determines the recognized character from the recognition accuracies obtained by the recognition accuracy calculation unit.
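
The circumscribed-rectangle step of the primary character area determination unit could be sketched in Python as follows. This is only an illustration, since the patent text gives no implementation. The use of 8-connectivity, the names `bounding_boxes` and `is_character_sized`, and the limits `min_side` and `max_side` are all assumptions made for this example.

```python
from collections import deque


def bounding_boxes(image):
    """Return (top, left, bottom, right) for every 8-connected black component.

    `image` is a list of rows; 1 means black, 0 means white.
    """
    height, width = len(image), len(image[0])
    seen = [[False] * width for _ in range(height)]
    boxes = []
    for y in range(height):
        for x in range(width):
            if image[y][x] != 1 or seen[y][x]:
                continue
            # Breadth-first search over one connected figure.
            top, left, bottom, right = y, x, y, x
            queue = deque([(y, x)])
            seen[y][x] = True
            while queue:
                cy, cx = queue.popleft()
                top, bottom = min(top, cy), max(bottom, cy)
                left, right = min(left, cx), max(right, cx)
                for dy in (-1, 0, 1):
                    for dx in (-1, 0, 1):
                        ny, nx = cy + dy, cx + dx
                        if (0 <= ny < height and 0 <= nx < width
                                and image[ny][nx] == 1 and not seen[ny][nx]):
                            seen[ny][nx] = True
                            queue.append((ny, nx))
            boxes.append((top, left, bottom, right))
    return boxes


def is_character_sized(box, min_side=2, max_side=8):
    """Hypothetical size test: the longer side must lie within the limits."""
    top, left, bottom, right = box
    longer = max(bottom - top + 1, right - left + 1)
    return min_side <= longer <= max_side
```

Under these assumptions, rectangles that pass `is_character_sized` would form the primary character areas, and the remaining rectangles would be treated as non-character areas and passed on to halftone dot area extraction.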

[0009]

[Operation] With this configuration, the primary character area determination unit determines the primary character areas. For the halftone dot areas extracted by the halftone dot area extraction unit, the secondary character area determination unit either reads them again from the scanner at a light density or binarizes the already-read multi-level data with a lighter threshold, and determines the secondary character areas from the resulting secondary binary image in the same manner as the primary character area determination unit. Recognizing the characters in both the primary and secondary character areas allows a recognition target document containing halftone dot characters to be recognized easily and automatically.
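
One way to picture the halftone dot area extraction is as a count of black/white change points relative to the size of the area. The Python sketch below is a guess at such a measure, not the patented method: counting only horizontal changes, dividing by the pixel count, and the cut-off value `0.4` are assumptions, as are the names `transition_ratio` and `is_halftone`.

```python
def transition_ratio(region):
    """Black/white change points along each row, divided by the pixel count.

    `region` is a list of rows of 0/1 pixels cut out of the binary image.
    """
    area = sum(len(row) for row in region)
    if area == 0:
        return 0.0
    transitions = 0
    for row in region:
        # A change point is any pair of horizontal neighbours that differ.
        transitions += sum(1 for a, b in zip(row, row[1:]) if a != b)
    return transitions / area


def is_halftone(region, threshold=0.4):
    """Hypothetical decision rule: many change points suggest a dot screen."""
    return transition_ratio(region) >= threshold
```

A finely dotted area alternates between black and white far more often than a photograph or a solid block of the same size, which is why a ratio of this kind can separate the two.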

[0010]

[Embodiment] A character recognition device according to one embodiment of the present invention is described below with reference to the drawings. FIG. 1 is a block diagram of the character recognition device in this embodiment. Reference numeral 1 denotes a primary character area determination unit that obtains the circumscribed rectangles of connected figures from the primary binary image input by reading the recognition target document at normal density from a scanner (not shown), merges the circumscribed rectangles according to their distances and sizes, and determines primary character areas and non-character areas based on the sizes of the merged rectangles. 2 denotes a halftone dot area extraction unit that extracts halftone dot areas from the non-character areas determined by the primary character area determination unit 1, according to the overall size and the ratio of black/white change points. 3 denotes a secondary character area determination unit that determines secondary character areas and non-character areas, in the same manner as the primary character area determination unit 1, from the secondary binary image obtained by reading the halftone dot area portions extracted by the halftone dot area extraction unit 2 again from the scanner (not shown) at a density lighter than normal. 4 denotes a character line extraction unit that extracts character lines by taking vertical and horizontal projections of the circumscribed rectangles in the primary character areas determined by the primary character area determination unit 1 and in the secondary character areas determined by the secondary character area determination unit 3. 5 denotes a character cutout unit that estimates the character size from the character line width determined by the character line extraction unit 4 and the sizes of the circumscribed rectangles and, using this character size as a reference, merges the circumscribed rectangles according to their sizes and distances while excluding noise components, and cuts out the results as character patterns. 6 denotes a character feature extraction unit that extracts character features from the character patterns cut out by the character cutout unit 5. 7 denotes a character feature dictionary that stores in advance the character features of all characters. 8 denotes a recognition accuracy calculation unit that compares the character features extracted by the character feature extraction unit 6 with the character feature dictionary 7 to obtain recognition accuracies such as character candidates and similarities. 9 denotes a recognized character determination unit that determines the recognized character from the recognition accuracies obtained by the recognition accuracy calculation unit 8.
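
The interplay of the character feature dictionary, the recognition accuracy calculation unit, and the recognized character determination unit can be illustrated with a small Python sketch. The source specifies neither the feature representation nor the similarity measure, so the plain numeric feature vectors, the cosine similarity, and the names `cosine_similarity`, `rank_candidates`, and `decide_character` are all assumptions chosen for the example.

```python
import math


def cosine_similarity(a, b):
    """Similarity of two feature vectors; 1.0 means identical direction."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def rank_candidates(features, dictionary):
    """Return (character, similarity) pairs, best match first.

    `dictionary` maps each character to its stored feature vector.
    """
    scored = [(ch, cosine_similarity(features, ref))
              for ch, ref in dictionary.items()]
    return sorted(scored, key=lambda item: item[1], reverse=True)


def decide_character(features, dictionary):
    """Pick the candidate with the highest similarity, or None if no entries."""
    ranked = rank_candidates(features, dictionary)
    return ranked[0][0] if ranked else None
```

For instance, with a two-entry dictionary `{"A": [1.0, 0.0], "B": [0.0, 1.0]}`, the feature vector `[0.9, 0.1]` ranks "A" ahead of "B", so `decide_character` returns "A".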

[0011] The operation of the character recognition device of this embodiment, configured as described above, is explained below. FIG. 2 is a flowchart of the character recognition device in this embodiment. First, the primary character area determination unit 1 obtains the circumscribed rectangles of connected figures from the primary binary image input by reading the recognition target document at normal density from the scanner (not shown), merges the circumscribed rectangles according to their distances and sizes, and determines primary character areas and non-character areas based on the sizes of the merged rectangles (S1). Next, the halftone dot area extraction unit 2 extracts halftone dot areas from the non-character areas determined in S1, according to the overall size and the ratio of black/white change points (S2). Next, the secondary character area determination unit 3 obtains the circumscribed rectangles of connected figures from the secondary binary image input by reading the halftone dot area portions extracted in S2 again from the scanner (not shown) at a density lighter than normal, merges the circumscribed rectangles according to their distances and sizes, and determines secondary character areas and non-character areas based on the sizes of the merged rectangles (S3). Next, the character line extraction unit 4 takes vertical and horizontal histograms of the circumscribed rectangles for the primary character areas determined in S1 and the secondary character areas determined in S3, and extracts the character lines (S4). Next, the character cutout unit 5 estimates the character size from the character line width extracted in S4 and the distribution of circumscribed rectangle sizes (S5). Next, the character cutout unit 5, using the character size estimated in S5 as a reference, merges the circumscribed rectangles according to their sizes and distances while excluding noise components, and cuts out the results as character patterns (S6). Next, the character feature extraction unit 6 extracts character features from the character patterns cut out in S6 (S7). Next, the recognition accuracy calculation unit 8 compares the character features extracted in S7 with the character feature dictionary 7 to obtain recognition accuracies such as character candidates and similarities (S8). Finally, the recognized character determination unit 9 determines, as the recognized character, the candidate with the highest similarity among the character candidates obtained in S8 (S9).
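
Step S4 can be pictured as a projection of the circumscribed rectangles onto one axis. The Python sketch below covers only horizontally written lines and is an illustrative assumption, not the procedure of the embodiment: the names `row_projection` and `extract_lines` are hypothetical, and any row covered by at least one rectangle is simply treated as part of a line.

```python
def row_projection(boxes, height):
    """Count, for every pixel row, how many rectangles cover that row.

    Each box is (top, left, bottom, right) with inclusive coordinates.
    """
    profile = [0] * height
    for top, _left, bottom, _right in boxes:
        for y in range(top, bottom + 1):
            profile[y] += 1
    return profile


def extract_lines(profile):
    """Return inclusive (start_row, end_row) ranges where the profile is non-zero."""
    lines = []
    start = None
    for y, count in enumerate(profile):
        if count and start is None:
            start = y                      # a new line begins
        elif not count and start is not None:
            lines.append((start, y - 1))   # the line ended on the previous row
            start = None
    if start is not None:
        lines.append((start, len(profile) - 1))
    return lines
```

The height of each extracted range then gives a character line width of the kind that S5 uses, together with the rectangle sizes, to estimate the character size.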

[0012] In this embodiment, the primary character area determination unit 1 and the secondary character area determination unit 3 read the recognition target document from the scanner (not shown) twice, at different densities. However, if the scanner (not shown) is one that can read the recognition target document as multi-level data, the primary character area determination unit 1 can read the document from the scanner as multi-level data and binarize it with a normal threshold to obtain the primary binary image, and the secondary character area determination unit 3 can binarize the multi-level data already read by the primary character area determination unit 1 with a threshold lighter than normal to obtain the secondary binary image. This is preferable because the document is then read from the scanner only once, which shortens the time required for character recognition.
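
The single-scan variant amounts to thresholding the same multi-level data twice. A minimal Python sketch follows, assuming 8-bit gray values where 0 is black and 255 is white; the threshold values 128 and 60 and the names `binarize` and `crop` are hypothetical and chosen only to make the example concrete.

```python
def binarize(gray, threshold):
    """Return a 0/1 image: 1 (black) where the pixel is darker than `threshold`."""
    return [[1 if value < threshold else 0 for value in row] for row in gray]


def crop(gray, box):
    """Cut the inclusive rectangle (top, left, bottom, right) out of `gray`."""
    top, left, bottom, right = box
    return [row[left:right + 1] for row in gray[top:bottom + 1]]


# Toy data: a dark stroke (20) running through mid-gray halftone dots (100)
# on white paper (240).
gray = [
    [240, 100, 20, 100, 240],
    [100, 240, 20, 240, 100],
]

primary = binarize(gray, 128)                       # normal threshold
secondary = binarize(crop(gray, (0, 1, 1, 3)), 60)  # lighter threshold, halftone area only
```

In this toy data the normal threshold turns both the stroke and the dots black, while the lighter threshold keeps only the stroke, which is the effect the paragraph above relies on.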

[0013] Furthermore, if information indicating that a character was a halftone dot character is attached to each recognized character obtained from a secondary character area, the portions that were halftone dot characters in the recognition target document can easily be identified from the recognition result. This is preferable because, for example, those portions can then be shown as halftone dot characters when the recognition result is displayed.
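
Attaching the halftone information to each recognized character could look like the following Python sketch. The record layout `RecognizedChar`, the `render` helper, and the bracket markers are hypothetical, since the source only says that such information may be added and used when displaying the result.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class RecognizedChar:
    char: str          # recognized character
    similarity: float  # similarity of the best dictionary match
    halftone: bool     # True if it came from a secondary character area


def render(chars, open_mark="[", close_mark="]"):
    """Join the characters, wrapping each run of halftone characters in marks."""
    out = []
    in_halftone = False
    for c in chars:
        if c.halftone and not in_halftone:
            out.append(open_mark)
        elif not c.halftone and in_halftone:
            out.append(close_mark)
        in_halftone = c.halftone
        out.append(c.char)
    if in_halftone:
        out.append(close_mark)
    return "".join(out)
```

A display routine could replace the bracket markers with whatever emphasis the output medium supports, such as a shaded background.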

[0014]

[Effects of the Invention] As described above, in the present invention the primary character area determination unit determines the primary character areas. For the halftone dot areas extracted by the halftone dot area extraction unit, the secondary character area determination unit either reads them again from the scanner at a light density or binarizes the already-read multi-level data with a lighter threshold, and determines the secondary character areas from the resulting secondary binary image in the same manner as the primary character area determination unit. By recognizing the characters in both the primary and secondary character areas, the invention realizes a character recognition device with excellent workability that can easily and automatically recognize a recognition target document containing halftone dot characters.

[Brief Description of the Drawings]

FIG. 1 is a block diagram of a character recognition device according to an embodiment of the present invention.

FIG. 2 is a flowchart of the character recognition device according to the embodiment of the present invention.

FIG. 3 is a diagram showing a recognition target document containing halftone dot characters.

FIG. 4 is a diagram showing the binary image obtained when the recognition target document shown in FIG. 3 is read from a scanner at normal density.

FIG. 5 is a diagram showing the binary image obtained when the recognition target document shown in FIG. 3 is read from a scanner at a light density.

[Explanation of Symbols]

1 Primary character area determination unit
2 Halftone dot area extraction unit
3 Secondary character area determination unit
4 Character line extraction unit
5 Character cutout unit
6 Character feature extraction unit
7 Character feature dictionary
8 Recognition accuracy calculation unit
9 Recognized character determination unit

Claims (1)

[Claims]

1. A character recognition device comprising: a scanner that can adjust the reading density when it reads a recognition target document and outputs a binary image, and/or can read the recognition target document as multi-level data; a primary character area determination unit that obtains the circumscribed rectangles of connected figures from a primary binary image, which is output either by reading the recognition target document from the scanner at normal density or by reading it as multi-level data and binarizing it with a normal threshold, and determines primary character areas from the sizes of those circumscribed rectangles; a halftone dot area extraction unit that extracts halftone dot areas from the non-character areas determined by the primary character area determination unit, according to the ratio of black/white change points to the overall size; a secondary character area determination unit that determines secondary character areas, in the same manner as the primary character area determination unit, from a secondary binary image, which is output either by reading the halftone dot area portions of the recognition target document, extracted by the halftone dot area extraction unit, again from the scanner at a density lighter than normal or by binarizing the already-read multi-level data of the halftone dot area portions with a threshold set lighter than the normal threshold; a character cutout unit that cuts out character patterns, according to the sizes and positions of the circumscribed rectangles, from the primary character areas determined by the primary character area determination unit and the secondary character areas determined by the secondary character area determination unit; a character feature extraction unit that extracts character features from the character patterns cut out by the character cutout unit; a character feature dictionary that stores in advance the character features of all characters; a recognition accuracy calculation unit that compares the character features extracted by the character feature extraction unit with the character feature dictionary to obtain recognition accuracies such as character candidates and similarities; and a recognized character determination unit that determines the recognized character from the recognition accuracies obtained by the recognition accuracy calculation unit.
JP5017245A 1993-02-04 1993-02-04 Character recognition device Pending JPH06231306A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
JP5017245A JPH06231306A (en) 1993-02-04 1993-02-04 Character recognition device

Publications (1)

Publication Number Publication Date
JPH06231306A true JPH06231306A (en) 1994-08-19

Family

ID=11938571

Country Status (1)

Country Link
JP (1) JPH06231306A (en)

Cited By (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JP2012109941A (en) * 2010-11-15 2012-06-07 Konica Minolta Laboratory Usa Inc Method for binarizing scanned document image including gray or light color text printed by halftone pattern
US8947736B2 (en) 2010-11-15 2015-02-03 Konica Minolta Laboratory U.S.A., Inc. Method for binarizing scanned document images containing gray or light colored text printed with halftone pattern
US9319556B2 (en) 2011-08-31 2016-04-19 Konica Minolta Laboratory U.S.A., Inc. Method and apparatus for authenticating printed documents that contains both dark and halftone text
US9596378B2 (en) 2011-08-31 2017-03-14 Konica Minolta Laboratory U.S.A., Inc. Method and apparatus for authenticating printed documents that contains both dark and halftone text

Similar Documents

Publication Publication Date Title
US7751648B2 (en) Image processing apparatus, image processing method, and computer program
JP4031210B2 (en) Character recognition device, character recognition method, and recording medium
EP0381773B1 (en) Character recognition apparatus
JP2002208007A (en) Automatic detection of scanned document
US5625710A (en) Character recognition apparatus using modification of a characteristic quantity
JP2002015280A (en) Device and method for image recognition, and computer-readable recording medium with recorded image recognizing program
JP3215163B2 (en) Ruled line identification method and area identification method
JPH06231306A (en) Character recognition device
KR0186172B1 (en) Character recognition apparatus
JP4731748B2 (en) Image processing apparatus, method, program, and storage medium
JPH0749926A (en) Character recognizing device
JP2002056356A (en) Character recognizing device, character recognizing method, and recording medium
KR910007032B1 (en) A method for truncating strings of characters and each character in korean documents recognition system
JPH0916715A (en) Character recognition system and method therefor
JP3193573B2 (en) Character recognition device with brackets
JP2978801B2 (en) Character input method for handwritten character recognition
JPH05174185A (en) Japanese character recognizing device
JP2993252B2 (en) Homomorphic character discrimination method and apparatus
JPH0535914A (en) Picture inclination detection method
JP2576079B2 (en) Character extraction method
JP2974396B2 (en) Image processing method and apparatus
JPH1185905A (en) Device and method for discriminating font and information recording medium
JPH05174178A (en) Character recognizing method
JPS60122474A (en) Normalizing system
JPH1055411A (en) Font identifying device