JPH01292487A

JPH01292487A - Device for recognizing character

Info

Publication number: JPH01292487A
Application number: JP63122271A
Authority: JP
Inventors: Keiko Abe; 阿部　惠子
Original assignee: Sony Corp
Current assignee: Sony Corp
Priority date: 1988-05-19
Filing date: 1988-05-19
Publication date: 1989-11-24

Abstract

PURPOSE: To further improve character recognition speed by smoothing a character pattern extracted from image character information. CONSTITUTION: Smoothing processing inverts the data of a pixel of interest PEOBJ when a comparison pattern, consisting of the pixel of interest PEOBJ and its adjacent pixels PENF1 to PENF8, coincides with a smoothing reference pattern PTREF. When the pixel of interest PEOBJ is a black pixel projecting from a black character part, a white pixel cutting into a black character part, an isolated white pixel inside a black character part, or an isolated black pixel in the white background, the pixel is removed or filled in as necessary, which further simplifies the uneven outline of the character pattern. Because the identification accuracy of the character identifying part is thereby improved, the overall character identification speed can be further improved.

Description

[Detailed Description of the Invention]

The present invention will be described in the following order.

A. Field of industrial application
B. Summary of the invention
C. Prior art
D. Problems to be solved by the invention (Fig. 2)
E. Means for solving the problems (Figs. 1, 3, 6 and 8)
F. Operation (Figs. 1 and 8)
G. Embodiments
(G1) Embodiment of the character recognition device (Figs. 1 to 10)
(G2) Other embodiments
H. Effects of the invention

A. Field of Industrial Application

The present invention relates to a character recognition device and is particularly suitable for recognizing printed characters.

B. Summary of the Invention

In a character recognition device, the present invention smooths the character pattern extracted from image character information, which further improves character recognition speed.

C. Prior Art

It has been proposed to digitize large volumes of printed documents for filing, or to compile them into databases, so as to build information networks that can serve a variety of uses. As the means of digitizing printed documents, it has been considered to replace character information input devices, which require manual input, with character recognition devices, which do not.

A printed character recognition device generally reads the printed characters on a printed document optically, digitizes them as two-dimensional image information, extracts the printed characters from that image information, and outputs the corresponding character codes.

Encoding image information into character codes in this way makes it possible to realize a document processing system in which a computer automatically searches for words or performs interpretation processing such as understanding meaning.

In addition, because the data extracted from the image information can be compressed as necessary before being passed to interpretation processing, the document processing speed can be improved as required.

D. Problems to Be Solved by the Invention

The printed kanji recognition devices currently in use achieve a high recognition rate of about 97 to 99%, but their recognition speed is only about 10 to 30 characters per second, which is still insufficient as an input means for digitizing large volumes of printed documents.

As a method of improving recognition speed, a character recognition method has been proposed (Japanese Patent Application Laid-Open No. 62-186390) in which coarse classification is performed first: feature quantities are detected from features of the input information cut out of the image information (for example, peripheral features on the four sides), and candidate characters are selected from a coarse classification dictionary. Fine classification based on pattern matching is then performed on those candidates to determine the most similar character.

It has been confirmed that a character recognition method that performs coarse and fine classification hierarchically in this way has the advantage of further improving identification ability, particularly for characters that have no special features of shape, position or the like (that is, ordinary characters).

In practice, however, an image scanner is generally used to obtain two-dimensional image information from a printed document, and when the optical information is converted into electrical binary information, fine irregularities unavoidably arise along the edges of the printed characters.

For example, as shown in Fig. 2, in the character pattern PTMOJI formed from the image information obtained by reading the character "娃" with an image scanner, uneven pattern portions DEKO are observed at the boundary along the upper or lower side of character strokes that slope slightly upward to the right. In these portions, runs of one to several "black" dots are arranged with runs of one to several "white" dots between them.

The uneven pattern portions DEKO arise in image information obtained from the standard printed characters to be identified. They take an uneven shape determined by the arrangement of the image sensor elements provided in the image scanner as photoelectric conversion elements and, viewed microscopically, can be said to represent a feature of the character pattern PTMOJI of the character "娃".

However, when characters are to be recognized by a peripheral feature extraction technique, which treats the uneven outline of the character pattern PTMOJI seen from its periphery as a feature, extracting the uneven pattern portions DEKO at the edges of the strokes as elements of the peripheral features may make the character features unnecessarily complex. If the character features become complex, coarse classification becomes complex and the probability of misrecognition rises, which may degrade character recognition speed.

The present invention has been made in view of the above. It proposes a character recognition device in which the shape of the uneven pattern portions contained in image character information generated by an image scanner is simplified without impairing the main features of the character pattern, so that causes of degraded character recognition speed are prevented in advance.

E. Means for Solving the Problems

To solve these problems, in the present invention a comparison pattern PTCON, consisting of a pixel of interest PEOBJ and its adjacent pixels PENF1 to PENF8 within the binary character pattern input data DATAIN cut out from a character string signal S4, is compared with a smoothing reference pattern PTREF. When the two match, the data of the pixel of interest PEOBJ is inverted to obtain a smoothed pattern PTSMZ, and character identification processing is executed on the basis of character pattern output data DATAOUT consisting of the smoothed pattern PTSMZ.
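The match-and-invert rule described above can be sketched in a few lines of Python. This is only an illustrative sketch, not the patented circuit: the function name `smooth`, the treatment of pixels outside the pattern as white, and the two example reference patterns are assumptions made for the illustration and are not taken from Fig. 1.

```python
from typing import List, Set, Tuple

Grid = List[List[int]]                 # 1 = black character dot, 0 = white background
Pattern = Tuple[Tuple[int, ...], ...]  # 3x3 neighbourhood, row-major

def smooth(grid: Grid, refs: Set[Pattern]) -> Grid:
    """Invert every pixel whose 3x3 neighbourhood equals a reference pattern.

    All comparisons read the unmodified input, so one inversion never
    influences the decision for a neighbouring pixel.
    Assumption: dots outside the grid are treated as white (0).
    """
    h, w = len(grid), len(grid[0])
    out = [row[:] for row in grid]
    for y in range(h):
        for x in range(w):
            window = tuple(
                tuple(
                    grid[y + dy][x + dx]
                    if 0 <= y + dy < h and 0 <= x + dx < w else 0
                    for dx in (-1, 0, 1)
                )
                for dy in (-1, 0, 1)
            )
            if window in refs:
                out[y][x] = grid[y][x] ^ 1   # invert the pixel of interest
    return out

# Hypothetical reference patterns for the illustration only.
HOLE = ((1, 1, 1), (1, 0, 1), (1, 1, 1))    # white dot enclosed by black
SPECK = ((0, 0, 0), (0, 1, 0), (0, 0, 0))   # black dot enclosed by white

example = [
    [0, 0, 0, 0, 0, 0, 0],
    [0, 1, 1, 1, 0, 0, 0],
    [0, 1, 0, 1, 0, 1, 0],
    [0, 1, 1, 1, 0, 0, 0],
    [0, 0, 0, 0, 0, 0, 0],
]
smoothed = smooth(example, {HOLE, SPECK})
```

In this example the enclosed white dot is filled and the lone black dot is erased, while every other pixel is left unchanged.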

F. Operation

Smoothing processing inverts the data of the pixel of interest PEOBJ when the comparison pattern PTCON, consisting of the pixel of interest PEOBJ and its adjacent pixels PENF1 to PENF8, matches the smoothing reference pattern PTREF. As a result, when the pixel of interest PEOBJ is a black pixel projecting from a black character part, a white pixel cutting into a black character part, an isolated white pixel inside a black character part, or an isolated black pixel in the white background, it can be removed or filled in as necessary, which further simplifies the uneven outline of the character pattern.

Because the identification accuracy of the character identification section can thus be raised, the overall character identification speed can be improved correspondingly.

G. Embodiments

An embodiment of the present invention is described in detail below with reference to the drawings.

(G1) Embodiment of the Character Recognition Device

In Fig. 3, reference numeral 1 denotes the character recognition device as a whole. The image read signal S1 obtained in the document reading section 2 is supplied to the noise removing means 4.

The noise removing means 4 removes noise in the form of so-called isolated points contained in the image read signal S1, which avoids the risk of such isolated points being mistaken for part of a character.

The noise-removed output S2 of the noise removing means 4 is supplied to the rotation correcting means 6 of the character cut-out section 5, which corrects the rotation error of the document and supplies the corrected output S3 to the character string extracting means 7.

The character string extracting means 7 separates the character areas of the printed document from other areas (for example, photographs and drawings) and extracts only the image character data contained in the character areas. After confirming that the character strings in a character area are written horizontally, it extracts the character strings.

As shown in Fig. 4, the character strings are extracted as follows. The position of each dot in the character area AR is expressed in xy coordinates, with the x-axis taken in the column (horizontal) direction and the y-axis in the row (vertical) direction. The number of logic "1" dots (dots of the black character parts) contained in the character strings AR1, AR2, ... that make up the character area AR is summed and projected onto the y-axis (this is called the y projection), giving the y projection signal Sy.

At the inter-line positions between the character strings AR1, AR2, ... there are no black character parts, so the y projection signal Sy is at the "0" level. At the y-axis positions corresponding to the character strings AR1, AR2, ..., the signal level corresponds to the total number of dots of the characters in the string that lie on each line parallel to the x-axis. The y projection signal Sy is therefore compared with a predetermined threshold level to obtain character string cut-out data CL, which rises to the logic "1" level during the intervals in which Sy is at or above the threshold.
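The y projection and thresholding step can be illustrated with a short Python sketch. The function names and the default threshold of 1 are assumptions for the illustration; the source only states that Sy is compared with a predetermined threshold level.

```python
from typing import List, Tuple

def y_projection(image: List[List[int]]) -> List[int]:
    """Count the black (logic 1) dots on each horizontal line: the signal Sy."""
    return [sum(row) for row in image]

def line_spans(sy: List[int], threshold: int = 1) -> List[Tuple[int, int]]:
    """Return (start, end) row ranges, end exclusive, where Sy >= threshold.

    Each span corresponds to an interval in which the cut-out data CL
    would be at logic 1, i.e. one text line.
    """
    spans = []
    start = None
    for y, value in enumerate(sy):
        above = value >= threshold
        if above and start is None:
            start = y                      # CL rises
        elif not above and start is not None:
            spans.append((start, y))       # CL falls
            start = None
    if start is not None:                  # line runs to the bottom edge
        spans.append((start, len(sy)))
    return spans

page = [
    [0, 0, 0],
    [1, 1, 0],
    [0, 1, 0],
    [0, 0, 0],
    [1, 0, 1],
    [0, 0, 0],
]
sy = y_projection(page)
spans = line_spans(sy)
```

The height of each span (end minus start) plays the role of the maximum line height HL discussed in the text that follows.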

Using the character string cut-out data CL, the character string extracting means 7 supplies to the character cut-out means 8, as the character string signal S4 for the character strings AR1, AR2, ... of each line, the portion of the corrected output S3 from the rotation correcting means 6 that falls in the periods in which CL is at logic "1".

The interval in which the character string cut-out data CL rises to the logic "1" level thus represents the maximum height HL (= HL1, HL2, ...) of the line of the character string AR1, AR2, ..., and the position in the height direction (the y-axis direction) of each character in a line falls within the range of the maximum height HL (= HL1, HL2, ...) of that line.

The character cut-out means 8 detects, in both the x and y directions, the position and extent of each character contained in the character strings AR1, AR2, ... of each line and, for separated characters, of each constituent part. As shown in Fig. 5, it then cuts out each character or constituent part as a rectangular area CHR enclosed by a circumscribing frame WAKU of height h and width w. Here P denotes the average character pitch of full-width characters and d the spacing between rectangles.
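A simplified way to obtain such circumscribing rectangles is to split one text line at blank columns and then tighten each piece vertically. The Python sketch below works under that assumption; the source does not spell out how the positions are detected, and this sketch does not merge the constituent parts of separated characters, which a real implementation would need to handle (for example, using the pitch P and spacing d).

```python
from typing import List, Tuple

Box = Tuple[int, int, int, int]  # (x, y, width w, height h)

def char_boxes(strip: List[List[int]]) -> List[Box]:
    """Cut one text line into circumscribing rectangles.

    Columns with no black dot separate the pieces; each piece is then
    tightened to the rows that actually contain black dots.
    """
    width = len(strip[0])
    col_sums = [sum(row[x] for row in strip) for x in range(width)]
    boxes = []
    start = None
    for x in range(width + 1):             # one extra step closes a final piece
        has_ink = x < width and col_sums[x] > 0
        if has_ink and start is None:
            start = x
        elif not has_ink and start is not None:
            rows = [y for y, row in enumerate(strip) if any(row[start:x])]
            boxes.append((start, rows[0], x - start, rows[-1] - rows[0] + 1))
            start = None
    return boxes

strip = [
    [0, 1, 1, 0, 0, 0],
    [0, 1, 0, 0, 1, 0],
    [0, 0, 0, 0, 1, 0],
]
boxes = char_boxes(strip)
```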

Inside the rectangular area CHR enclosed by the circumscribing frame WAKU, the character pattern input data DATAIN is formed as a time-serial binary signal in which, as described above with reference to Fig. 2, the dots of the black character parts of the character pattern PTMOJI are represented by the logic "1" level and the dots of the white background by the logic "0" level.

The character cut-out means 8 smooths the character pattern input data DATAIN with the smoothing circuit 15 of Fig. 6 to obtain character pattern output data DATAOUT as shown in Fig. 7, and sends this to the character identification section 9 as input character information S5.

For each dot of the character pattern input data DATAIN (Fig. 2), the smoothing circuit 15 takes that pixel as the pixel of interest PEOBJ, as shown in Fig. 8, and compares the comparison input pattern PTCON, made up of PEOBJ and the eight adjacent pixels PENF1 to PENF8 that surround it, with the smoothing reference patterns PTREF shown in Fig. 1(A) to (F). When they match, the data "PEOBJ" of the pixel of interest of the comparison pattern PTCON is exclusive-ORed with logic "1" (PEOBJ ⊕ 1), so that the pixel of interest in the smoothed pattern PTSMZ holds the inverted data.

The smoothing reference patterns PTREF (= PTREF11 to PTREF14) of Fig. 1(A) apply when a comparison input pattern PTCON arrives in which, within the uneven pattern portion DEKO of the character pattern input data DATAIN (Fig. 2), the pixel of interest PEOBJ projects outward from the black character part by one pixel. In this case the smoothing circuit 15 removes the projecting pixel to obtain the smoothed pattern PTSMZ (= PTSMZ1).

The smoothing reference patterns PTREF (= PTREF21 to PTREF28) of Fig. 1(B) apply when a comparison input pattern PTCON arrives in which the white background cuts into the black character part by one pixel at the pixel of interest PEOBJ. In this case the smoothing circuit 15 fills the intruding white pixel with a black pixel to obtain the smoothed pattern PTSMZ (= PTSMZ2).

The smoothing reference patterns PTREF (= PTREF31 to PTREF34) of Fig. 1(C) likewise apply when a comparison input pattern PTCON arrives in which a white background pixel cuts into the black character part by one pixel. In this case the smoothing circuit 15 fills the intruding white pixel with a black pixel in the same way to obtain the smoothed pattern PTSMZ (= PTSMZ3).

The smoothing reference pattern PTREF (= PTREF4) of Fig. 1(D) applies when a comparison input pattern PTCON arrives in which a single white pixel is isolated inside a black character part. In this case the smoothing circuit 15 fills the white pixel with a black pixel to obtain the smoothed pattern PTSMZ (= PTSMZ4).

The smoothing reference pattern PTREF (= PTREF5) of Fig. 1(E) applies when a comparison pattern PTCON arrives in which a single black pixel is isolated in the white background. In this case the smoothing circuit 15 erases the isolated black pixel to obtain the smoothed pattern PTSMZ (= PTSMZ5).

The smoothing reference pattern PTREF (= PTREF6) of Fig. 1(F) applies when a comparison input pattern PTCON arrives in which a single white pixel lies at the center pixel of a cross-shaped line portion, as where lines intersect. In this case the smoothing circuit 15 fills the white pixel with a black pixel to obtain the smoothed pattern PTSMZ (= PTSMZ6).
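The six cases can be written down as 3x3 bit patterns, which may help when reading the descriptions without the figure. The Python sketch below is an illustration only. The layouts for cases (D) and (E) follow directly from the text; the layout for case (F) assumes white diagonal neighbours, and the layout for case (A), together with the idea that its four variants are the four 90-degree orientations of one shape, is purely an assumption because Fig. 1 is not reproduced here.

```python
from typing import List, Tuple

Pattern = Tuple[Tuple[int, ...], ...]  # 3x3, 1 = black dot, 0 = white dot

def rotate(p: Pattern) -> Pattern:
    """Rotate a 3x3 pattern by 90 degrees clockwise."""
    return tuple(tuple(p[2 - c][r] for c in range(3)) for r in range(3))

def orientations(p: Pattern) -> List[Pattern]:
    """Return the pattern in its four 90-degree orientations."""
    out = [p]
    for _ in range(3):
        out.append(rotate(out[-1]))
    return out

def inverted_center(p: Pattern) -> int:
    """Value written to the pixel of interest after a match (center XOR 1)."""
    return p[1][1] ^ 1

# Case (A), assumed layout: the center black dot projects upward from a black row.
PROTRUSION: Pattern = ((0, 0, 0), (0, 1, 0), (1, 1, 1))
# Case (D): a single white dot enclosed by black dots.
ISOLATED_WHITE: Pattern = ((1, 1, 1), (1, 0, 1), (1, 1, 1))
# Case (E): a single black dot enclosed by white dots.
ISOLATED_BLACK: Pattern = ((0, 0, 0), (0, 1, 0), (0, 0, 0))
# Case (F): white center of a cross; white diagonals are an assumption.
CROSS_HOLE: Pattern = ((0, 1, 0), (1, 0, 1), (0, 1, 0))

protrusion_variants = orientations(PROTRUSION)
```

Under these assumptions a protrusion yields four distinct patterns, whereas the patterns for cases (D), (E) and (F) are unchanged by rotation, which is consistent with the source listing a single reference pattern for each of them.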

The smoothing circuit 15 (Fig. 6) shifts the time-serial character pattern input data DATAIN one bit at a time through, in order, the 1-bit delay shift circuits DL11 and DL12, the 1-line delay shift circuit DL21, the 1-bit delay shift circuits DL22 and DL23, the 1-line delay shift circuit DL31, and the 1-bit delay shift circuits DL32 and DL33. The data of the pixels PENF8, PENF7, PENF6, PENF5, PEOBJ, PENF4, PENF3, PENF2 and PENF1 that make up the comparison input pattern PTCON are taken from the input and output terminals of the 1-bit delay shift circuits DL11, DL12, DL22, DL23, DL32 and DL33 and supplied to the smoothing processing circuit SMZ, which is configured as a ROM (read-only memory) conversion circuit.
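The effect of the delay chain, presenting a full 3x3 neighbourhood for every incoming bit of a raster-scanned stream, can be imitated in software with a fixed-length buffer. The Python sketch below is a behavioural illustration, not a model of the individual circuits DL11 to DL33: the buffer length of two lines plus three bits, the function name, and the decision to skip windows that would wrap around a line edge are all assumptions.

```python
from collections import deque
from typing import Iterable, Iterator, Tuple

Window = Tuple[Tuple[int, ...], Tuple[int, ...], Tuple[int, ...]]

def stream_windows(bits: Iterable[int], width: int) -> Iterator[Tuple[int, int, Window]]:
    """Yield (row, column, 3x3 window) for each interior pixel of a serial stream.

    The buffer holds the last 2*width + 3 bits, which is just enough to tap
    three consecutive bits from each of three consecutive scan lines.
    """
    taps = deque(maxlen=2 * width + 3)
    for i, bit in enumerate(bits):
        taps.append(bit)
        if len(taps) < taps.maxlen:
            continue                       # buffer not yet filled
        cy, cx = divmod(i - width - 1, width)   # position of the center pixel
        if not 1 <= cx <= width - 2:
            continue                       # window would wrap across a line edge
        t = list(taps)                     # t[0] is the oldest bit (top-left)
        window = (
            tuple(t[0:3]),
            tuple(t[width:width + 3]),
            tuple(t[2 * width:2 * width + 3]),
        )
        yield cy, cx, window

# Distinct values instead of bits make the tap positions easy to check.
windows = list(stream_windows(range(12), width=4))
```

For a stream that is 4 pixels wide and 3 lines high, only the two interior pixels of the middle line receive a complete window.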

The smoothing processing circuit SMZ has a conversion table that converts the smoothing reference patterns PTREF described above with reference to Fig. 1(A) to (F) into the smoothed patterns PTSMZ. Each time the character pattern input data DATAIN shifts one bit through the shift circuits, the comparison input pattern PTCON is compared with the smoothing reference patterns PTREF. When a matching pattern arrives, it is converted into the smoothed pattern PTSMZ and sent out as the character pattern output data DATAOUT.
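A ROM-style conversion of this kind can be pictured as a 512-entry table addressed by the nine window bits. The Python sketch below is an assumption-laden illustration: the row-major bit packing, the function names, and the choice to store only the output bit for the pixel of interest (rather than a whole converted pattern) are not specified in the source.

```python
from typing import Iterable, List, Tuple

Pattern = Tuple[Tuple[int, ...], ...]  # 3x3, 1 = black dot, 0 = white dot

def pack(window: Pattern) -> int:
    """Pack a 3x3 window into a 9-bit address, row-major, first dot as MSB."""
    addr = 0
    for row in window:
        for bit in row:
            addr = (addr << 1) | bit
    return addr

def build_rom(refs: Iterable[Pattern]) -> List[int]:
    """Build a 512-entry table giving the output bit for every possible window.

    Addresses that match a reference pattern store the inverted center bit;
    all other addresses pass the center bit through unchanged.
    """
    ref_addrs = {pack(r) for r in refs}
    rom = []
    for addr in range(512):
        center = (addr >> 4) & 1           # the fifth of nine packed bits
        rom.append(center ^ 1 if addr in ref_addrs else center)
    return rom

# Hypothetical reference patterns for the illustration only.
SPECK = ((0, 0, 0), (0, 1, 0), (0, 0, 0))
HOLE = ((1, 1, 1), (1, 0, 1), (1, 1, 1))
rom = build_rom([SPECK, HOLE])
```

With such a table, each shift of the input stream costs one lookup, `rom[pack(window)]`, regardless of how many reference patterns are registered.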

In this embodiment the character identification section 9 performs hierarchical identification: it first identifies candidate characters by coarse classification based on peripheral features, and then performs fine classification based on pattern matching on those candidates.

The technique disclosed in Japanese Patent Application Laid-Open No. 62-186390 can be applied as the method of recognizing characters from peripheral features.

As shown in Figs. 9 and 10, for an input character MOJI enclosed by the quadrilateral circumscribing frame WAKU, the character identification section 9 forms a feature quantity DCHR that encodes the character structure made by the black character parts against the white background. The feature quantity is based on first and second peripheral features seen from the left and right sides WAKUA and WAKUB (called side A and side B) and on third and fourth peripheral features seen from the upper and lower sides WAKUC and WAKUD (called side C and side D). Similar candidate characters are then selected from the coarse classification dictionary on the basis of DCHR.

The peripheral feature on side A is obtained as follows. Taking, for example, the line at the third dot along the left side WAKUA of the input character MOJI and viewing the black character parts from the direction of arrow ARA, the lengths ds1, ds2, ds3 of the black character parts to the left of the dot line LA are encoded to form the feature quantity DCHR.

In the case of Fig. 10, the length Cs of the scan start portion at the upper left corner of the left character part MOJIa (called the start corner) is fairly long, which indicates that there are no dots in the upper left corner.

In contrast, the length Ce at the lower left corner at the scan end (called the end corner) is short, which indicates that there are dots in the end corner.

Accordingly, when the lengths Cs and Ce of the start corner and end corner are shorter than a predetermined threshold length, logic "1" data, indicating that there is a black character part at the start corner or end corner, is set as the 0th and 1st bits B0 and B1 of the feature quantity DCHR.

For the black character parts lying between the start corner and the end corner, logic "0" data is set when the length ds1, ds2, ..., dsn is shorter than a predetermined threshold length (indicating that the black character part is a short dot), and logic "1" data is set when it is longer (indicating a long dot).

By using the data corresponding to the lengths ds1, ds2, ... as the data of the second, third, ... bits B2, B3, ..., the arrangement of the lengths of the black character parts (and hence the arrangement of short and long dots), that is, the features of the character MOJI, can be expressed as a variable-length-code feature quantity DCHR.
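The encoding of one side can be sketched in Python as follows. This is a hedged reading of the description, not the method of the cited publication: the function name, the two separate thresholds, the handling of a run whose length equals the threshold (treated here as a long dot), and the choice to encode every black run on the scan line are assumptions.

```python
from typing import List, Sequence

def encode_side(line: Sequence[int], corner_thresh: int, run_thresh: int) -> List[int]:
    """Encode one scan line (1 = black dot) as a variable-length bit list.

    bits[0] (B0): 1 if the white gap Cs before the first black dot is short.
    bits[1] (B1): 1 if the white gap Ce after the last black dot is short.
    bits[2:]    : one bit per black run, 1 for a long dot, 0 for a short dot.
    """
    n = len(line)
    cs = next((i for i, b in enumerate(line) if b), n)            # start corner gap
    ce = next((i for i, b in enumerate(reversed(line)) if b), n)  # end corner gap
    bits = [int(cs < corner_thresh), int(ce < corner_thresh)]
    run = 0
    for b in list(line) + [0]:            # trailing 0 closes a final run
        if b:
            run += 1
        elif run:
            bits.append(int(run >= run_thresh))
            run = 0
    return bits

scan = [0, 0, 0, 0, 1, 0, 1, 1, 1, 1, 0, 1]
code = encode_side(scan, corner_thresh=2, run_thresh=3)
```

Because one bit is appended per black run, the code length varies with the number of strokes crossed, which matches the description of DCHR as a variable-length code.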

In this way the character identification section 9 obtains in advance, for the standard characters, feature quantity numerical data consisting of the feature quantities DCHR for sides A, B, C and D, and stores them in the coarse classification dictionary as standard character codes. When input character information S5 arrives, the standard character codes having the same feature quantity DCHR as the input character information S5 are passed to the fine classification processing circuit as candidate character information.

The fine classification processing circuit reads the character patterns for the candidate character information from the fine classification dictionary, pattern-matches them against the character pattern of the input character information S5 to identify the most probable character, and sends out the result as recognized character information S6.

With the above configuration, smoothing is performed in the smoothing circuit 15 of the character cut-out means 8. Even if uneven pattern portions DEKO have arisen in the character pattern PTMOJI (Fig. 2) cut out from the character string signal S4, they can be smoothed, so that the pattern is converted into character pattern output data DATAOUT with no uneven pattern around the black character parts, as shown in Fig. 7, before being passed to the character identification section 9 as input character information S5.

As a comparison of Figs. 2 and 7 makes clear, the uneven outline, seen from the periphery, of the character pattern PTMOJIX represented by the smoothed character pattern output data DATAOUT is markedly simpler than that of the character pattern PTMOJI of the character pattern input data DATAIN before smoothing.

Moreover, the simplified outline of the character pattern PTMOJIX is unlikely to alter significantly the main shape features of the input character MOJI. When the character identification section 9 extracts the peripheral feature quantity DCHR, the risk of making the data of DCHR unnecessarily complex is therefore effectively avoided, and coarse classification becomes correspondingly more efficient, so that the overall character recognition speed can be improved.

If the data of the feature quantity DCHR for the peripheral features became complex, even minute variations in the features of the input character MOJI would be taken into DCHR, and the peripheral feature extraction results in the character identification section 9 could vary in ways that are unnecessary in practice. The above embodiment effectively removes such sources of variation.

(G2) Other Embodiments

(1) The embodiment described above deals with the case in which the character identification section 9 performs identification processing based on peripheral features, but the present invention is also widely applicable when other identification methods are used as the character identification technique.

(2) In the embodiment described above, smoothing is executed in the character extraction means 8, so that it takes place after the processing in the rotation correction means 6 and the character string extraction means 7. The same effect can be obtained if, in addition to or instead of this, smoothing is executed in the noise removal means 4.

(3) In the embodiment described above, smoothing uses all eight adjacent pixels PENF1 to PENF8 surrounding the pixel of interest PEOBJ, but some of these eight adjacent pixels may be omitted.
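As a hedged illustration of variation (3), the Python sketch below compares the pixel of interest against either all eight adjacent pixels or only the four pixels directly above, below, left and right. The function name `smooth_isolated`, the offset lists, and the restriction to isolated-pixel removal are assumptions chosen for brevity and are not taken from the embodiment.

```python
BLACK, WHITE = 1, 0

EIGHT_NEIGHBOURS = [(-1, -1), (-1, 0), (-1, 1),
                    (0, -1),           (0, 1),
                    (1, -1),  (1, 0),  (1, 1)]
FOUR_NEIGHBOURS = [(-1, 0), (0, -1), (0, 1), (1, 0)]  # diagonal pixels omitted


def smooth_isolated(image, offsets):
    """Invert each pixel whose examined neighbours all have the opposite value."""
    height, width = len(image), len(image[0])

    def pixel(r, c):
        # Assumption: pixels outside the frame count as white background.
        return image[r][c] if 0 <= r < height and 0 <= c < width else WHITE

    result = [row[:] for row in image]
    for r in range(height):
        for c in range(width):
            centre = image[r][c]
            if all(pixel(r + dr, c + dc) != centre for dr, dc in offsets):
                result[r][c] = 1 - centre
    return result


diagonal_pair = [
    [0, 0, 0, 0],
    [0, 1, 0, 0],
    [0, 0, 1, 0],
    [0, 0, 0, 0],
]
print(smooth_isolated(diagonal_pair, EIGHT_NEIGHBOURS) == diagonal_pair)  # True
print(smooth_isolated(diagonal_pair, FOUR_NEIGHBOURS))  # all pixels white
```

With fewer neighbours there is less to compare per pixel, but the result can differ: the two diagonally touching black pixels survive the eight-neighbour test and are removed by the four-neighbour test.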

H. Effects of the Invention

As described above, according to the present invention, smoothing the uneven pattern portions that arise at the periphery of a character pattern in image character information simplifies the outer shape of the character pattern, so that a character recognition device capable of performing character identification processing even more efficiently can be realized easily.

[Brief Description of the Drawings]

FIG. 1 is a schematic diagram showing the smoothing patterns used in the smoothing processing executed in the character recognition device according to the present invention.
FIG. 2 is a pattern diagram showing the character pattern of the character pattern input data.
FIG. 3 is a block diagram showing the overall configuration of the character recognition device according to the present invention.
FIG. 4 is a schematic diagram explaining the character string extraction operation.
FIG. 5 is a schematic diagram explaining the rectangular area extraction operation.
FIG. 6 is a block diagram showing the configuration of the smoothing circuit.
FIG. 7 is a pattern diagram showing the character pattern of the smoothed character pattern output data.
FIG. 8 is a schematic diagram showing the comparison input patterns to be smoothed.
FIGS. 9 and 10 are schematic diagrams showing the principle of peripheral feature extraction.

1: character recognition device, 2: document reading section, 3: character recognition processing section, 4: noise removal means, 5: character extraction section, 6: rotation correction means, 7: character string extraction means, 8: character extraction means, 9: character identification section, 15: smoothing circuit, DL11 to DL33: shift circuits, SMZ: smoothing processing circuit.

Claims

A character recognition device characterized in that a comparison pattern consisting of a pixel of interest and its adjacent pixels, taken from character pattern input data consisting of binary data cut out from a character string signal, is compared with a smoothing reference pattern; when the two match, the data of the pixel of interest is inverted to obtain a smoothing processing pattern; and character recognition processing is performed based on character pattern output data formed by the smoothing processing pattern.