JPH0377183A

JPH0377183A - Character segmenting device

Info

Publication number: JPH0377183A
Application number: JP1213807A
Authority: JP
Inventors: Toru Ishikawa (石河 融); Hiroshi Yoshida (吉田 浩史); Koichi Higuchi (樋口 浩一); Yoshiyuki Yamashita (山下 義征)
Original assignee: Oki Electric Industry Co Ltd
Current assignee: Oki Electric Industry Co Ltd
Priority date: 1989-08-19
Filing date: 1989-08-19
Publication date: 1991-04-02

Abstract

PURPOSE: To improve the accuracy of segmenting a character pattern by comparing a block length with a block length threshold, classifying each black block according to the result of this comparison, and selecting a segmenting direction suited to that classification. CONSTITUTION: A peripheral distribution preparing means 14 scans line image data in a direction almost orthogonal to the direction along a character line, counts the number of black bits on each scanning line, and calculates the black-bit distribution. A block detecting means 16 detects, as a black block, an area where the number of black bits is at or above a bit number threshold. A segmenting direction selecting means 18 selects the direction almost orthogonal to the character line as the segmenting direction when the length of the black block along the character line is smaller than the block length threshold, and selects a direction that obliquely crosses the direction along the character line when the block length is at or above the threshold. A segmenting means 20 segments the character pattern along the selected segmenting direction. Thus, even when several italic characters are close to each other, the segmenting position can be detected character by character with simple processing.

Description

Detailed Description of the Invention

(Industrial Field of Application)
This invention relates to a character segmenting device, and in particular to a device for accurately segmenting character patterns from image data of a form or similar document in which characters of several different typefaces (for example, italic and non-italic characters) are mixed.

(Prior Art)
As a conventional technique for segmenting character patterns (image data for one character each) from image data of a form or the like, the following has been proposed, for example.

FIG. 11(A) partially shows an example of line image data. In this figure, d denotes part of the image data for one character line stored in a line buffer. The white areas of image data d represent the character background, and the black areas labeled 10a to 10e each represent the character line portion (the strokes) of one character. An X-Y orthogonal coordinate system is set on image data d, with the X-axis direction along the character line. FIG. 11(B) shows the black-bit distribution obtained by scanning image data d with the Y-axis direction as the main scanning direction and the X-axis direction as the sub-scanning direction. In this figure, the horizontal axis corresponds to the X axis of FIG. 11(A), and the vertical axis represents the number of black bits on the scanning line at coordinate X.

In the first conventional technique, image data d of a form or the like stored in a line buffer is scanned line-sequentially, with the main scanning direction almost orthogonal to the character line direction. The number of black bits on each scanning line is counted to produce a black-bit distribution, and the segmenting positions of the character patterns are determined by referring to this distribution.

To determine the segmenting positions, the number of black bits at each X coordinate is compared with a threshold α. When a position is detected at which the number of black bits becomes larger than α, that position is taken as the segmenting start position Xs. When a position is then detected at which the number of black bits becomes smaller than α, the position immediately before it is taken as the segmenting end position Xe.
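The thresholding rule of the first conventional technique can be sketched as follows. This is a minimal illustration, not the patent's implementation: the function name `segment_by_threshold`, the list-based profile, and the handling of a run that reaches the right edge are assumptions.

```python
def segment_by_threshold(profile, alpha):
    """Return (Xs, Xe) pairs from a black-bit profile (one count per X position)."""
    spans = []
    start = None
    for x, count in enumerate(profile):
        if start is None and count > alpha:
            start = x                      # count rose above alpha: start position Xs
        elif start is not None and count < alpha:
            spans.append((start, x - 1))   # position just before the drop: end position Xe
            start = None
    if start is not None:                  # assumption: close a run that reaches the edge
        spans.append((start, len(profile) - 1))
    return spans


# Two well-separated characters, then two characters whose profiles overlap.
separated = [0, 3, 5, 2, 0, 0, 4, 6, 0]
touching = [0, 3, 5, 2, 2, 2, 4, 6, 0]
print(segment_by_threshold(separated, 1))
print(segment_by_threshold(touching, 1))
```

With the overlapping profile the rule returns a single span covering both characters, which is the failure that closely spaced italic characters produce.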

However, when italic characters are very close together, as shown for example in FIGS. 11(A) and 11(B), the number of black bits exceeding the threshold α is not interrupted at each character but is detected continuously across several characters. The span from segmenting position Xs to Xe then contains image data for several characters, and the image data cannot be segmented character by character.

Japanese Patent Laid-Open No. 62-127985 therefore proposes a second conventional technique that can segment image data character by character even in such a case.

In this second conventional technique, as also shown in FIG. 11(A), image data d is scanned to detect a circumscribing frame G that circumscribes the character line portions of image data d. The image data d inside frame G is then scanned from the upper side of frame G toward the lower side and from the lower side toward the upper side, and the image data inside frame G is classified into four kinds of portions: background detected by the scan from the upper side, background detected by the scan from the lower side, the remaining background, and character line portions. The image data d inside frame G is then scanned again to detect the points at which this classification changes (change points), and the segmenting positions are determined from the positions of these change points.

According to this second conventional technique, characters can be segmented one at a time from image data in which italic and non-italic characters are mixed, for example when only specific words to be emphasized are printed in italics and the remaining words are printed in a non-italic typeface.

(Problems to be Solved by the Invention)
However, the processing of the second conventional technique described above is complicated, so the segmenting speed falls. In addition, the device configuration needed to implement this processing becomes complicated, which makes the device difficult to miniaturize.

The object of this invention is to solve these conventional problems by providing a character segmenting device that can detect segmenting positions character by character with simple processing, even when several italic characters are close together.

(Means for Solving the Problems)
To achieve this object, the character segmenting device of this invention, which segments character patterns from quantized line image data, is characterized by comprising: peripheral distribution preparing means that scans the line image data in a direction almost orthogonal to the direction along the character line and counts the number of black bits on each scanning line to obtain a black-bit distribution; block detecting means that detects, as a black block, an area of the line image data where the number of black bits is at or above a bit number threshold; segmenting direction selecting means that selects, as the segmenting direction, the direction almost orthogonal to the direction along the character line when the block length of the black block in the direction along the character line is smaller than a block length threshold, and selects a direction that obliquely crosses the direction along the character line at an angle of θ degrees (where θ is a constant) when the block length is at or above the block length threshold; and segmenting means that segments character patterns from the line image data along the selected segmenting direction.

(Operation)
A character segmenting device with this configuration compares the block length with the block length threshold, classifies each black block according to the result, and selects a segmenting direction suited to that classification.

The accuracy of segmenting character patterns can therefore be improved.

For example, when several italic characters are lined up close together as shown in FIG. 11, a black block is detected whose block length exceeds the width of one character. With this situation in mind, the block length threshold is set to a suitable value so that black blocks are classified into two kinds: those containing one character and those containing several characters. When a black block containing several characters is detected, it is regarded as a black block containing several closely spaced italic characters, and the segmenting direction is set to a direction that obliquely crosses the character line direction at a suitable angle. Character patterns can then be segmented accurately, one character at a time, from the image data of several closely spaced italic characters.

(Embodiments)
Embodiments of this invention are described below with reference to the drawings. The drawings are only schematic, drawn to the extent needed to understand the invention, so the configuration of each component, the flow of input and output signals, and the shapes, dimensions, and arrangement are not limited to the illustrated examples.

FIG. 1 is a diagram for explaining this invention. It is a functional block diagram schematically showing one configuration example of a character recognition device built with the character segmenting device of the embodiment.

In the figure, 12 denotes the character segmenting device of the embodiment of this invention. This device 12 comprises: peripheral distribution preparing means 14 that scans the line image data in a direction almost orthogonal to the direction along the character line and counts the number of black bits on each scanning line to obtain a black-bit distribution; block detecting means 16 that detects, as a black block, an area of the line image data where the number of black bits is at or above a bit number threshold; segmenting direction selecting means 18 that selects, as the segmenting direction, the direction almost orthogonal to the direction along the character line when the block length of the black block in the direction along the character line is smaller than the block length threshold, and selects a direction that obliquely crosses the direction along the character line at an angle of θ degrees (where θ is a constant) when the block length is at or above the block length threshold; and segmenting means 20 that segments character patterns from the line image data along the selected segmenting direction.

Reference numeral 22 denotes the character recognition device. This device 22 consists of a preprocessing unit 24 that obtains quantized line image data of the object to be read, such as a form; the character segmenting device 12, which segments from the quantized line image data a character pattern for one unit of the recognition target, for example one character; a pattern register 26 that stores the character pattern; and a recognition unit 28 that extracts feature quantities of the character from the character pattern and obtains a recognition result based on those feature quantities.

The symbol T in FIG. 1 denotes an output terminal connected to an external device (not shown). The recognition result from the recognition unit 28 is passed to the external device through the output terminal T.

The character segmenting device 12 of this embodiment is described in more detail below, together with a more detailed description of the illustrated character recognition device 22 to aid understanding of the invention.

(Preprocessing unit)
The preprocessing unit 24 of the illustrated example consists of a photoelectric conversion unit 24a and a line buffer 24b.

The photoelectric conversion unit 24a detects the area for one character line (the line area) of the form to be read, based on input format data stored in a format table (not shown). It scans this line area optically, converts the optical signal S from the form into a quantized electrical signal, and stores this signal as image data in the line buffer 24b. The image data is, for example, an electrical signal quantized to two levels, black and white; black pixels of the line image data represent character line portions and white pixels represent the character background.

FIG. 2 shows an example of a form to be read. FIG. 3 is an explanatory diagram of the input format data; FIGS. 3(A) and 3(B) show an example of input format data and the relationship between the input format data and the form.

For a form 30 with the format shown in FIG. 2, an X-Y orthogonal coordinate system is set on the form 30, for example as also shown in FIG. 3, with the upper left corner of the form 30 as the origin and the direction along the character lines as the X-axis direction. The position of the line area on the form can then be detected for each character line by using the following format data: (1) the X coordinate Xs and Y coordinate Ys of the upper left corner LT1 of the line area of the first line, (2) the height h of the line area, (3) the width W of the line area, (4) the line pitch p, and (5) the total number n of lines per form.
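As a rough illustration of how these five format items locate every line area, the sketch below derives one rectangle per character line. It assumes the lines are spaced uniformly by the pitch p measured between the tops of consecutive line areas, which the text implies but does not state; the function name `line_areas` is hypothetical.

```python
def line_areas(xs, ys, h, w, p, n):
    """Return (left, top, right, bottom) for each of the n line areas.

    Coordinates are inclusive pixel positions in the form's X-Y system,
    with the origin at the upper left corner of the form.
    """
    areas = []
    for k in range(n):
        top = ys + k * p   # assumption: pitch p separates the tops of adjacent line areas
        areas.append((xs, top, xs + w - 1, top + h - 1))
    return areas


for area in line_areas(xs=10, ys=20, h=30, w=400, p=50, n=3):
    print(area)
```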

The line buffer 24b stores image data one character line at a time and has a capacity of, for example, 128 × 4096 pixels. An X-Y orthogonal coordinate system is set on the line image data, for example with the direction along the character line as the X-axis direction, so that the data at any pixel position expressed in this coordinate system can be read out freely.

FIG. 4 shows an example of line image data. In the figure, D denotes the image data for one character line; part of this line image data D is the image data d shown in FIG. 11(A). The X and Y axes of FIG. 4 correspond to the X and Y axes of FIG. 11(A).

(Character segmenting device)
a: Peripheral distribution preparing means
The peripheral distribution preparing means 14 of this embodiment scans the line image data line-sequentially, with the main scanning direction almost orthogonal to the direction along the character line. It counts the number of black bits on the scanning line at each sub-scanning position and produces the distribution of the number of black bits in the column direction.
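The column-wise count can be sketched as below. The representation of the line image as a list of rows of 0/1 values (1 = black) and the function name `black_bit_distribution` are assumptions made for illustration.

```python
def black_bit_distribution(image):
    """Count black pixels in each column of a binary line image.

    image: list of rows, each a list of 0/1 values (1 = black).
    Returns one count per X position, i.e. per scanning line along Y.
    """
    if not image:
        return []
    width = len(image[0])
    return [sum(row[x] for row in image) for x in range(width)]


image = [
    [0, 1, 1, 0],
    [0, 1, 0, 0],
    [0, 1, 1, 0],
]
print(black_bit_distribution(image))
```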

FIG. 5 shows the distribution of the number of black bits for the line image data D shown in FIG. 4.

b: Block detecting means
The block detecting means 16 of this embodiment refers to the black-bit distribution produced by the peripheral distribution preparing means 14 and, with the bit number threshold THL set for example to THL = 1, detects each area where the distribution is 1 or more as a black block.

It then detects the start position Ms and end position Me of each black block in the character line direction. In this example, when the black-bit distribution changes from 0 to 1 or more, the X coordinate at which it becomes 1 or more is detected as the start position Ms; when the distribution changes from 1 or more to 0, the last X coordinate at which it is 1 or more is detected as the end position Me.
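A minimal sketch of this block detection, assuming the distribution is available as a list of per-column counts; the function name `detect_black_blocks` and the treatment of a block that touches the right edge are illustrative assumptions.

```python
def detect_black_blocks(distribution, thl=1):
    """Return (Ms, Me) for each run of columns whose black count is >= thl."""
    blocks = []
    start = None
    for x, count in enumerate(distribution):
        if count >= thl:
            if start is None:
                start = x                   # first column at or above THL: start position Ms
        elif start is not None:
            blocks.append((start, x - 1))   # last column at or above THL: end position Me
            start = None
    if start is not None:                   # assumption: close a block that reaches the edge
        blocks.append((start, len(distribution) - 1))
    return blocks


print(detect_black_blocks([0, 3, 2, 0, 0, 1, 4, 4, 1]))
```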

FIG. 5 shows examples of the detected start position Ms and end position Me.

c: Segmenting direction selecting means
The segmenting direction selecting means 18 of this embodiment calculates the length P of each black block in the direction along the character line (the block length) from the start position Ms and end position Me detected by the block detecting means 16. The block length is given by P = Me - Ms + 1.

The block length P of each black block is then compared with the block length threshold Z (for example, Z = 35).

The segmenting direction selecting means 18 outputs a segmenting direction selection signal C according to the result of comparing the block length P with the threshold Z. When P < Z, it outputs a selection signal, for example C = 0, indicating that the direction almost orthogonal to the direction along the character line is selected as the segmenting direction. When P ≥ Z, it outputs a selection signal, for example C = 1, indicating that a direction obliquely crossing the direction along the character line at an angle of θ degrees (where θ is a constant) is selected. The selection signal C is output for each black block.

In this embodiment, a black block with block length P < Z is regarded as a black block of a single character, and a black block with block length P ≥ Z is regarded as a black block that may contain several characters.
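The selection rule amounts to one comparison per black block. The sketch below uses the example values from the text (Z = 35, C = 0 or 1); the function name `select_direction` and the returned tuple are assumptions.

```python
def select_direction(ms, me, z=35):
    """Return (P, C): block length and selection signal for one black block.

    C = 0 selects the direction almost orthogonal to the character line;
    C = 1 selects the oblique direction at angle theta.
    """
    p = me - ms + 1          # block length P = Me - Ms + 1
    c = 0 if p < z else 1    # P < Z: single character; P >= Z: possibly several
    return p, c


print(select_direction(10, 30))   # short block
print(select_direction(10, 44))   # block exactly at the threshold length
```

Note that a block whose length equals Z is routed to the oblique direction, since the rule is P ≥ Z.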

d: Segmenting means
The segmenting means 20 of this embodiment receives the selection signal C for each black block from the segmenting direction selecting means 18 and segments a character pattern from the line image data along the segmenting direction corresponding to this selection signal C.

i) When P < Z
When the selection signal C indicates P < Z, the black block B1 that yielded this block length P is regarded as a block of a non-italic character, and a character pattern is segmented based on the start position Ms and end position Me of this black block B1.

For example, the black block of the letter 'O' shown in FIG. 4 is a black block B1.

In this case, the segmenting means 20 sets segmenting lines K1s and K1e that are parallel to the selected segmenting direction, that is, the direction almost orthogonal to the direction along the character line (the Y-axis direction in the illustrated example), and that pass through positions Ms - s and Me + t respectively (where s and t are each zero or any suitable positive integer). FIG. 4 shows examples of the segmenting lines K1s and K1e as dotted lines.

The area between the segmenting lines K1s and K1e is taken as the segmenting area, and the line image data inside this area is segmented as the image data for one character, that is, as a character pattern. Character patterns are segmented one character at a time and stored in the pattern register 26.
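For the P < Z case, segmenting reduces to slicing columns between the two vertical lines. In this sketch the image is a list of rows of 0/1 values, and clamping the margins s and t to the image bounds is an added assumption; the function name `cut_vertical` is hypothetical.

```python
def cut_vertical(image, ms, me, s=0, t=0):
    """Cut the columns between segmenting lines through Ms - s and Me + t (inclusive)."""
    width = len(image[0])
    left = max(0, ms - s)            # assumption: clamp to the image bounds
    right = min(width - 1, me + t)
    return [row[left:right + 1] for row in image]


image = [
    [0, 1, 1, 0, 1],
    [0, 1, 0, 0, 1],
]
print(cut_vertical(image, ms=1, me=2))
print(cut_vertical(image, ms=1, me=2, s=1, t=1))
```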

The upper and lower limit positions of the segmenting area in the direction almost orthogonal to the character line may be, for example, the upper-side and lower-side positions of the line area in that direction, as obtained from the format data. In the illustrated example these upper-side and lower-side positions are expressed as Y coordinates. Alternatively, the upper-side position gT and lower-side position gB, in the direction almost orthogonal to the character line, of a line circumscribing frame g that circumscribes the character line portions in the line image data may be used as the upper and lower limit positions of the segmenting area.

ii) When P ≥ Z
When the selection signal C indicates P ≥ Z, the black block B2 that yielded this block length P is regarded as a black block that may contain several italic characters. For example, the black block of the word 'LINIC8' shown in FIG. 4 is a black block B2.

In this case, the segmenting means 20 sets a scanning area I based on the start position Ms and end position Me of the black block B2. The scanning area I is, for example, the area of the line image data between positions Ms - p and Me + q (where p and q are each zero or any suitable positive integer).

The line image data inside the scanning area I is then scanned with the scanning direction set to the segmenting direction selected at this time, that is, the direction that obliquely crosses the direction along the character line at an angle of θ degrees. The means detects the position CB of the black scanning line at each change from a white scanning line to a black scanning line, and the position CW of the black scanning line at each change from a black scanning line to a white scanning line. Here, a scanning line on which there is no black pixel representing a character line portion is called a white scanning line, and a scanning line on which there is such a black pixel is called a black scanning line. The positions CB and CW can be expressed, for example, as the positions at which the black scanning lines intersect the X axis.
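One way to picture the oblique scan is as a sheared projection: every black pixel is assigned to the scan line that passes through it, and scan lines are indexed by where they meet a reference row. The sketch below is an illustration under several assumptions not fixed by the text: the image is a list of rows of 0/1 values with row 0 at the top, characters lean to the right, scan lines are indexed by their intercept with the bottom row (standing in for the X axis), and intercepts are rounded to integers. The function names are hypothetical.

```python
import math


def oblique_distribution(image, theta_deg):
    """Count black pixels on scan lines inclined at theta_deg to the X axis.

    Returns a dict mapping each scan line's bottom-row intercept to its count.
    """
    height = len(image)
    shift = 1.0 / math.tan(math.radians(theta_deg))  # X displacement per row of rise
    counts = {}
    for r, row in enumerate(image):
        rise = height - 1 - r                        # rows above the bottom row
        for x, bit in enumerate(row):
            if bit:
                x0 = round(x - rise * shift)         # intercept of the line through (x, r)
                counts[x0] = counts.get(x0, 0) + 1
    return counts


def black_line_transitions(counts):
    """Return (CB, CW) pairs: first and last black scan line of each consecutive run."""
    xs = sorted(counts)
    if not xs:
        return []
    pairs = []
    start = prev = xs[0]
    for x0 in xs[1:]:
        if x0 != prev + 1:                           # a white scan line lies in between
            pairs.append((start, prev))
            start = x0
        prev = x0
    pairs.append((start, prev))
    return pairs


# Two strokes slanted at 45 degrees whose column projections overlap.
image = [
    [0, 0, 0, 1, 0, 1],
    [0, 0, 1, 0, 1, 0],
    [0, 1, 0, 1, 0, 0],
    [1, 0, 1, 0, 0, 0],
]
print(black_line_transitions(oblique_distribution(image, 45)))
```

In this toy image every column contains a black pixel, so a vertical projection sees one block, while the 45-degree scan separates the two strokes.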

The angle θ is set to any suitable value according to the typeface of the characters to be read; in this example it is set, for example, to θ = 75 degrees.

Next, the segmenting means 20 sets segmenting lines K2s and K2e based on the positions CB and CW.

In this case, the segmenting means 20 sets segmenting lines K2s and K2e that are parallel to the segmenting direction selected at this time, that is, the direction that obliquely crosses the direction along the character line (the X-axis direction in the illustrated example) at angle θ, and that pass through positions Ms - v and Me + w respectively (where v and w are each zero or any suitable positive integer). FIG. 4 shows examples of the segmenting lines K2s and K2e as dotted lines.

The area between the segmenting lines K2s and K2e is taken as the segmenting area, and the line image data inside this area is segmented as the image data for one character, that is, as a character pattern. Character patterns are segmented one character at a time and stored in the pattern register 26.

The upper and lower limit positions of the segmenting area in the direction almost orthogonal to the character line may be set in the same way as in the case of P < Z described above.

FIG. 6 shows an example of a character pattern stored in the pattern register. As also shown in the figure, an X-Y orthogonal coordinate system with its origin at any suitable position is set on the character pattern b, so that the character pattern data at any pixel position expressed in this coordinate system can be read freely from the pattern register 26.

The pattern register 26 has a capacity of, for example, 128 × 128 pixels.

With the character segmenting device 12 of this embodiment, the processing up to detection of the segmenting area is very simple, so segmenting is fast. Because the processing is simple, the device configuration can also be simplified and the device made smaller.

(Recognition unit)
The recognition unit 28 scans the character pattern stored in the pattern register 26, extracts feature quantities from the character pattern and recognizes it, and outputs the recognition result, for example a character code defined in the JIS standard.

The recognition unit 28 may have any of various conventionally known configurations; in this embodiment it has the functions described below.

The recognition unit 28 of this embodiment first detects a character circumscribing frame that circumscribes the character line portion of the character pattern.

It then calculates the line width W of the character pattern. The line width W is given by the following conventionally known approximation:

W = 1 / (1 - (Q / A))

Here, Q is the number of window positions at which every point inside the window is a black bit when the character pattern is scanned with, for example, a 2 × 2 window, and A is the total number of black bits of the character pattern inside the character circumscribing frame.
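The approximation can be checked on a simple shape. The sketch below assumes the pattern has already been cropped to the character circumscribing frame and is given as a list of rows of 0/1 values; the function name `line_width` is hypothetical.

```python
def line_width(pattern):
    """Estimate stroke width as W = 1 / (1 - Q / A).

    A: total number of black pixels.
    Q: number of 2 x 2 window positions whose four pixels are all black.
    """
    a = sum(sum(row) for row in pattern)
    if a == 0:
        raise ValueError("pattern contains no black pixels")
    q = 0
    for r in range(len(pattern) - 1):
        for x in range(len(pattern[r]) - 1):
            if (pattern[r][x] and pattern[r][x + 1]
                    and pattern[r + 1][x] and pattern[r + 1][x + 1]):
                q += 1
    return 1.0 / (1.0 - q / a)


# A horizontal stroke 3 pixels thick and 10 pixels long: A = 30, Q = 2 * 9 = 18.
stroke = [[1] * 10 for _ in range(3)]
print(line_width(stroke))
```

For a stroke of thickness 3 and length L the formula gives 3L / (L + 2), which approaches 3 as the stroke gets longer; the short stroke above yields 2.5.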

From this line width W, the threshold ℓth = N·W used to extract sub-patterns is obtained (where N is a constant set to any suitable value for each main scanning direction used in sub-pattern extraction).

The character pattern is then scanned in several directions. For each scan, the number ℓ of consecutive black bits on the scanning line is detected, and a sub-pattern for each scanning direction is extracted from the character pattern based on the threshold ℓth and the run length ℓ. A group of ℓ consecutive black bits with ℓ ≥ ℓth is extracted as a black-bit portion of the sub-pattern; white bits on the scanning line, and groups of black bits with ℓ < ℓth, are extracted as white-bit portions of the sub-pattern.
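For a single scanning direction, the run-length rule can be sketched as follows. Only the horizontal (row-wise) direction is shown, the threshold is passed in directly rather than derived from the line width, and the function name `extract_subpattern_rows` is hypothetical.

```python
def extract_subpattern_rows(pattern, l_th):
    """Keep only horizontal runs of black pixels whose length is >= l_th."""
    sub = []
    for row in pattern:
        out = [0] * len(row)
        x = 0
        while x < len(row):
            if row[x]:
                end = x
                while end < len(row) and row[end]:
                    end += 1                      # end is one past the run
                if end - x >= l_th:
                    out[x:end] = [1] * (end - x)  # long run: black part of the sub-pattern
                x = end                           # short runs stay white
            else:
                x += 1
        sub.append(out)
    return sub


pattern = [
    [1, 1, 1, 1, 0, 1],
    [0, 1, 0, 1, 1, 1],
]
print(extract_subpattern_rows(pattern, 3))
```

Scanning the same pattern along other directions with the same rule would give the sub-patterns for those directions.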
Based on h and the number of consecutive black bits (1), sub-patterns for each scanning direction are extracted from the character patterns. A continuous number of black bits such that l≧A th is extracted as the black bit portion of the subpattern, and white bits on the scanning line and a black bit group such that l<ltl are extracted as the white bit portion of the subpattern.

次いで認識部２８は、文字パタンの文字外接枠内の領域
に対応するサブパタンの領域をＮＸＭ　（Ｎ及びＭは自
然数）個の分割領域に分割し、各分割領域毎に分割領域
内の文字線長を表す特徴量を抽出し、抽出した各分割領
域の特徴量をそれぞれ文字の大きざで正規化する。例え
ば、文字の大きざとして（ΔＸ十△Ｙ）／２７ａ用い、
この文字の大きざで各分割領域毎の特徴量を除すことに
よって特徴ＪＩＶ正規化する。但し、ΔＸは文字パタン
上の文字外接枠のＸ軸に沿う方向の長さ及びΔＹは文字
パタン上の文字外接枠のＹ軸に沿う方向の長さである。Next, the recognition unit 28 divides the region of each sub-pattern corresponding to the region within the character circumscribing frame of the character pattern into N×M divided regions (N and M are natural numbers), extracts for each divided region a feature amount representing the character line length within that region, and normalizes the extracted feature amount of each divided region by the character size. For example, (ΔX+ΔY)/2 is used as the character size, and the feature amounts are normalized by dividing the feature amount of each divided region by this character size. Here, ΔX is the length of the character circumscribing frame on the character pattern in the direction along the X-axis, and ΔY is the length of the character circumscribing frame on the character pattern in the direction along the Y-axis.
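The division into regions and the size normalization can be sketched as follows. Several points are assumptions made for illustration: the function name `normalized_grid_features`, the choice of rows for N and columns for M, the use of the black bit count in each region as a stand-in for the character line length (the specification does not say how that length is measured), and the reading of the character size as the mean of ΔX and ΔY. The input is assumed to be a sub-pattern already cropped to the character circumscribing frame.

```python
def normalized_grid_features(subpattern, n, m):
    """Return an n x m matrix of size-normalized features for a sub-pattern.

    subpattern: list of rows of 0/1 values, cropped to the circumscribing
                frame, so delta_x = number of columns, delta_y = number of rows.
    """
    rows, cols = len(subpattern), len(subpattern[0])
    char_size = (cols + rows) / 2.0  # assumed character size (dX + dY) / 2

    features = [[0.0] * m for _ in range(n)]
    for y in range(rows):
        for x in range(cols):
            if subpattern[y][x]:
                # Map the pixel to its divided region; the black bit count
                # is a stand-in for the character line length.
                features[y * n // rows][x * m // cols] += 1

    return [[value / char_size for value in row] for row in features]
```

Dividing by the character size makes the features of a large and a small rendering of the same character comparable, which is the purpose the text gives for the normalization.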

そして認識部２８は各サブパタンの各分割領域毎に得た
特徴量から成る特徴マトリクスを一文字単位に得る。Then, the recognition unit 28 obtains, for each character, a feature matrix composed of the feature amounts obtained for each divided region of each sub-pattern.

第６図に示す文字゛Ｕ°の文字パタンを４つの異なる方
向に走査して４つのサブパタンを抽出した場合の各走査
サブパタン別の特徴マトリクスを第７図（Ａ）〜（Ｄ）
に示す。FIGS. 7(A) to (D) show the feature matrices of the respective sub-patterns obtained when the character pattern of the character "U" shown in FIG. 6 is scanned in four different directions to extract four sub-patterns.

次いで文字パタンの特徴マトリクスを辞書マトリクスと
照合し類似度Ｒが最も大きくなる辞書マトリクスに対応
する文字名を認識結果としで出力但し、ｉは各分割領域
に付される番号、ｆ、は番号ｉを付した分割領域の正規
化された特徴量及び９１は特徴量ｆｉに対応する辞書マ
トリクスの特徴量を表す。Next, the feature matrix of the character pattern is compared with the dictionary matrices, and the character name corresponding to the dictionary matrix giving the largest similarity R is output as the recognition result. In the expression for the similarity R, i is the number assigned to each divided region, f_i is the normalized feature amount of the divided region numbered i, and g_i is the feature amount of the dictionary matrix corresponding to the feature amount f_i.

この実施例の認識部２８は、ひとつのカテゴリに対して
複数例えば２種の辞書マトリクスを有する。ひとつのカ
テゴリ例えば文字′す′に対しで用意した２ｆ４！の辞
書マトリクスを第８図及び第９図に示す。第８図（Ａ）
〜（Ｄ）に示す辞書マトリクスは第７図（Ａ）〜（Ｄ）
に対応する各サブパタン別のイタリック体の辞書マトリ
クス、及び第９図（Ａ）〜（Ｄ）に示す辞書マトリクス
は第７図（Ａ）〜（Ｄ）に対応する各サブパタン別の標
準体の辞書マトリクスである。第７図の特徴マトリクス
と第８図の辞書マトリクスとの類似度ＲはＲ＝１１．６
、また第７図の特徴マトリクスと第９図の辞書マトリク
スとの類似度ＲはＲ＝１．０であった。The recognition unit 28 of this embodiment has a plurality of, for example two, kinds of dictionary matrices for one category. The two kinds of dictionary matrices prepared for one category, for example the character 'す', are shown in FIGS. 8 and 9. The dictionary matrices shown in FIGS. 8(A) to (D) are italic-typeface dictionary matrices for the respective sub-patterns corresponding to FIGS. 7(A) to (D), and the dictionary matrices shown in FIGS. 9(A) to (D) are standard-typeface dictionary matrices for the respective sub-patterns corresponding to FIGS. 7(A) to (D). The similarity R between the feature matrix of FIG. 7 and the dictionary matrix of FIG. 8 was R = 11.6, and the similarity R between the feature matrix of FIG. 7 and the dictionary matrix of FIG. 9 was R = 1.0.
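Matching against several dictionary matrices per category can be sketched as a search for the entry with the largest similarity. The expression for R is not reproduced in this text, so the `similarity` function below uses a plain sum of products of f_i and g_i purely as a stand-in; that choice, the flattened list representation of the matrices, and the names `similarity` and `recognize` are all assumptions made for illustration.

```python
def similarity(features, dictionary):
    """Stand-in similarity: sum over regions i of f_i * g_i (an assumption)."""
    return sum(f * g for f, g in zip(features, dictionary))


def recognize(features, dictionaries):
    """Return (character_name, similarity) for the best-matching dictionary.

    dictionaries: list of (character_name, feature_list) pairs; one category
                  may appear several times, e.g. italic and standard typefaces.
    """
    best_name, best_r = None, float("-inf")
    for name, dictionary in dictionaries:
        r = similarity(features, dictionary)
        if r > best_r:
            best_name, best_r = name, r
    return best_name, best_r
```

Because every typeface variant carries the same character name, the result is the same category whichever variant matches best, which is how holding italic and standard dictionaries side by side lets both typefaces be recognized.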

第４図に示した行画像データ中の文字の認識結果の出力
例を、第１０図に示す。尚、第１０図に示す出力例は認
識結果例えば文字コードに対応するキャラクタを印字し
たものである。FIG. 10 shows an output example of the recognition results for the characters in the line image data shown in FIG. 4. The output example shown in FIG. 10 is a printout of the characters corresponding to the recognition results, for example character codes.

この発明は上述した実施例（このみ限定されるものでは
なく、従って各構成成分の構成、動作、動作の流れ、入
出力信号、入出力信号の流れ、数値的条件、形状、寸法
及び配設位置を任意好適に変更できる。The present invention is not limited to the embodiment described above; accordingly, the configuration, operation, flow of operation, input/output signals, flow of input/output signals, numerical conditions, shape, dimensions, and arrangement position of each component may be changed as desired.

（発明の効果）上述した説明からも明らかなように、この発明の文字切
出し装置によれば、ブロック長閾値及びブロック長を比
較し、この比較結果に応じて黒ブロックを分類し、この
分類に応した適切な文字パタンの切出し方向を選択する
。従って、文字パタンの切出し精度を向上できる。(Effects of the Invention) As is clear from the above description, the character cutting device of the present invention compares the block length with the block length threshold, classifies each black block according to the result of this comparison, and selects a cutting direction for the character pattern that suits this classification. The accuracy of cutting out character patterns can therefore be improved.
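The flow summarized above (black bit distribution, black block detection, and selection of the cutting direction by block length) can be sketched as follows. This is an illustrative sketch under stated assumptions: the character line is taken to run horizontally, so each image column is one scanning line, and the function names and the string labels `"orthogonal"` and `"oblique"` are hypothetical. The actual cutting along the direction at angle θ is not shown.

```python
def black_bit_distribution(line_image):
    """Count black bits on each scanning line (each column of the line image)."""
    return [sum(column) for column in zip(*line_image)]


def detect_black_blocks(distribution, bit_threshold):
    """Return (start, end) column ranges, end exclusive, whose black bit
    count is at or above bit_threshold."""
    blocks, start = [], None
    for x, count in enumerate(distribution):
        if count >= bit_threshold and start is None:
            start = x
        elif count < bit_threshold and start is not None:
            blocks.append((start, x))
            start = None
    if start is not None:
        blocks.append((start, len(distribution)))
    return blocks


def select_cutting_direction(block, block_length_threshold):
    """Short blocks are cut orthogonally to the character line; blocks at or
    above the threshold are cut obliquely (at the constant angle theta)."""
    start, end = block
    if end - start < block_length_threshold:
        return "orthogonal"
    return "oblique"
```

A block that is too long to be a single character suggests touching slanted characters, so the sketch marks it for an oblique cut, while a short block is treated as a single character and cut orthogonally.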

[Brief explanation of drawings]

第１図はこの発明の詳細な説明に供する機能ブロック図
、第２図は読取対象となる帳票の一例を示す図、第３図（
Ａ）及び（Ｂ）は書式データの説明に供する図、第４図は行画像データの一例を示す図、第５図は黒ビッ
ト数分布の一例を示す図、第６図は切出された文字パタ
ンの一例を示す図、第７図（Ａ）〜（Ｄ）は文字パタンの特徴マトリクスの
例を示す図、第８図（Ａ）〜（Ｄ）及び第９図（Ａ）〜（Ｄ）はひと
つのカテゴリに対して用意された異なる種類の辞書マト
リクスの例を示す図、第１０図は認識結果の出力例を示
す図、第１１図（Ａ）及びＣＢ）は行画像データ及び黒
ビット分布数の例を示す図である。１２・・・文字切出し装置、　１４・・・周辺分布作成
手段、１６・・・ブロック検出手段、１８・・・切出し方向選択手段、２０・・・切出し手段。
FIG. 1 is a functional block diagram used for the detailed description of the present invention.
FIG. 2 is a diagram showing an example of a form to be read.
FIGS. 3(A) and (B) are diagrams for explaining format data.
FIG. 4 is a diagram showing an example of line image data.
FIG. 5 is a diagram showing an example of a black bit number distribution.
FIG. 6 is a diagram showing an example of a cut-out character pattern.
FIGS. 7(A) to (D) are diagrams showing examples of feature matrices of a character pattern.
FIGS. 8(A) to (D) and FIGS. 9(A) to (D) are diagrams showing examples of the different kinds of dictionary matrices prepared for one category.
FIG. 10 is a diagram showing an output example of recognition results.
FIGS. 11(A) and (B) are diagrams showing examples of line image data and a black bit number distribution.
12: character cutting device, 14: marginal distribution creating means, 16: block detecting means, 18: cutting direction selecting means, 20: cutting means.

Claims

[Claims]

(1) A character cutting device that cuts out a character pattern from quantized line image data, characterized by comprising: marginal distribution creating means for scanning the line image data in a direction substantially orthogonal to the direction along the character line and counting, for each scanning line, the number of black bits on that scanning line to obtain a black bit number distribution; block detecting means for detecting, as a black block, a region of the line image data in which the number of black bits is equal to or greater than a bit number threshold; cutting direction selecting means for selecting, as the cutting direction, a direction substantially orthogonal to the direction along the character line when the block length of the black block is smaller than a block length threshold, and selecting, as the cutting direction, a direction that obliquely intersects the direction along the character line at an angle of θ degrees (where θ is a constant) when the block length is equal to or greater than the block length threshold; and cutting means for cutting out the character pattern from the line image data along the selected cutting direction.