JPS6362084A

JPS6362084A - Character segmentation system

Info

Publication number: JPS6362084A
Application number: JP61207273A
Authority: JP
Inventors: Mamoru Maeda; 護前田
Original assignee: Ricoh Co Ltd
Current assignee: Ricoh Co Ltd
Priority date: 1986-09-03
Filing date: 1986-09-03
Publication date: 1988-03-18

Abstract

PURPOSE: To prevent misrecognition and improve the recognition rate by removing noise that exceeds the minimum width of a character when characters are segmented. CONSTITUTION: Consider, for example, segmenting the numeral '4', which has become blurred and split into two parts. The vertical projection pattern 2 is scanned from left to right to find the interval WLEN to the next pattern. WLEN is compared with a predetermined character pitch, and when WLEN is larger than the character pitch, the black length BLEN is found. BLEN is compared with the predetermined minimum character width CHMIN, and when BLEN > CHMIN, scanning of the vertical projection pattern 2 continues to find the forward white length FWLEN. FWLEN is compared with a predetermined white length GAP, and when FWLEN >= GAP, FWLEN is judged to have been caused by blurring and the forward black length FBLEN is found. The sum HW of BLEN, FWLEN, and FBLEN is calculated and compared with the maximum character width; when the sum is smaller, it is taken as the segmentation width of one character pattern.

Description

DETAILED DESCRIPTION OF THE INVENTION

[Technical Field]

The present invention relates to a character segmentation method for OCR (optical character reader) devices and the like.

[Prior art]

OCR devices and the like generally segment characters using vertical and horizontal projections. However, when a single character pattern is split into several parts because the character is partially faded on the document, or when noise is present around the pattern, the character may be segmented incorrectly.

[Purpose]

An object of the present invention is to prevent character patterns from being segmented incorrectly in OCR and the like, whether because of noise around the character pattern or because blurring has split a character pattern into parts.

[Constitution]

According to the present invention, when a narrow pattern is detected during character segmentation, projection scanning is continued forward by up to the standard size of one character to determine whether the pieces can be merged into a single character pattern. In addition, the region within the determined character width is scanned from above or below; when a pattern is detected, scanning resumes from a position a predetermined height further on, and when a pattern is detected again, the new position is compared with the previous one so that noise above or below the character pattern is removed before the character is segmented.

An embodiment of the present invention will be described below with reference to the drawings.

FIG. 1 illustrates the character segmentation method of the present invention: FIG. 1(a) shows how the character width of a character pattern is determined, and FIG. 1(b) shows how noise above or below a character pattern is removed. In FIG. 1, each character pattern is assumed to be composed of seven segments. In FIG. 1(a), 1 is one line of read character patterns and 2 is its vertical projection pattern. As is well known, the vertical projection pattern 2 is obtained by examining the character pattern line 1 column by column, for example from left to right, and setting the value to "1" if the column contains a black dot and "0" if it does not. In FIG. 1(b), 3 is one character pattern and 4 is noise present above that character pattern.
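The projection rule just described can be sketched as below. This is a minimal illustration, assuming the character line is held as a list of rows of 0/1 pixels (1 = black); the function name `vertical_projection` is hypothetical and not part of the patent.

```python
def vertical_projection(image):
    """Return 1 for each column that contains at least one black dot, else 0.

    `image` is assumed to be a list of equal-length rows of 0/1 pixels
    (1 = black); this representation is an illustrative assumption.
    """
    if not image:
        return []
    width = len(image[0])
    return [1 if any(row[x] for row in image) else 0 for x in range(width)]


# A tiny 3-row line: two fragments separated by two blank columns.
line = [
    [1, 1, 0, 0, 1, 0],
    [0, 1, 0, 0, 1, 0],
    [0, 1, 0, 0, 0, 1],
]
print(vertical_projection(line))  # [1, 1, 0, 0, 1, 1]
```

Runs of 0s in the result are the white intervals and runs of 1s the black lengths that the width-determination steps measure.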

First, the process of determining the horizontal width of a character pattern to be segmented is described with reference to FIG. 1(a). FIG. 2 shows the processing flow for this case. As an example, the numeral "4" in FIG. 1(a), which has been split into two parts by blurring or the like, is to be segmented.

After the pattern of the numeral "5" has been segmented, the vertical projection pattern 2 is scanned from left to right to find the interval WLEN to the next pattern (step 101).
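Measuring WLEN amounts to counting consecutive white (0) entries of the projection from the current scan position. The sketch below assumes the projection is a Python list of 0/1 values; `run_length` is a hypothetical helper name, and the same helper could measure a black length by passing `value=1`.

```python
def run_length(proj, start, value):
    """Count consecutive entries equal to `value` in `proj`, beginning at `start`.

    With value=0 this gives WLEN, the white interval to the next pattern
    (step 101). Scanning stops at the first differing entry or at the end.
    """
    n = 0
    while start + n < len(proj) and proj[start + n] == value:
        n += 1
    return n


proj = [1, 1, 0, 0, 0, 1, 1, 0, 1]
# Suppose the previous character ("5") ended at index 1, so scanning resumes at 2.
print(run_length(proj, 2, 0))  # 3
```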

WLEN is compared with a predetermined character pitch PITCH (step 102). If WLEN < PITCH, the process proceeds directly to step 104; if WLEN ≥ PITCH, space processing (step 103) is performed before proceeding to step 104. In the space processing, when WLEN ≥ PITCH, the interval region is counted as spaces, and the number of spaces is found as WLEN/PITCH. If it is preferable not to output spaces in post-processing, PITCH = 0 is specified; PITCH = 0 is checked before the number of spaces is calculated, and if PITCH = 0, the calculation is skipped and spaces are suppressed.
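Steps 102 and 103 can be sketched as a small function. The text gives the space count as WLEN/PITCH without saying how fractions are handled, so the integer (floor) division here is an assumption; `count_spaces` is a hypothetical name.

```python
def count_spaces(wlen, pitch):
    """Space processing (steps 102-103) as described in the text.

    Returns how many spaces the white interval WLEN represents.
    PITCH == 0 means "suppress spaces"; checking it before the division
    also avoids dividing by zero. Floor division is an assumption.
    """
    if pitch == 0:
        return 0              # space suppression
    if wlen < pitch:
        return 0              # step 102: proceed directly to step 104
    return wlen // pitch      # step 103: number of spaces = WLEN / PITCH


print(count_spaces(25, 10), count_spaces(25, 0))  # 2 0
```

Checking PITCH = 0 first matters because with PITCH = 0 the test WLEN ≥ PITCH would always succeed.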

In step 104, the next black length BLEN is found. BLEN is compared with a predetermined minimum character width CHMIN (step 105). If BLEN ≤ CHMIN, the black region is regarded as noise and simply added to WLEN (step 106). If BLEN > CHMIN, scanning of the vertical projection pattern 2 continues to find the forward white length FWLEN (step 107). FWLEN is compared with a predetermined white length GAP (step 108), and if FWLEN ≥ GAP, FWLEN is judged to have been caused by blurring of the character or the like, and the forward black length FBLEN is found (step 109). Then HW = BLEN + FWLEN + FBLEN is calculated (step 110) and compared with a predetermined maximum character width CHMAX (step 111). If HW ≤ CHMAX, HW is used as the segmentation width of one character pattern (step 112). If HW > CHMAX, the BLEN and FBLEN portions are treated as separate character patterns (step 113); the same applies when FWLEN > GAP is determined in step 108.
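The FIG. 2 flow can be sketched over a 0/1 projection list as follows. Two points are assumptions, not statements of the patent: the text gives the step-108 test as FWLEN ≥ GAP yet also says FWLEN > GAP leads to separate characters, so this sketch treats a gap of at most GAP as the blur case; and after step 106 absorbs a noise run, the sketch keeps skipping the white run that follows it. The names `run_length` and `next_char_span` are hypothetical.

```python
def run_length(proj, start, value):
    """Count consecutive entries equal to `value` from `start`."""
    n = 0
    while start + n < len(proj) and proj[start + n] == value:
        n += 1
    return n


def next_char_span(proj, start, chmin, gap, chmax):
    """Return (x, width) of the next character at or after `start`, or None."""
    x = start + run_length(proj, start, 0)              # step 101: skip WLEN
    while x < len(proj):
        blen = run_length(proj, x, 1)                   # step 104
        if blen <= chmin:                               # step 105
            # Step 106: narrow black run is noise; fold it (and, by assumption,
            # the white run after it) into the leading white interval.
            x += blen
            x += run_length(proj, x, 0)
            continue
        fwlen = run_length(proj, x + blen, 0)           # step 107
        if fwlen == 0 or fwlen > gap:                   # step 108 (assumed direction)
            return x, blen                              # BLEN stands alone
        fblen = run_length(proj, x + blen + fwlen, 1)   # step 109
        hw = blen + fwlen + fblen                       # step 110
        if hw <= chmax:                                 # step 111
            return x, hw                                # step 112: merge pieces
        return x, blen                                  # step 113: separate characters
    return None


# "5" at 0-3, white 4-6, then a "4" split by a one-column blur gap at 9.
proj = [1, 1, 1, 1, 0, 0, 0, 1, 1, 0, 1, 1, 1, 0, 0, 0]
print(next_char_span(proj, 4, chmin=1, gap=2, chmax=6))  # (7, 6)
```

With CHMAX lowered to 5 the merged width 6 is rejected and only the left piece (7, 2) is returned, matching step 113.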

Next, the process of removing noise above or below a character pattern to be segmented is described with reference to FIG. 1(b). FIG. 3 shows the processing flow for this case.

Once the character width (HW') of character pattern 3 has been found by the process of FIG. 2, the range of character width HW' is scanned, for example, from top to bottom. Scanning stops at the position where a black pixel is first detected, and that position (X1, Y1) is recorded (step 201). Next, the minimum character-pattern height CHMIN is added to the Y coordinate Y1 of that position (step 202), scanning resumes from there, and when the next black pixel is detected, scanning stops again and that position (X2, Y2) is recorded (step 203). The X coordinate X1 of the first stop position is then compared with the X coordinate X2 of the second (step 204). If X1 = X2, the first stop position (X1, Y1) is taken as the top point of the pattern (step 205). If X1 ≠ X2, the second stop position is regarded as the first, that is, X2 → X1 and Y2 → Y1 (step 206), and the processing from step 202 onward is repeated until a position with X1 = X2 is found. In this way, noise above pattern 3, as shown in FIG. 1(b), can be removed. Similarly, the bottom point of the pattern is found by scanning the found character width from bottom to top. When one character pattern is segmented, its vertical extent is limited to the range between the top and bottom points.
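A literal reading of steps 201 to 206 is sketched below. Assumptions, not stated in the text: the image is a list of 0/1 rows, the scan is raster order (row by row, left to right within columns `left` to `right - 1`), and if no second black pixel is found the first one is taken as the top point. The bottom point reuses the same search on vertically flipped rows. All function names are hypothetical.

```python
def find_black(image, left, right, y_start):
    """Raster-scan rows from y_start downward, columns left..right-1.

    Returns the (x, y) of the first black pixel (value 1), or None.
    """
    for y in range(max(y_start, 0), len(image)):
        for x in range(left, right):
            if image[y][x]:
                return x, y
    return None


def find_top(image, left, right, chmin):
    """Steps 201-206: locate the top point, skipping noise above the character."""
    first = find_black(image, left, right, 0)                # step 201
    while first is not None:
        fx, fy = first
        second = find_black(image, left, right, fy + chmin)  # steps 202-203
        if second is None or second[0] == fx:                # step 204 (None case assumed)
            return first                                     # step 205: top point
        first = second                                       # step 206: X2 -> X1, Y2 -> Y1
    return None


def find_bottom(image, left, right, chmin):
    """Same search scanning bottom-up, done here by flipping the rows."""
    hit = find_top(image[::-1], left, right, chmin)
    if hit is None:
        return None
    x, y = hit
    return x, len(image) - 1 - y


# Noise pixel at (4, 0) above a character whose left stroke is column 1.
img = [
    [0, 0, 0, 0, 1],
    [0, 0, 0, 0, 0],
    [0, 0, 0, 0, 0],
    [0, 1, 1, 1, 0],
    [0, 1, 0, 0, 0],
    [0, 1, 0, 0, 0],
    [0, 1, 0, 0, 0],
    [0, 1, 0, 0, 0],
]
print(find_top(img, 0, 5, 2), find_bottom(img, 0, 5, 2))  # (1, 3) (1, 7)
```

Each repetition restarts at least CHMIN rows lower, so with CHMIN ≥ 1 the loop always terminates.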

FIG. 4 is a schematic block diagram of an example hardware configuration that implements the method of the present invention. In FIG. 4, the image memory 14 stores character pattern data and its vertical projection pattern. When the CPU 11 specifies vertical projection scanning, the scan control section 12 loads the address generation section 13 with the initial value of the address at which the vertical projection pattern is stored in the image memory 14, and places the address generation section 13 in the count-enable state. As a result, the vertical projection pattern in the image memory 14 is scanned and read sequentially into the scan control section 12. The scan control section 12 detects the black runs and white runs in the vertical projection pattern it reads and sends their start and end addresses to the CPU 11. Based on these start and end addresses, the CPU 11 executes the process of FIG. 2 to find the character width (horizontal direction) of the pattern to be segmented.
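What the scan control section 12 hands to the CPU 11 can be modeled in software as a list of runs with start and end positions. This is a sketch only, not the circuit: it assumes the projection is a 0/1 list and uses column indices in place of image-memory addresses; `runs` is a hypothetical name, while `itertools.groupby` is the standard-library function.

```python
from itertools import groupby


def runs(proj):
    """Split a 0/1 projection into (value, start, end) runs, end inclusive.

    value 1 marks a black run and 0 a white run; start/end are column
    indices standing in for the image-memory addresses.
    """
    out, pos = [], 0
    for value, group in groupby(proj):
        n = sum(1 for _ in group)
        out.append((value, pos, pos + n - 1))
        pos += n
    return out


print(runs([1, 1, 0, 0, 0, 1]))  # [(1, 0, 1), (0, 2, 4), (1, 5, 5)]
```

Each run's length is end - start + 1, which gives the WLEN, BLEN, FWLEN and FBLEN values used in FIG. 2 directly.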

Next, for the process of removing noise above or below the character pattern to be segmented, the CPU 11 instructs the scan control section 12 to perform character pattern scanning and specifies the scanning range. In response, the scan control section 12 loads the address generation section 13 with the initial value of the character pattern storage address in the image memory 14 and places the address generation section 13 in the count-enable state. The corresponding character pattern area in the image memory 14 is thereby scanned from top to bottom or from bottom to top, and the results are read sequentially into the scan control section 12. When the scan control section 12 detects a black pixel, it suspends the count-up of the address generation section 13 (that is, suspends scanning), sends that address to the CPU 11, and resumes scanning on receiving the next scan start position from the CPU 11. When it detects a black pixel again, it stops scanning once more, sends that address to the CPU 11, and waits for the next instruction from the CPU 11.

The CPU 11 executes the process of FIG. 3 based on the black-pixel addresses sent from the scan control section 12, instructing the scan control section 12 to resume scanning as needed, and finds the top and bottom points of the pattern.

[Effect]

As is clear from the above description, the present invention efficiently removes noise exceeding the minimum character width during character segmentation and readily merges character patterns that have been split by blurring or the like, thereby improving the recognition rate of OCR and similar devices.

[Brief Description of the Drawings]

FIG. 1 illustrates the principle of the character segmentation method of the present invention; FIGS. 2 and 3 are flowcharts of an embodiment of the method; and FIG. 4 shows an example hardware configuration that implements the method.
11: CPU; 12: scan control section; 13: address generation section; 14: image memory.

Claims

[Claims]

(1) A character segmentation method using vertical or horizontal projection, characterized in that, when a narrow pattern is detected, projection scanning is performed forward to match the standard size of a character, and it is determined whether the pattern can be merged into a single character pattern.

(2) The character segmentation method according to claim 1, characterized in that the character pattern is scanned, and when a first black pixel is detected, scanning is resumed after shifting by a predetermined amount, and noise is removed on the basis of the positional relationship between the first detected black pixel and the next detected black pixel.