JPS58106665A - Character line cutout method - Google Patents

Character line cutout method

Info

Publication number
JPS58106665A
Authority
JP
Japan
Prior art keywords
projection
block
character
line
projections
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
JP56204636A
Other languages
Japanese (ja)
Inventor
Mamoru Maeda
護 前田
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Ricoh Co Ltd
Original Assignee
Ricoh Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
1981-12-18
Filing date
1981-12-18
Publication date
1983-06-25
Application filed by Ricoh Co Ltd


Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V30/00 Character recognition; Recognising digital ink; Document-oriented image-based pattern recognition
    • G06V30/10 Character recognition
    • G06V30/14 Image acquisition
    • G06V30/148 Segmentation of character regions

Abstract

PURPOSE: To reliably cut out character lines that have a large slope. The document image is split into blocks, the projection of each block is obtained in the character line direction, and the connections between the projections of the blocks are checked. CONSTITUTION: Under the address control of an R/W control circuit 103, the image data in a buffer memory device 101 are read out sequentially, starting from the first line. For character line cutout, the document image is split into three or more blocks along the character line direction. A projection detecting circuit 104 checks each block for the presence of at least a prescribed number of black pixels, and from this it detects the projection of each block in the character line direction. When projection detection is finished, a connection discriminating circuit 105 checks the connections between the projections and cuts out the character line. First, it checks the projection of each block for overlap with the projections of the adjacent block, and it connects the overlapping projections. This operation is repeated for all projections.

Description

DETAILED DESCRIPTION OF THE INVENTION: The present invention relates to a method for cutting out character lines from a document image.

In an optical character recognition device or similar apparatus, character lines must be cut out of the read document image before individual character patterns are extracted.

Character lines are usually cut out by taking a projection of the document image in the character line direction. This is the horizontal direction for a horizontally written document and the vertical direction for a vertically written document. However, if a character line is tilted or the document is strongly skewed, the projection widens by the amount of the tilt, and the character line cannot be cut out accurately. If the tilt or skew becomes even larger, the projections of adjacent character lines overlap, and cutout becomes impossible.
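The following Python sketch illustrates the conventional single-projection approach and shows why skew defeats it. It is not taken from the patent. Representing the binary image as a list of rows, with 1 for a black pixel, is an assumption made for illustration.

```python
from typing import List, Tuple

Image = List[List[int]]  # binary image: one list per scanning line, 1 = black


def horizontal_projection(image: Image) -> List[int]:
    """Count the black pixels in each scanning line (row)."""
    return [sum(row) for row in image]


def split_lines(profile: List[int], min_count: int = 1) -> List[Tuple[int, int]]:
    """Return the (top, bottom) row ranges where the projection reaches min_count."""
    lines, start = [], None
    for y, count in enumerate(profile):
        if count >= min_count and start is None:
            start = y
        elif count < min_count and start is not None:
            lines.append((start, y - 1))
            start = None
    if start is not None:
        lines.append((start, len(profile) - 1))
    return lines


# Two horizontal text lines separated by a blank scanning line.
straight = [
    [1, 1, 1, 1, 1, 1],
    [0, 0, 0, 0, 0, 0],
    [1, 1, 1, 1, 1, 1],
]
# The same two lines, tilted by one row every two columns: no blank row remains.
skewed = [
    [1, 1, 0, 0, 0, 0],
    [0, 0, 1, 1, 0, 0],
    [1, 1, 0, 0, 1, 1],
    [0, 0, 1, 1, 0, 0],
    [0, 0, 0, 0, 1, 1],
]

print(split_lines(horizontal_projection(straight)))  # [(0, 0), (2, 2)]
print(split_lines(horizontal_projection(skewed)))    # [(0, 4)]
```

With the tilted input, the projections of the two lines merge into a single range. This is the failure mode described above.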

To solve this problem, another method has also been proposed. For a horizontally written document image, horizontal projections are obtained at the left and right edges, and black pixels are counted while scanning diagonally from the center of each projection. The scanning direction that gives the maximum number of black pixels is taken as the direction of the character line, and the line is cut out along it. However, this method tends to need complicated algorithms and device configurations, for example because diagonal scanning is required. Furthermore, a short character line sandwiched between long, steeply tilted character lines that contain many black pixels may fail to be extracted.
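The diagonal-scanning idea of this prior method can be sketched as follows. This is a simplified, hypothetical illustration rather than the proposed method itself. It scans from a single starting row at the left edge only, and the caller supplies the candidate slopes, measured in rows per column.

```python
import math
from typing import List, Optional, Sequence, Tuple

Image = List[List[int]]  # binary image, 1 = black pixel


def best_slope(image: Image, y0: int,
               slopes: Sequence[float]) -> Optional[Tuple[float, int]]:
    """Scan a straight line from (x=0, y=y0) for each candidate slope and
    return the slope that crosses the most black pixels, with that count."""
    height, width = len(image), len(image[0])
    best = None
    for slope in slopes:
        count = 0
        for x in range(width):
            y = y0 + math.floor(slope * x)
            if 0 <= y < height:
                count += image[y][x]
        if best is None or count > best[1]:  # ties keep the earlier slope
            best = (slope, count)
    return best


skewed = [
    [1, 1, 0, 0, 0, 0],
    [0, 0, 1, 1, 0, 0],
    [1, 1, 0, 0, 1, 1],
    [0, 0, 1, 1, 0, 0],
    [0, 0, 0, 0, 1, 1],
]
print(best_slope(skewed, 0, [0.0, 0.5, 1.0]))  # (0.5, 6)
```

Each candidate slope needs its own pass across the image, which illustrates the extra work that diagonal scanning entails.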

Accordingly, an object of the present invention is to provide a new character line cutout method that can cut out steeply tilted character lines more reliably than before.

Another object of the present invention is to provide a character line cutout method that has a simple cutout algorithm, does not require a large storage area, and is easy to implement in hardware.

Accordingly, the character line cutout method of the present invention works as follows. A document image is divided into three or more blocks along the character line direction. The document image is then scanned to obtain, for each block, the projection in the character line direction (the horizontal direction for a horizontally written document, the vertical direction for a vertically written document). Finally, individual character lines on the document image are cut out by examining the connections between the projections of the blocks.

The present invention will now be described in detail by way of an embodiment.

FIG. 1 is a block diagram showing an example of an apparatus that performs character line cutout according to the present invention.

Reference numeral 50 denotes an image memory device that stores the image data for one or more pages of a document. Reference numeral 101 denotes a buffer memory device used as the storage area for character line cutout. Its capacity is sufficient to hold the image data of N scanning lines, where N is the number of scanning lines spanned by one character line at the largest inclination.

Reference numeral 102 denotes an image detection circuit that detects scanning lines containing an image. In this example, a scanning line is detected as an image line only if it contains two or more consecutive black pixels, which avoids the influence of noise. Reference numeral 103 denotes an R/W control circuit that controls reading from and writing to (R/W) the buffer memory device 101. Reference numeral 104 denotes a projection detection circuit, and 105 denotes a connection determination circuit that identifies the connections between projections.
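The noise rule of the image detection circuit 102 can be sketched in software as follows. The function name and its `min_run` parameter are hypothetical; the embodiment corresponds to `min_run=2`.

```python
from typing import Sequence


def is_image_line(scan_line: Sequence[int], min_run: int = 2) -> bool:
    """Return True if the scanning line has at least min_run consecutive
    black pixels (1s). Isolated black pixels are treated as noise."""
    run = 0
    for pixel in scan_line:
        run = run + 1 if pixel else 0
        if run >= min_run:
            return True
    return False


print(is_image_line([0, 1, 0, 1, 0]))  # False: only isolated black pixels
print(is_image_line([0, 1, 1, 0, 0]))  # True
```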

Next, the character line cutout operation will be explained using, as an example, the horizontally written document image shown conceptually in FIG. 2.

The R/W control circuit 103 requests the image memory device 50 to transfer image data, starting from the top of the document. Under the control of the R/W control circuit 103, the image data read from the image memory device 50 are written sequentially into the buffer memory device 101. At the same time, they are input to the image detection circuit 102.

When the image detection circuit 102 detects the first image line and outputs a signal, the R/W control circuit 103 takes that scanning line as the first line t1 (FIG. 2). Once the image data up to the N-th scanning line from it, the last line tN (FIG. 2), have been read into the buffer memory device 101, the R/W control circuit 103 requests the image memory device 50 to stop transferring image data. The buffer memory device 101 thus holds the image data of the N scanning lines t1 to tN. The image data of the scanning lines before the first line t1 are discarded.
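The buffer-filling step can be modeled in software, as a minimal sketch. A Python iterator stands in for the image memory device 50, and stopping after N lines stands in for the request to halt the transfer. All names here are hypothetical.

```python
from typing import Iterator, List

ScanLine = List[int]  # 1 = black pixel


def is_image_line(line: ScanLine, min_run: int = 2) -> bool:
    """True if the line has at least min_run consecutive black pixels."""
    run = 0
    for pixel in line:
        run = run + 1 if pixel else 0
        if run >= min_run:
            return True
    return False


def fill_buffer(source: Iterator[ScanLine], n: int) -> List[ScanLine]:
    """Skip lines until the first image line t1, then collect lines t1..tN.

    The iterator keeps its position, so a later transfer resumes with the
    line after tN. Fewer than n lines are returned at the end of the page.
    """
    if n <= 0:
        return []
    buffer: List[ScanLine] = []
    for line in source:
        if not buffer and not is_image_line(line):
            continue  # scanning lines before t1 are discarded
        buffer.append(line)
        if len(buffer) == n:
            break  # models the request to stop the transfer
    return buffer


page = iter([
    [0, 0, 0, 0],
    [0, 1, 0, 0],  # single black pixel: noise, discarded
    [1, 1, 0, 0],  # first image line t1
    [0, 1, 1, 0],
    [0, 0, 0, 0],
    [1, 1, 1, 1],
])
buffer = fill_buffer(page, 3)
print(buffer)      # [[1, 1, 0, 0], [0, 1, 1, 0], [0, 0, 0, 0]]
print(next(page))  # [1, 1, 1, 1]
```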

Once the buffer memory device 101 is full, the image data stored in it are scanned and the character line cutout operation begins.

That is, under the address control of the R/W control circuit 103, the image data in the buffer memory device 101 are read out in order, starting from the first line t1.

At the same time, the R/W control circuit 103 sends the projection detection circuit 104 the number of the scanning line being read (its line number relative to the first line t1) and the block number.

The block numbers are explained below.

In the present invention, the document image is handled for cutout as three or more blocks, divided along the character line direction. In this example, the dotted lines in FIG. 2 divide the document image into seven blocks, A to G. The identification numbers of these blocks A to G are the block numbers mentioned above.
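A possible way to compute the block boundaries is sketched below. The patent only requires three or more blocks. The equal widths and the letter labels (valid for up to 26 blocks) are assumptions made for illustration.

```python
from typing import List, Tuple


def block_bounds(width: int, k: int) -> List[Tuple[str, int, int]]:
    """Split an image of the given width into k >= 3 nearly equal blocks.

    Returns (label, left, right) with half-open column ranges [left, right).
    """
    if k < 3:
        raise ValueError("the method divides the image into three or more blocks")
    return [
        (chr(ord("A") + i), i * width // k, (i + 1) * width // k)
        for i in range(k)
    ]


print(block_bounds(10, 3))  # [('A', 0, 3), ('B', 3, 6), ('C', 6, 10)]
```

For a 70-pixel-wide image and k=7, this yields blocks A to G, each 10 columns wide, matching the seven-block division of the example.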

For each scanning line read from the buffer memory device 101, the projection detection circuit 104 checks, block by block, whether at least a prescribed number of black pixels are present. From this it detects the projection of each block in the character line direction (the main scanning direction). The projections obtained for the blocks in this way are shown conceptually by the vertical arrows a1, b1, ... in FIG. 2. Blocks such as B, C and E contain two character lines, so two projections are naturally detected in each of them. The projection detection circuit 104 detects and holds the scanning-line numbers of the upper and lower ends of each projection detected in each block.
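The per-block projection detection can be sketched in software as follows. Each block is given as a half-open column range, and `min_black` plays the role of the "prescribed number" of black pixels. The function and parameter names are hypothetical.

```python
from typing import List, Sequence, Tuple

Run = Tuple[int, int]  # (top, bottom) scanning-line numbers, inclusive


def block_projections(buffer: Sequence[Sequence[int]],
                      bounds: Sequence[Tuple[int, int]],
                      min_black: int = 1) -> List[List[Run]]:
    """For each block [left, right), list the projections found in it.

    A scanning line belongs to a projection of a block when that block's
    segment of the line contains at least min_black black pixels.
    """
    result = []
    for left, right in bounds:
        runs, start = [], None
        for y, line in enumerate(buffer):
            hit = sum(line[left:right]) >= min_black
            if hit and start is None:
                start = y
            elif not hit and start is not None:
                runs.append((start, y - 1))
                start = None
        if start is not None:
            runs.append((start, len(buffer) - 1))
        result.append(runs)
    return result


# Two text lines tilted by one row every two columns.
skewed = [
    [1, 1, 0, 0, 0, 0],
    [0, 0, 1, 1, 0, 0],
    [1, 1, 0, 0, 1, 1],
    [0, 0, 1, 1, 0, 0],
    [0, 0, 0, 0, 1, 1],
]
print(block_projections(skewed, [(0, 2), (2, 4), (4, 6)]))
# [[(0, 0), (2, 2)], [(1, 1), (3, 3)], [(2, 2), (4, 4)]]
```

A single projection over the whole width of this image has no gap. Split into blocks, each block still shows two separate projections, one per line.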

When the projection detection described above is finished, the connection determination circuit 105 examines the connections between the projections and cuts out character lines. First, it checks the projection of each block for overlap with the projections of the adjacent block, and it connects the overlapping projections.

This operation is performed for all projections.

In the example of FIG. 2, the ranges of the projections f1 and g1 of the adjacent blocks F and G share a common scanning line, so f1 and g1 are connected. The ranges of the projections e1 and f1 of blocks E and F also share a common scanning line, so they are connected as well, and the projections e1, f1 and g1 are therefore all connected. The projection d1 of block D and the projection e1 of block E do not overlap, so they are not connected.

In this way, in the example of FIG. 2, three groups are connected: the projections a1, b1 and c1; the projections e1, f1 and g1; and the projections b2, c2, d1 and e2.
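The connection step can be sketched in software with a union-find structure. Two projections of adjacent blocks are joined when they share at least one scanning line. Union-find is an implementation choice for this sketch, not the patent's circuit, and the data layout (per-block lists of `(top, bottom)` pairs) is assumed.

```python
from typing import Dict, List, Sequence, Tuple

Run = Tuple[int, int]   # (top, bottom), inclusive
Node = Tuple[int, int]  # (block index, projection index within the block)


def overlaps(p: Run, q: Run) -> bool:
    """True if two projections share at least one scanning line."""
    return p[0] <= q[1] and q[0] <= p[1]


def connect(projections: Sequence[Sequence[Run]]) -> List[List[Node]]:
    """Group projections by chaining overlaps between adjacent blocks."""
    parent: Dict[Node, Node] = {
        (b, i): (b, i) for b, runs in enumerate(projections) for i in range(len(runs))
    }

    def find(x: Node) -> Node:
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    for b in range(len(projections) - 1):
        for i, p in enumerate(projections[b]):
            for j, q in enumerate(projections[b + 1]):
                if overlaps(p, q):
                    parent[find((b, i))] = find((b + 1, j))

    groups: Dict[Node, List[Node]] = {}
    for node in parent:
        groups.setdefault(find(node), []).append(node)
    return sorted(groups.values())


# Two tilted lines over three blocks: each projection overlaps its neighbour's.
projections = [[(0, 4), (10, 14)], [(3, 7), (13, 17)], [(6, 10), (16, 20)]]
print(connect(projections))
# [[(0, 0), (1, 0), (2, 0)], [(0, 1), (1, 1), (2, 1)]]
```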

Next, the circuit takes the connected projection group that includes the projection containing the first line t1 and examines its connections with the other projection groups (or isolated projections). If the projection group containing the first line includes projections of all blocks, a character line is judged to lie within the range of that group. A band-shaped area of fixed width, referenced to the straight line passing through the centers of the leftmost and rightmost projections, is then cut out as a character line, and information indicating that area is output.

If the projection group containing the first line t1 does not include projections of all blocks, the circuit draws the straight line connecting the centers of the leftmost and rightmost projections in that group. It then searches for another projection (or projection group) that intersects the extension of this line. If one exists, it is connected to the projection group containing the first line t1. A band-shaped area of fixed width, referenced to the straight line passing through the centers of the leftmost and rightmost projections of the combined group, is then cut out as a character line. If there is no other projection (group) to connect, the band-shaped area of fixed width is referenced to the straight line through the centers of the leftmost and rightmost projections in the projection group containing the first line t1, and that area is cut out as a character line.
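The following Python sketch shows one reading of this step, under stated assumptions:

- Block positions are represented by the x-coordinate of each block's center (`centres`).
- The extension is tested only at blocks lying beyond the group's leftmost and rightmost blocks.
- Every crossing group is joined, with each test made against the original line.
- The fixed band width is given as a hypothetical half-width.

```python
from typing import List, Sequence, Tuple

Proj = Tuple[int, int, int]  # (block index, top, bottom)
Line = Tuple[float, float]   # y = slope * x + intercept


def centre_line(group: Sequence[Proj], centres: Sequence[float]) -> Line:
    """Line through the centers of the leftmost and rightmost projections."""
    left = min(group, key=lambda p: p[0])
    right = max(group, key=lambda p: p[0])
    x0, y0 = centres[left[0]], (left[1] + left[2]) / 2
    x1, y1 = centres[right[0]], (right[1] + right[2]) / 2
    slope = 0.0 if x1 == x0 else (y1 - y0) / (x1 - x0)
    return slope, y0 - slope * x0


def extend_group(group: List[Proj], others: Sequence[Sequence[Proj]],
                 centres: Sequence[float]) -> Tuple[List[Proj], Line]:
    """Join other groups crossed by the extension of the group's center line."""
    if len({p[0] for p in group}) < len(centres):  # not all blocks covered
        slope, intercept = centre_line(group, centres)
        first_block = min(p[0] for p in group)
        last_block = max(p[0] for p in group)
        for other in others:
            for block, top, bottom in other:
                if block < first_block or block > last_block:
                    y = slope * centres[block] + intercept
                    if top <= y <= bottom:
                        group = group + list(other)
                        break
    return group, centre_line(group, centres)


def band_at(line: Line, x: float, half_width: float) -> Tuple[float, float]:
    """Top and bottom, at column x, of the strip cut out as the character line."""
    y = line[0] * x + line[1]
    return y - half_width, y + half_width


centres = [5, 15, 25, 35, 45, 55, 65]  # seven 10-pixel blocks, A to G
first = [(0, 0, 4), (1, 3, 7), (2, 6, 10), (3, 9, 13)]  # no projection in block 4
others = [[(5, 15, 19), (6, 18, 22)]]
group, line = extend_group(first, others, centres)
print(sorted({p[0] for p in group}))  # [0, 1, 2, 3, 5, 6]
```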

In the example of FIG. 2, the character line containing the character strings (00......00, 111......111) is cut out.

When one character line has been cut out in this way, the R/W control circuit 103 erases (clears) the image data in the buffer memory device 101 for the area cut out as the character line. Next, the R/W control circuit 103 finds the highest projection in the area that was not erased and takes the scanning line at its upper end as the new first line t1'. In the example of FIG. 2, the scanning line at the upper end of projection e2 becomes the new first line t1'. The R/W control circuit 103 then requests the image memory device 50 to transfer image data, starting from the scanning line after the last one transferred previously. When the image data of N scanning lines from the first line t1' have accumulated in the buffer memory device 101, the R/W control circuit 103 requests the image memory device 50 to stop transferring image data. The same character line cutout operation as described above then begins.
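The buffer update can be sketched in software as follows. The buffer is modeled as a list of scanning lines and the image memory as an iterator. `remaining_tops` holds the upper-end line numbers, relative to the current t1, of the projections that were not cleared. All names are hypothetical, as is the handling of an empty remainder.

```python
from typing import Iterator, List, Sequence, TypeVar

T = TypeVar("T")


def advance_buffer(buffer: List[T], remaining_tops: Sequence[int],
                   source: Iterator[T], n: int) -> List[T]:
    """Shift the buffer so that it starts at the new first line t1', then
    top it up to n lines from the source, resuming after the last line read.
    """
    if not remaining_tops:
        return []  # hypothetical: nothing left, restart the image-line search
    new_first = min(remaining_tops)  # the highest remaining projection
    window = buffer[new_first:]
    while len(window) < n:
        line = next(source, None)
        if line is None:  # end of the page
            break
        window.append(line)
    return window


buffer = ["l1", "l2", "l3", "l4", "l5"]  # placeholders for scanning lines t1..t5
source = iter(["l6", "l7", "l8"])
print(advance_buffer(buffer, [3, 2], source, 5))  # ['l3', 'l4', 'l5', 'l6', 'l7']
print(next(source))                               # l8
```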

Repeating the above operations cuts out all character lines on the document image.

In the above description, the image data of the area cut out as a character line are cleared in the buffer memory device 101. Instead, the projections within that area may be excluded from the connection targets in subsequent character line cutouts. Projection detection and connection determination can also be executed in parallel.

As described in detail above, the present invention can reliably cut out steeply tilted character lines and short character lines sandwiched between long ones. Because the cutout algorithm is simple and no large storage area is needed, the cutout apparatus can also be realized at low cost.

[Brief Description of the Drawings]

FIG. 1 is a block diagram showing an embodiment of the present invention, and FIG. 2 is an explanatory diagram of the character line cutout process. 50: image memory device; 101: buffer memory device for character line cutout; 102: image detection circuit; 103: R/W (read/write) control circuit; 104: projection detection circuit; 105: connection determination circuit.
Agent: Makoto Suzuki, Patent Attorney

Claims (1)

[Claims]
1. A character line cutout method, characterized in that a document image is divided into three or more blocks along the character line direction, the document image is scanned to obtain a projection in the character line direction for each block, and individual character lines on the document image are cut out by examining the connections between the obtained projections of the blocks.

Priority Applications (1)

Application Number Priority Date Filing Date Title
JP56204636A JPS58106665A (en) 1981-12-18 1981-12-18 Character line cutout method

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
JP56204636A JPS58106665A (en) 1981-12-18 1981-12-18 Character line cutout method

Publications (1)

Publication Number Publication Date
JPS58106665A 1983-06-25

Family

ID=16493752

Family Applications (1)

Application Number Title Priority Date Filing Date
JP56204636A Pending JPS58106665A (en) 1981-12-18 1981-12-18 Character line cutout method

Country Status (1)

Country Link
JP (1) JPS58106665A (en)

Cited By (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JPS60254382A (en) * 1984-05-31 1985-12-16 Toshiba Corp Detecting method of character line
EP0513678A3 (en) * 1991-05-13 1994-01-26 Eastman Kodak Co
JP2007272348A (en) * 2006-03-30 2007-10-18 Nidec Sankyo Corp Character string recognition method and character string recognition device
JP4658848B2 (en) * 2006-03-30 2011-03-23 日本電産サンキョー株式会社 Character string recognition method and character string recognition apparatus
JP2011108025A (en) * 2009-11-18 2011-06-02 Kobe Steel Ltd Character recognition method and character recognition device
US8787676B2 (en) 2010-08-03 2014-07-22 Fuji Xerox, Co., Ltd. Image processing apparatus, computer readable medium storing program, and image processing method

Similar Documents

Publication Publication Date Title
JPH05242292A (en) Separating method
JPS6140684A (en) Contour tracking device
JPS58106665A (en) Character line cutout method
JP3089396B2 (en) Mark reading apparatus and method
JP3285686B2 (en) Area division method
JP3904397B2 (en) Table recognition method
SE516860C2 (en) Device and method of fingerprint control, by checking the features of the sub-images
JPS5949671A (en) Optical character reader
JP2902694B2 (en) Optical character reader
JPS6136874A (en) Corrected character processing method for optical character reader
JPS6343788B2 (en)
JP2000298725A (en) Device and method for detecting text data
JP2008210327A (en) Character image output system and character image output method
JPS61196381A (en) Character segmenting system
JPH0564396B2 (en)
JP3190794B2 (en) Character segmentation device
JPH02128292A (en) Optical character reader
JPH07282191A (en) Table processing method
JP2722550B2 (en) Optical character reader
JPS58170165A (en) Mark reading system
JPH0127468B2 (en)
JPH04276888A (en) Character reader
JPS596419B2 (en) Character extraction method
JPH0225553B2 (en)
JP3867237B2 (en) Character recognition method and apparatus, and recording medium on which character recognition program is recorded