JPS60110085A - Character segmenting device of optical character reader - Google Patents

Character segmenting device of optical character reader

Info

Publication number
JPS60110085A
Authority
JP
Japan
Prior art keywords
character
frame
line
entry frame
character entry
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
JP58219030A
Other languages
Japanese (ja)
Inventor
Tadashi Ito
正 伊藤
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
NEC Corp
Original Assignee
NEC Corp
Nippon Electric Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by NEC Corp, Nippon Electric Co Ltd
Priority to JP58219030A
Publication of JPS60110085A

Abstract

PURPOSE: To segment, one at a time, the characters written inside a character entry frame printed in a non-drop-out color, by detecting the upper-left, upper-right, lower-left and lower-right corners of the grid-shaped frame and then searching its interior. CONSTITUTION: A character entry frame 2 printed on a form 1 is scanned by a photoelectric conversion element 3 together with the characters written inside it. The resulting analog signals are converted into binary signals by an A/D converter 4, and the data for one line of the form 1 is stored in a line buffer memory 5. A ruled line detection part 7 scans the pattern stored in the line buffer 5 to detect ruled lines. Using the format information, a control part 6 positions the data so that the vertical center of the one-line frame 2 falls at the center of the buffer 5 in the Y direction. From the position information Fs and Fe for the frame 2, supplied to it in advance from outside, the part 7 calculates the left edge address x1 of the leftmost field and the right edge address x2 of the rightmost field of the frame 2 in the buffer 5, and from these obtains the ruled line detection reference points A to D. A character segmenting part 8 then segments, one at a time, the characters inside the frame 2 printed in the non-drop-out color.

Description

DETAILED DESCRIPTION OF THE INVENTION The present invention relates to a character segmenting device, and more particularly to a character segmenting device in an optical character reader that optically reads characters written on a form.

An optical character reader (hereinafter, OCR) is generally configured to read characters from a form whose grid-shaped character frames are printed in a so-called drop-out color, that is, a color that cannot be sensed optically. Because the grid-shaped character frames must be printed in a drop-out color, such a reader has the drawback that OCR forms must be produced on a copier capable of multicolor printing.

Accordingly, an object of the present invention is to provide a character segmenting device for an OCR that can segment characters even on a form whose character entry frames are printed in an optically detectable color (a non-drop-out color).

The OCR character segmenting device according to the present invention is characterized by comprising: a line buffer memory that stores, while digitizing them, a grid-shaped character entry frame and the group of characters written inside that frame, each optically scanned one line at a time; and means for searching the line buffer memory using externally specified format information indicating the position of the character entry frame, detecting the vertical and horizontal ruled lines of the frame, and segmenting the characters inside the frame one at a time using the position information of the detected vertical and horizontal ruled lines.

The present invention will be explained below with reference to the drawings.

FIG. 1 shows an example of a form that can be read by the device of the present invention. Here 1 is the form, 2 is a grid-shaped character entry frame printed in a non-drop-out color, L1 is the distance from the top edge of the form to the center of the character entry frame on the first line, P is the pitch between a character entry frame and the character entry frame on the next line, Fs is the distance from the right edge of the form to the left edge of the leftmost field of the character entry frame, Fe is the distance from the right edge of the form to the right edge of the rightmost field of the character entry frame, and Fmn is the distance from the right edge of the form to the left ruled line of the (n+1)th field of the character entry frame on the mth line. The value of n may differ from line to line.

FIG. 2 is a block diagram of an embodiment of the present invention. The character entry frame 2 printed on the form 1 and the characters written inside it are scanned by a photoelectric conversion element 3, the resulting analog signal is converted into a binary signal by an A-D converter 4, and the data for one line of the form is stored in a line buffer memory 5. A ruled line detection unit 7 scans the pattern stored in the line buffer 5 and detects the ruled lines.
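The front end described above (scan, binarize, store one form line) can be sketched as follows. This is only an illustration: the patent does not state how the A-D converter 4 picks its threshold, so the fixed `threshold` value, the 0.0 to 1.0 reflectance scale, and the function names are assumptions introduced here.

```python
from typing import List

def binarize_line(samples: List[float], threshold: float = 0.5) -> List[int]:
    """Map analog reflectance samples to bits, where 1 means black.

    Assumption: samples range from 0.0 (black) to 1.0 (white) and a
    fixed threshold is used; the patent does not specify either.
    """
    return [1 if s < threshold else 0 for s in samples]

def fill_line_buffer(scan_rows: List[List[float]],
                     threshold: float = 0.5) -> List[List[int]]:
    """Binarize every scan row of one form line into a buffer[y][x] bitmap."""
    return [binarize_line(row, threshold) for row in scan_rows]

# Two scan rows of two samples each.
buffer = fill_line_buffer([[0.9, 0.1], [0.2, 0.8]])
print(buffer)  # [[0, 1], [1, 0]]
```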

The method of ruled line detection is explained with reference to FIG. 3. FIG. 3 shows the state in which the data for one line of the form has been input to the line buffer 5. Using the format information, the control unit 6 performs control so that the vertical center of the character entry frame for one line falls at the center of the line buffer 5 in the Y direction. From the position information Fs and Fe of the character entry frame, given in advance to the ruled line detection unit 7 from outside, the ruled line detection unit 7 calculates the left edge address X1 of the leftmost field and the right edge address X2 of the rightmost field of the character entry frame input to the line buffer 5, and obtains the ruled line detection reference points A, B, C and D. Point A is the reference point for detecting the intersection of the vertical and horizontal ruled lines at the upper-left corner of the character entry frame, point B the one for the upper-right corner, point C the one for the lower-left corner, and point D the one for the lower-right corner. The address of each point is obtained from equation (1).

In equation (1), the subscript y denotes an address in the Y direction of the line buffer 5, the subscript x denotes an address in the X direction of the line buffer 5, [symbol illegible in source] is the height of the line buffer 5, P is the pitch of the character entry frames, and α and β are suitable constants.

To detect the intersection of the vertical and horizontal ruled lines at the upper-left corner of the character entry frame, a rectangular area Ra (the hatched portion marked Ra in FIG. 4(a)), extending u in the Y direction and t in the X direction from the point A obtained from equation (1), is scanned in the X direction and in the Y direction while the number of black bits is counted, and a histogram is obtained for each scan direction (u and t are suitable constants). FIG. 4(b) shows the histogram in the Y direction. In this histogram, the address Ay' at the center of the width Way of the portion where the black count exceeds θ is registered as the Y address of the horizontal ruled line at the upper-left corner of the character entry frame.

FIG. 4(c) shows the histogram in the X direction. In FIG. 4(c), the address Ax' at the center of the width Wax of the portion where the black count exceeds θ is taken as the X address of the vertical ruled line at the upper-left corner of the character entry frame, and the intersection A'(Ax', Ay') of the vertical and horizontal ruled lines at the upper-left corner is registered.
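The corner search described for FIG. 4 can be sketched in code as below. This is a minimal illustration under stated assumptions, not the patented circuit: the bitmap layout `bitmap[y][x]` with 1 for black, the function names, and the choice to take the first run exceeding θ are all assumptions; the reference point (ax, ay) is taken as given, since equation (1) is not reproduced in the text.

```python
from typing import List, Optional, Tuple

Bitmap = List[List[int]]  # bitmap[y][x], 1 = black (assumed layout)

def run_center(hist: List[int], theta: int) -> Optional[int]:
    """Center index of the first run of bins whose count exceeds theta."""
    start = None
    for i, count in enumerate(hist):
        if count > theta and start is None:
            start = i                      # run begins
        elif count <= theta and start is not None:
            return (start + i - 1) // 2    # run ended at i - 1
    if start is not None:                  # run reaches the window edge
        return (start + len(hist) - 1) // 2
    return None

def find_corner(bitmap: Bitmap, ax: int, ay: int, t: int, u: int,
                theta: int) -> Optional[Tuple[int, int]]:
    """Search the t-by-u window at reference point (ax, ay) for a corner.

    The Y-direction histogram (black bits per row) locates the horizontal
    ruled line; the X-direction histogram (black bits per column) locates
    the vertical ruled line. Returns (Ax', Ay') or None.
    """
    window = [row[ax:ax + t] for row in bitmap[ay:ay + u]]
    y_hist = [sum(row) for row in window]
    x_hist = [sum(col) for col in zip(*window)]
    cy = run_center(y_hist, theta)
    cx = run_center(x_hist, theta)
    if cx is None or cy is None:
        return None
    return (ax + cx, ay + cy)

# An upper-left corner: horizontal line on row 4, vertical line on column 3.
bitmap = [[0] * 10 for _ in range(10)]
for x in range(3, 10):
    bitmap[4][x] = 1
for y in range(4, 10):
    bitmap[y][3] = 1
print(find_corner(bitmap, 1, 2, 8, 7, 3))  # (3, 4)
```

Taking the center of the run, as the text describes, makes the result insensitive to the thickness of the ruled line. The same routine would apply to the other three corners with their own reference points.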

In the same manner, the intersection B'(Bx', By') of the vertical and horizontal ruled lines at the upper-right corner, the intersection C'(Cx', Cy') at the lower-left corner, and the intersection D'(Dx', Dy') at the lower-right corner are registered.

The coordinates of A', B', C' and D' registered by the above method are transferred to the character segmenting unit 8. The character segmenting unit 8 searches the range of the character entry frame shown in FIG. 5 vertically, by the method described below, and segments the characters. In FIG. 5, the X coordinate Xmn is obtained as Ax' + (Fs - Fmn). For the first field of the line buffer shown in FIG. 5, the character segmenting unit 8 searches the X range (Ax' + q) to (Xm1 - q), where q is a constant. In the Y direction, searching the range given by equation (2) prevents the character entry frame itself from intruding into the search area.

In equation (2), γ is a constant and X is the X coordinate on the line buffer at the start of the search.

For the last field of the line buffer, the X range (Xmn + q) to (Bx' - q) is searched. In the Y direction, searching the range given by equation (2) likewise prevents the character entry frame from intruding into the search area.

For the intermediate fields, that is, all fields other than the first and the last, the X range (Xm,i + q) to (Xm,i+1 - q) is searched, where i takes the values 1 to (n-1). In the Y direction, searching the range given by equation (2) likewise prevents the character entry frame from intruding into the search area.
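The three cases for the X search range (first, intermediate and last field) follow one pattern: each field lies between two neighboring vertical ruled lines, shrunk by the margin q on both sides. The sketch below illustrates this under stated assumptions: the function name and argument names are hypothetical, `fm` is assumed to hold Fm1 to Fmn for one line, and the Y range is omitted because equation (2) is not reproduced in the text.

```python
from typing import List, Tuple

def field_search_ranges(ax: int, bx: int, fs: int, fm: List[int],
                        q: int) -> List[Tuple[int, int]]:
    """X search range of every field on one line.

    ax, bx : X addresses Ax' and Bx' of the detected left and right corners
    fs     : Fs, distance from the form's right edge to the frame's left edge
    fm     : [Fm1, ..., Fmn], distances from the form's right edge to the
             interior vertical ruled lines
    q      : margin that keeps the ruled lines out of the search range
    """
    # Xmi = Ax' + (Fs - Fmi): interior ruled lines in buffer coordinates.
    xm = [ax + (fs - f) for f in fm]
    edges = [ax] + xm + [bx]
    # First field: (Ax'+q, Xm1-q); middle: (Xmi+q, Xm,i+1-q); last: (Xmn+q, Bx'-q).
    return [(left + q, right - q) for left, right in zip(edges, edges[1:])]

# Four fields, each 25 addresses wide, with a margin of 2.
print(field_search_ranges(ax=10, bx=110, fs=200, fm=[175, 150, 125], q=2))
# [(12, 33), (37, 58), (62, 83), (87, 108)]
```

Because the interior positions are computed relative to the detected corner Ax' rather than to the form edge, a horizontal shift of the form in the scanner does not move the search ranges off the fields.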

In each field, the distance CWk along the X coordinate is measured from the Y-direction scan address at which black is first detected during the search to the next point of transition from black to white (k is the block number of the segmented portion). If condition (3), CWk ≥ K, is satisfied, a character is taken to exist in that range, and the image between the two points is transferred to the character recognition unit 9.

For example, FIG. 6 shows part of the search area of FIG. 5. For the first character "2", the point at which black is first detected is Xa and the point of transition from black to white is Xb, so CW1 = Xb - Xa. For the noise point at which black is detected next, CW2 = Xd - Xc, and for the following character "3", CW3 = Xf - Xe. If K is set to a suitable value, then CW1 ≥ K, CW2 < K and CW3 ≥ K, so the images of the characters "2" and "3" are transferred to the character recognition unit 9, while the noise between "2" and "3" is not.
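The width test of condition (3) can be sketched as a run-length filter. This is an illustration only: it assumes the vertical scans have already been reduced to one flag per X address (1 if any black bit was found in that Y-direction scan), and the function names are hypothetical.

```python
from typing import List, Tuple

def black_runs(column_has_black: List[int]) -> List[Tuple[int, int]]:
    """Return (x_start, x_end) pairs for each run of black columns.

    x_start is the first address where black is detected and x_end is the
    address of the following black-to-white transition, so the width of a
    run is CWk = x_end - x_start.
    """
    runs = []
    start = None
    for x, black in enumerate(column_has_black):
        if black and start is None:
            start = x
        elif not black and start is not None:
            runs.append((start, x))
            start = None
    if start is not None:                  # run reaches the end of the range
        runs.append((start, len(column_has_black)))
    return runs

def segment_characters(column_has_black: List[int],
                       k: int) -> List[Tuple[int, int]]:
    """Keep only runs whose width satisfies CWk >= K; narrower runs are noise."""
    return [(xa, xb) for xa, xb in black_runs(column_has_black)
            if xb - xa >= k]

# A wide run, a one-column noise speck, and another wide run.
profile = [0, 1, 1, 1, 1, 0, 0, 1, 0, 0, 1, 1, 1, 0]
print(segment_characters(profile, k=3))  # [(1, 5), (10, 13)]
```

The choice of K trades noise rejection against the risk of discarding narrow characters, which is presumably why the text asks only for "a suitable value".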

As described in detail above, according to the present invention, detecting the upper-left, upper-right, lower-left and lower-right corners of a character entry frame laid out as a grid makes it possible to search the interior of the frame, and thus to segment, one at a time, the characters inside a character entry frame printed in a non-drop-out color. The invention therefore has the effect that OCR forms printed on a copier or similar machine that is not capable of multicolor printing can be used.

BRIEF DESCRIPTION OF THE DRAWINGS

FIG. 1 shows an OCR form from which characters can be segmented according to the present invention. FIG. 2 is a block diagram of an embodiment of the present invention. FIG. 3 is a diagram for explaining the method of detecting the upper-left, upper-right, lower-left and lower-right corners of the character entry frame. FIG. 4 shows an example of the black-count histograms needed to detect the upper-left corner. FIG. 5 is a diagram for explaining the method of searching the interior of the character entry frame after the frame has been detected. FIG. 6 is a diagram for explaining the method of separating characters.
Explanation of reference numerals for the main parts: 1 ... form, 2 ... character entry frame, 3 ... photoelectric conversion unit, 5 ... line buffer memory, 7 ... ruled line detection unit, 8 ... character segmenting unit.

Claims (1)

[Claims] A character segmenting device for an optical character reader that stores in a memory a digital signal obtained by optically scanning a form one line at a time, and segments characters one at a time using format information indicating the position of a character entry frame on the form, the character segmenting device being characterized by comprising: means for storing in the memory, one line at a time, a character entry frame laid out as a grid and the group of characters written inside that character entry frame; and means for detecting position information of the vertical and horizontal ruled lines of the stored character frame and segmenting the characters inside the character entry frame using the detected position information.
JP58219030A 1983-11-21 1983-11-21 Character segmenting device of optical character reader Pending JPS60110085A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
JP58219030A JPS60110085A (en) 1983-11-21 1983-11-21 Character segmenting device of optical character reader

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
JP58219030A JPS60110085A (en) 1983-11-21 1983-11-21 Character segmenting device of optical character reader

Publications (1)

Publication Number Publication Date
JPS60110085A (en) 1985-06-15

Family

ID=16729149

Family Applications (1)

Application Number Title Priority Date Filing Date
JP58219030A Pending JPS60110085A (en) 1983-11-21 1983-11-21 Character segmenting device of optical character reader

Country Status (1)

Country Link
JP (1) JPS60110085A (en)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JPS60160486A (en) * 1984-01-31 1985-08-22 Toshiba Corp Optical character reader
