JPH0433074B2

Info

Publication number: JPH0433074B2
Application number: JP60036574A
Authority: JP
Inventors: Shigeru Goto; Shinji Narita; Yoshuki Yamashita
Original assignee: Oki Electric Industry Co Ltd
Current assignee: Oki Electric Industry Co Ltd
Priority date: 1985-02-27
Filing date: 1985-02-27
Publication date: 1992-06-02
Also published as: JPS61196382A

Description

[Detailed Description of the Invention] (Industrial Application Field) This invention relates to a character extraction method, and more particularly to a character extraction method that reads characters written on a form and separates and extracts, one character region at a time, the character pattern string obtained from the characters read.

(Prior Art) In an optical character recognition device (hereinafter abbreviated as OCR), the characters written on a form are scanned line by line, the optical signal is converted into an image signal by a photoelectric converter, and the image signal is stored in a line buffer. The line buffer is then read out sequentially, the character pattern string is separated into single-character regions, and recognition is performed on the separated character patterns. The character extraction method used to extract a single-character region from the character pattern string therefore greatly affects OCR performance.

Next, a conventional character extraction method for separating a single-character region from the character string pattern data stored in the OCR line buffer is described.

In the OCR, one column of the line buffer holding the character string is scanned from the top end to the bottom end, and the column is advanced step by step in the direction perpendicular to this scan, so that the character pattern in the line buffer is read out. While each column is scanned, the black points in it are counted (character portions are black points and background portions are white points) to create a histogram, and the single-character region is determined by referring to this black point histogram.

FIG. 10 shows a pattern string processed with the conventional black point histogram. In the figure, 100 and 101 are character patterns stored in the line buffer of the OCR, and 102 is the black point histogram of character patterns 100 and 101 in the column direction. In the conventional method, readout starts from a specified position at the left end of the line buffer. While each column is read, the black point histogram value of that column is created and compared with a threshold α (α: a constant). The column at which the histogram becomes larger than α is taken as the start point, the column at which it again becomes smaller than α is taken as the end point, and the span from the start point to the end point is cut out as a single-character region.
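The conventional procedure described above can be sketched in a few lines of Python. This is only an illustration and not part of the patent text: the function names, the list-of-rows image layout, the default `alpha=1`, and the choice to report the end column exclusively are all assumptions made for the example.

```python
def column_histogram(image):
    """Count the black points (1s) in every column of a binary image.

    `image` is a list of rows; 1 is a black (character) point, 0 is white.
    """
    if not image:
        return []
    return [sum(row[x] for row in image) for x in range(len(image[0]))]


def conventional_cut(image, alpha=1):
    """Cut single-character regions from the column histogram alone.

    A region starts at the first column whose count is larger than alpha and
    ends at the next column whose count is smaller than alpha.  Regions are
    returned as (start, end) pairs with `end` exclusive (an assumption).
    """
    regions = []
    start = None
    hist = column_histogram(image)
    for x, count in enumerate(hist):
        if start is None:
            if count > alpha:
                start = x
        elif count < alpha:
            regions.append((start, x))
            start = None
    if start is not None:  # region still open at the right end of the buffer
        regions.append((start, len(hist)))
    return regions


# Two characters that share column 3 (one on the upper rows, one on the lower).
overlapping = [
    [0, 1, 1, 1, 0, 0, 0, 0],
    [0, 1, 1, 1, 0, 0, 0, 0],
    [0, 0, 0, 1, 1, 1, 0, 0],
    [0, 0, 0, 1, 1, 1, 0, 0],
]
# Two characters separated by blank columns.
separated = [
    [0, 1, 1, 0, 0, 1, 1, 0],
    [0, 1, 1, 0, 0, 1, 1, 0],
]
```

With these inputs, `conventional_cut(separated)` yields two regions, while `conventional_cut(overlapping)` yields a single region covering both characters, which is the failure mode the invention addresses.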

(Problems to be Solved by the Invention) With the conventional method described above, however, adjacent handwritten characters may overlap because the writer slanted the characters, wrote outside the character entry frame, or ended a stroke with a flick. In such cases a pattern of two or more characters is cut out as a single character. As can be seen from FIG. 10, character patterns 100 and 101 overlap in the column direction, so their black point histogram 102 forms a single region. Furthermore, even when the length from the start point to the end point of the black point histogram is measured, the region is judged to contain two or more characters, and a position corresponding to a predetermined threshold is used as the cut point, part of a neighboring character is mixed into the extracted character or part of the extracted character is lost.

This invention is intended to solve these problems, and its object is to provide an accurate character extraction method with a simple configuration.

(Means for Solving the Problems) To solve the above problems, this invention provides a character extraction method that separates and extracts, character by character, a quantized character pattern string obtained by photoelectrically converting a character string written on a form, and that is configured from the following means.

The invention comprises: means for storing the quantized character pattern string in a line buffer memory, scanning the line buffer memory one column at a time in the column direction of the character string to create a column-direction black point histogram, performing this scan for each column in turn, and detecting the width of the column-direction black point histogram; means for comparing the width of the column-direction black point histogram with predetermined thresholds to detect how many characters that width corresponds to; means for determining, from the number of characters contained in the black point histogram, the region to be subjected to character extraction processing, scanning the character pattern string in that region row by row in the row direction to create a row-direction black point histogram, and detecting the character circumscribing frame of the character pattern string from the column-direction and row-direction black point histograms; storage means for holding the character pattern string within the character circumscribing frame; detection means for reading the contents of the character pattern string from the storage means while scanning from the upper side and from the lower side of the character circumscribing frame toward the respective opposite side, and detecting whether each point is a character portion or a background portion; classification means for classifying the character pattern string within the character circumscribing frame into four kinds, namely background detected by the scan from the upper side, background detected by the scan from the lower side, character portions, and background that was never scanned because the scan of a column is aborted as soon as a character portion is detected; change point detection means for performing row scanning, detecting the change points at which the classification changes, storing them in order while holding the states (classification results) before and after each change point, and comparing the state transitions with predetermined combinations of classification transitions to detect the matching change points; and means for determining the character cut position from the detected change points.

(Operation) The character extraction method configured as described above operates as follows.

The quantized character pattern string is scanned column by column in the column direction, the width of the column-direction black point histogram is detected, and this width is compared with predetermined thresholds to determine the region to which the subsequent character extraction processing is applied. That region is then row-scanned to create a row-direction black point histogram, and the character circumscribing frame is determined from the column-direction and row-direction black point histograms. Next, the frame is scanned from its upper and lower sides toward the respective opposite sides, and its contents are classified into the four kinds of portions described above. The frame is then row-scanned again, the change points at which the classification changes are detected and stored in order, and the classification transitions are compared one after another with the predetermined combinations of transitions to detect the matching change points. Finally, the character cut position is determined from the coordinates of the detected change points.

(Embodiment) An embodiment of this invention is described below with reference to the drawings.

FIG. 1 is a block diagram showing an embodiment of this invention. In the figure, 200 is an image signal from a photoelectric conversion unit (not shown), 201 is a line buffer, and 202 consists of a black point histogram creation circuit 220, a circumscribing frame detection circuit 221, and a character determination circuit 222. 203 is a data switching circuit, 204 is a pattern memory, and 205 and 206 are the x counter (x direction) and y counter (y direction) that generate addresses for the pattern memory. 207 is a control circuit. 208 is a pattern region classification circuit, and 209 is a circuit that detects the change point from a white point to a black point. 210 is a pattern region change point detection circuit, 211 is a cut region detection circuit, and 212 to 214 are registers used to determine the cut region.

The operation of this embodiment is described below with reference to the block diagram of FIG. 1.

The character string on the form is converted by the photoelectric converter into a binarized image signal 200 and stored in line buffer 201. The following processing is performed under the control of control circuit 207. Control circuit 207 reads the image signal stored in line buffer 201 one column at a time from the head position of the line buffer, advancing the column each time, and finishes when all the character pattern data for one line has been read. Each time it reads one column of pattern data from line buffer 201, control circuit 207 also starts black point histogram creation circuit 220. Black point histogram creation circuit 220 counts the black points in the column being read to create the black point histogram value for that column, and stores it in histogram memory 230, which is part of circuit 220. This is repeated, and the processing ends when the black point histogram for all columns of one line has been stored in histogram memory 230.

After the black point histogram for one line has been created, histogram memory 230 in black point histogram creation circuit 220 is read from its head, and blocks are detected by referring to the histogram. Control circuit 207 reads the black point histogram values from histogram memory 230 in order and compares each with a threshold α (α: a constant; α = 1 in this embodiment). If the histogram value is larger, the column is taken as a candidate start point of a character block. The storage address is then advanced, the columns whose histogram value is larger than α are counted, and when β such columns occur consecutively (β: a constant; β = 2 in this embodiment), the candidate is adopted as the start point. The columns continue to be advanced, and the first column after the start point at which the histogram value becomes smaller than α is taken as the end point. The region whose length runs from the start point to the end point is a block.

Next, control circuit 207 starts character determination circuit 222, which compares the length of the detected block with thresholds γ₁ and γ₂ derived from the average width of the characters being read (γ₁ and γ₂ are constants; γ₁ = 75 and γ₂ = 125 in this embodiment). When the block length W is smaller than γ₁, the block is judged to be one character; when γ₁ ≦ W ≦ γ₂, it is judged to be two characters; and when W > γ₂, it is judged to be three or more characters. After this judgment, control circuit 207 starts circumscribing frame detection circuit 221 for the block to detect its circumscribing frame. Once the circumscribing frame of the block has been detected, the character pattern within the frame is transferred to pattern memory 204.

When W > γ₂, that is, when the block is judged to contain three or more characters, the extraction processing is first applied to the span from the start point to γ₂ to separate the first and second characters. The resulting cut point is then used as the new start point, the extraction processing is applied from that start point to γ₂ to separate the second and third characters, and so on until W has been processed. In this case, the part of the circumscribing frame from the start point to γ₂ is transferred to pattern memory 204 first, and the remainder is transferred once the cut point between the start point and γ₂ has been determined by the processing described below. As shown in FIG. 2, described later, the left end of the upper side of the character circumscribing frame is taken as the origin, the position of the lower side is PB, and the position of the right side is PR.
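As a rough illustration of the block detection and character-count judgment just described, the following Python sketch applies the thresholds α, β, γ₁ and γ₂ to a list of per-column black point counts. It is not the patent's circuit: the function names, the plain-list representation, and the exclusive end column are assumptions, and the return value 3 of `classify_width` stands for "three or more characters".

```python
def find_blocks(hist, alpha=1, beta=2):
    """Detect blocks in a column histogram.

    A candidate start is a column whose count is larger than alpha; it is
    adopted as the start point once beta consecutive columns exceed alpha.
    The block ends at the first later column whose count is smaller than
    alpha.  Blocks are (start, end) pairs with `end` exclusive (assumption).
    """
    blocks = []
    start = None       # confirmed start point of the current block
    candidate = None   # candidate start point, not yet confirmed
    run = 0            # consecutive columns above alpha since the candidate
    for x, count in enumerate(hist):
        if start is None:
            if count > alpha:
                if run == 0:
                    candidate = x
                run += 1
                if run >= beta:
                    start = candidate
            else:
                candidate, run = None, 0
        elif count < alpha:
            blocks.append((start, x))
            start, candidate, run = None, None, 0
    if start is not None:  # block still open at the end of the line
        blocks.append((start, len(hist)))
    return blocks


def classify_width(width, gamma1=75, gamma2=125):
    """Return 1, 2, or 3 (meaning three or more) characters for a block width."""
    if width < gamma1:
        return 1
    if width <= gamma2:
        return 2
    return 3
```

For example, in `find_blocks([0, 3, 0, 2, 2, 2, 0, 0])` the isolated column 1 is rejected as noise because it is not followed by a second column above α, and the block is reported as columns 3 to 5.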

Next, the processing of blocks judged by the above determination to contain two or more characters is described with reference to FIG. 1.

Control circuit 207 sets x counter 205 and y counter 206, which supply the address of pattern memory 204, to the left end of the upper side of the character circumscribing frame, and increments y counter 206 to scan toward the lower side of the frame. The address of pattern memory 204 is written (x, y) with respect to the X and Y axes, using the values of the x counter and the y counter respectively. The content of pattern memory 204 at the position given by this address is denoted PM(x, y). In this embodiment, a white point is PM(x, y) = 0, a black point is PM(x, y) = 1, a white point detected during the scan from the upper side is PM(x, y) = 2, and a white point detected during the scan from the lower side is PM(x, y) = 4. Pattern memory 204 in this embodiment therefore has a data width of 3 bits per mesh. Pattern region classification circuit 208 sets the address to the left end of the upper side of the character circumscribing frame and reads the character pattern from pattern memory 204. When PM(x, y) = 0, (PM(x, y) .OR. 2) is taken as the new PM(x, y) and written to that address of pattern memory 204 via switching circuit 203.

When white-to-black change point detection circuit 209 detects a black point, PM(x, y) = 1, control circuit 207 aborts the scan of that column, increments x counter 205 by one, and scans the next column starting from the upper side of the character circumscribing frame. The scan of a column is likewise ended, and the next column scanned, when the scan from the upper side reaches the lower side. This scanning is repeated column by column and finishes once the rightmost column of the frame has been processed. When the scan from the upper side is finished, control circuit 207 sets the x counter and the y counter to the left end of the lower side of the frame and scans from the lower side toward the upper side, performing the same processing as in the scan from the upper side, except that when PM(x, y) = 0, (PM(x, y) .OR. 4) is stored in pattern memory 204 as PM(x, y). As with the scan from the upper side, this scan finishes after the rightmost column has been processed. When both scans are finished and the pattern within the frame has been classified, control circuit 207 sets x counter 205 and y counter 206 to the left end of the upper side of the frame and performs horizontal scanning (row scanning) to detect the character cut region.

The column scanning described above is now illustrated with an example. FIG. 2 shows a concrete example of the top and bottom scans of this embodiment. In the figure, 100 and 101 are character patterns, 103 is the scanning direction from the upper side to the lower side, and 104 is the scanning direction from the lower side to the upper side. FIG. 3 shows the result of the column scanning of FIG. 2. In FIG. 3, a white point detected during the scan from the upper side to the lower side (character portions being black points and background portions being white points) is called a C point, and the set of C points is region C. If a black point is detected during this scan, the scan of that column is aborted there and the next column is processed. The set of black points is region A. The same processing is performed during the scan from the lower side to the upper side; a white point detected during that scan is a D point, and the set of D points is region D. White points other than C points and D points, that is, white points reached by neither of the two scans, are B points, and their set is region B.
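The two column scans and the resulting regions A to D can be sketched as follows. This is an illustrative software rendering, not the patent's hardware: the function name and list-of-rows layout are assumptions, and the sketch follows the description in which a mark is OR-ed in only when the stored value is still 0, so a point already marked from the top is left unchanged by the bottom scan.

```python
BLACK = 1        # region A: character point
FROM_TOP = 2     # region C: white point reached by the scan from the upper side
FROM_BOTTOM = 4  # region D: white point reached by the scan from the lower side
# 0 remains for region B: white points reached by neither scan


def classify_regions(pattern):
    """Classify the points inside a circumscribing frame.

    `pattern` is a list of rows holding 0 (white) or 1 (black).  A new grid is
    returned in which white points carry the marks described above.
    """
    out = [row[:] for row in pattern]
    height = len(out)
    width = len(out[0]) if out else 0
    scans = (
        (FROM_TOP, range(height)),                   # upper side -> lower side
        (FROM_BOTTOM, range(height - 1, -1, -1)),    # lower side -> upper side
    )
    for mark, ys in scans:
        for x in range(width):
            for y in ys:
                if out[y][x] == BLACK:
                    break                # abort this column at the first black point
                if out[y][x] == 0:
                    out[y][x] |= mark    # PM(x, y) .OR. mark
    return out


example = [
    [0, 0, 0],
    [1, 0, 0],
    [0, 0, 1],
    [1, 0, 0],
    [0, 0, 0],
]
classified = classify_regions(example)
```

In `classified`, the point at column 0, row 2 stays 0 because it is enclosed by black points above and below (region B), while the empty middle column is marked 2 throughout.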

Next, detection of the character cut region is described with reference to FIG. 1.

First, when started by control circuit 207, pattern region change point detection circuit 210 reads the character pattern data (the classification results of the column scanning) from pattern memory 204 and row-scans the inside of the circumscribing frame. While it processes the character pattern data of the current point, circuit 210 holds the data of the point immediately before it and compares the two. If the comparison shows a change, the coordinate of the preceding point is detected and held. That is, PM(x−1, y) and PM(x, y) are compared, and if they are not equal, the X coordinate x−1 is stored in (xREG I) 214.

When pattern region change point detection circuit 210 detects a change point, cut region detection circuit 211 holds PM(x, y) in a state register. Circuit 211 has three state registers (not shown) that hold PM(x, y) values, arranged so that each time a change point is detected the content of each register shifts into the adjacent register. Once a change point has been detected and the shift is complete, the contents of the three state registers are checked against the following states stored in control circuit 207. Denoting the state registers ST1, ST2, and ST3, the states are: ST1 = 4 and ST2 = 0 and ST3 = 2; or ST1 = 2 and ST2 = 0 and ST3 = 4; or ST2 = 2 and ST3 = 4; or ST2 = 4 and ST3 = 2. Here ST3 holds the content at the current coordinate position.

When the state registers match one of these combinations, a decision signal from cut region detection circuit 211 is supplied to (yREG) 212 and (xRES) 213. The contents of y counter 206 and x counter 205 held in those registers at that moment are output from the registers as y₁ and x₂. x₁ is the content of x counter 205 that was stored in (xREG I) 214 when state register ST3 held a C point or a D point. x₁, x₂, and y₁ are held in registers in control circuit 207. When the state registers match one of the combinations, the horizontal scan of that row is also aborted, the y counter is incremented, and the horizontal scan of the next row begins. When all horizontal scanning within the circumscribing frame has finished, the cut position is determined from x₁ (xREG I 214), x₂ (register 213), and y₁ (yREG 212).
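The row scan with three shifting state registers can be sketched as follows. This is one possible reading of the description, offered only as an illustration. The function name, the register initialization at the start of each row, and in particular the rule used for x₁ (the last column of the most recent C or D run before the match) are assumptions; the text does not spell these details out.

```python
# Transitions that mark a boundary between the top-scanned (2) and
# bottom-scanned (4) background, optionally across an unscanned gap (0).
THREE_STATE = {(4, 0, 2), (2, 0, 4)}   # (ST1, ST2, ST3)
TWO_STATE = {(2, 4), (4, 2)}           # (ST2, ST3)


def find_cut_candidates(classified):
    """Row-scan a classified grid and return (x1, x2, y1) for each matching row.

    `classified` is a list of rows holding 0, 1, 2, or 4.  At most one hit is
    reported per row, because the row scan is aborted at the first match.
    """
    hits = []
    for y, row in enumerate(classified):
        if not row:
            continue
        st1 = st2 = None
        st3 = row[0]   # assumption: the registers start from the row's first point
        x1 = None
        for x in range(1, len(row)):
            if row[x] == row[x - 1]:
                continue                    # no change point at this column
            if st3 in (2, 4):
                x1 = x - 1                  # last column of the C or D run (assumed rule)
            st1, st2, st3 = st2, st3, row[x]   # shift the state registers
            if (st1, st2, st3) in THREE_STATE or (st2, st3) in TWO_STATE:
                hits.append((x1, x, y))
                break                       # abort this row and go to the next one
    return hits


grid = [
    [2, 2, 0, 0, 4],   # C, unscanned gap, D  -> match
    [2, 2, 2, 2, 2],   # no change point      -> no match
    [1, 1, 2, 4, 4],   # C directly next to D -> match
    [4, 0, 1, 0, 2],   # black point between  -> no match
]
```

Under these assumptions, rows 0 and 2 of `grid` produce hits, and the row in which a black point separates the D and C runs produces none, since none of the stored transitions passes through a black point.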

The pattern transfer method is described below using the example pattern of FIG. 4, for which the cut position has already been determined. FIG. 4 shows the patterns stored in pattern memory 204 of the block diagram of FIG. 1, together with the cut position. The horizontal axis is the X axis and the vertical axis is the Y axis, and pattern memory 204 is taken to lie in the fourth quadrant. XM and YM denote the size of pattern memory 204; in this embodiment XM = YM = 128 meshes. PR and PB define the circumscribing frame of the pattern stored in pattern memory 204, which is bounded by the four straight lines X = 0, X = PR, Y = 0, and Y = PB. In FIG. 4, 300 and 301 are patterns, and the straight lines Y = y₁, X = x₁, and X = x₂ indicate the cut position.

In the pattern memory of this embodiment, the data representing one mesh has the structure shown in FIG. 5. In FIG. 5, (1) = 1 means that the point was a white point during the column scan from the lower side to the upper side, and (1) = 0 means that it was not. Likewise, (2) = 1 means that the point was a white point during the column scan from the upper side to the lower side, and (2) = 0 means that it was not. Finally, (3) = 1 means that the point is a black point, and (3) = 0 means that it is a white point. The only pattern data that needs to be transferred is therefore the data in (3).

The meshes on the line X = 0 are transferred starting from the point Y = 0, incrementing the Y coordinate by one each time, up to the point Y = PB. After one column has been transferred, the X coordinate is incremented. This transfer is repeated column by column. Once the column X = x₁ has been transferred, then from the next column up to the column X = x₂ the pattern data for Y coordinates from y₁ to PB is masked and the fixed value 0 is transferred instead. The transfer of pattern 300 ends when the column X = x₂ has been transferred. Pattern 301 can be transferred in the same way. Data in which the circumscribing frame contains only one character can also be transferred in the same way.
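The transfer of the left-hand pattern with masking can be sketched as follows. This is an illustration only: the function name, the list-of-rows layout of the pattern memory, the column-major output, and the inclusive treatment of the bounds x₂, y₁, and PB are assumptions. The black point bit is taken to be the least significant bit, consistent with the values 1, 2, and 4 used for the three flags.

```python
def transfer_left_pattern(pm, pb, x1, x2, y1):
    """Transfer the black point bit of columns 0..x2, column by column.

    `pm[y][x]` holds the 3-bit mesh value (1 = black, 2 = white seen from the
    top, 4 = white seen from the bottom).  For columns after x1 and up to x2,
    rows y1..pb are masked and the fixed value 0 is transferred instead.
    The result is a list of columns, each a list of bits for y = 0..pb.
    """
    columns = []
    for x in range(x2 + 1):
        column = []
        for y in range(pb + 1):
            bit = pm[y][x] & 1          # keep only the black point flag
            if x > x1 and y >= y1:
                bit = 0                 # masked area belongs to the neighboring character
            column.append(bit)
        columns.append(column)
    return columns


pm = [
    [2, 1, 1, 2],
    [1, 1, 2, 1],
    [4, 4, 1, 1],
]
left = transfer_left_pattern(pm, pb=2, x1=1, x2=2, y1=1)
```

Here the black point at column 2, row 2 is suppressed by the mask, while the black point at column 2, row 0 lies above y₁ and is transferred unchanged.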

次に、第６図，第７図及び第８図に示すフロー
チヤートに基づいて本実施例の処理の流れを詳細
に説明する。ここで、第６図は全体の流れを示
し、第７図および第８図はそれぞれ上下２回の走
査によるパターンの領域の分類、および切出し領
域の決定の流れ図を示している。先ず、第６図の
全体の流れ図より説明する。Ｓ４００では、読取
動作を開始する。Ｓ４０１ではラインバツフアに
格納されたパターンデータを１列読み出し、第１
図はの黒点ヒストグラム作成回路２２０にて黒点
ヒストグラムを作成しヒストグラムメモリ２３０
に格納する。Ｓ４０２においては１行分全ての黒
点ヒストグラムの作成終了を検出し、１行全て作
成されるまでＳ４０１の処理を繰り返す。Ｓ４０
３においては処理した文字を管理し、１行中全部
の文字の切出しが終了するまで以下の処理を繰り
返す。Ｓ４０４では黒点ヒストグラムをヒストグ
ラムメモリより読出し、黒点ヒストグラムの始
点、および終点を検出しブロツクとする。また、
該ブロツクの長さと閾値γ₁，γ₂とを比較し何文字
で構成されるブロツクであるかを保持しておく。
Ｓ４０５においては、第１図の外接枠検出回路２
２１においてブロツクの外接枠を検出し、その外
接枠内のパターンデータをパターンメモリ２０４
に転送する。Ｓ４０６においては前記保持された
ブロツクの長さの判定結果により、１文字であれ
ばパターンメモリ２０４のパターンデータを出力
段へ転送し次の文字の処理へ進む。２文字以上で
あれば、以下の処理を行う。Ｓ４０７において
は、外接枠の上辺および下辺からそれぞれ対辺へ
列走査を行いパターンの領域の分類を行い結果を
パターンメモリに格納する。Ｓ４０８においては
外接枠内の水平走査（行走査）を行い前記分類結
果をパターンメモリより読出し切出し領域の検出
を行つて切出し位置を決定する。Ｓ４０９ではパ
ターンメモリ内のパターンを切出し位置に従つて
転送する。パターンメモリ内のパターンを全て転
送した時点で次の文字の処理を行う。 Next, the processing flow of this embodiment is explained in detail with reference to the flowcharts of FIGS. 6, 7, and 8. FIG. 6 shows the overall flow; FIGS. 7 and 8 show, respectively, the flow for classifying the pattern regions by the two scans (downward and upward) and the flow for determining the cutout region. First, the overall flow of FIG. 6 is explained. In S400, the reading operation starts. In S401, one column of the pattern data stored in the line buffer is read out, and the black-point histogram creation circuit 220 of FIG. 1 creates a black-point histogram and stores it in the histogram memory 230. In S402, completion of the black-point histograms for the whole line is detected; S401 is repeated until the histograms for the entire line have been created. In S403, the processed characters are tracked, and the following processing is repeated until all characters in the line have been cut out. In S404, the black-point histogram is read from the histogram memory, and its start and end points are detected to define a block. The length of the block is also compared with the thresholds γ₁ and γ₂, and the number of characters the block consists of is retained. In S405, the circumscribing frame detection circuit 221 of FIG. 1 detects the circumscribing frame of the block, and the pattern data within that frame is transferred to the pattern memory 204. In S406, according to the retained block-length result, if the block is one character, the pattern data in the pattern memory 204 is transferred to the output stage and processing proceeds to the next character. If the block is two or more characters, the following processing is performed. In S407, column scans are performed from the upper and lower sides of the circumscribing frame toward the opposite sides, the pattern regions are classified, and the results are stored in the pattern memory. In S408, horizontal scanning (row scanning) is performed within the circumscribing frame, the classification results are read from the pattern memory, the cutout region is detected, and the cutout position is determined. In S409, the pattern in the pattern memory is transferred according to the cutout position. Once all patterns in the pattern memory have been transferred, the next character is processed.
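
The block detection of S401 to S404 can be sketched in software as follows. This is only an illustrative sketch, not the circuit of the embodiment: the function names are hypothetical, the image is assumed to be a list of rows holding 1 for black points and 0 for white points, and the rule mapping block length to a character count (one character up to γ₁, two up to γ₂, otherwise three or more) is an assumption, since the passage above only says that the length is compared with γ₁ and γ₂.

```python
from typing import List, Tuple


def column_histogram(image: List[List[int]]) -> List[int]:
    """Count the black points (value 1) in each column of a binary line image."""
    if not image:
        return []
    return [sum(row[x] for row in image) for x in range(len(image[0]))]


def find_blocks(hist: List[int]) -> List[Tuple[int, int]]:
    """Return (start, end) column indices, inclusive, of runs with a non-zero count."""
    blocks = []
    start = None
    for x, count in enumerate(hist):
        if count > 0 and start is None:
            start = x                      # start point of a block
        elif count == 0 and start is not None:
            blocks.append((start, x - 1))  # end point of a block
            start = None
    if start is not None:
        blocks.append((start, len(hist) - 1))
    return blocks


def estimate_char_count(length: int, gamma1: int, gamma2: int) -> int:
    """Assumed rule: 1 character up to gamma1, 2 up to gamma2, else 3 (three or more)."""
    if length <= gamma1:
        return 1
    if length <= gamma2:
        return 2
    return 3
```

A block whose estimated count is 1 would be passed straight to the output stage, as in S406; only wider blocks would go on to the scans of S407 and S408.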

次に、第６図におけるＳ４０７およびＳ４０８
の処理について第７図及び第８図に詳細なフロー
チヤートを示し、その動作を順に説明する。 Next, detailed flowcharts of the processing of S407 and S408 in FIG. 6 are shown in FIGS. 7 and 8, and their operation is explained in order.

第７図は、文字パターン領域の分類と、白点か
ら黒点への変化点検出の流れを示している。Ｓ５
００で、文字パターンデータが入力されると、Ｓ
５０１およびＳ５０２では初期化であり、パター
ンメモリのｘ，ｙの座標を文字外接枠の上辺左端
に設定し、走査の方向を示す値Ｕ／Ｄを上辺より
下辺に向つて走査するので２とする。Ｓ５０３に
おいては、パターンメモリの内容を調べPM（ｘ，
ｙ）＝１（黒点）であれば処理をＳ５０７へ移し、
PM（ｘ，ｙ）≠１（白点）であるときは、Ｓ５０
４でパターンメモリの内容をPM（ｘ，ｙ）＝（PM
（ｘ，ｙ）．OR.U／Ｄ）とする。Ｓ５０５におい
てはスキヤンの方向によりｙカウンタの値をイン
クリメントあるいはデクリメントする。Ｓ５０６
では、１列の管理を行い１列の処理が終了するま
でＳ５０３に戻り同様の処理を繰り返す。PM
（ｘ，ｙ）＝１が検出されるかまたは１列の走査が
終了したときはＳ５０７でｙカウンタを走査開始
点（上辺上あるいは下辺上）に設定し、ｘカウン
タをＳ５０８でインクリメントし、ｘカウンタが
文字外接枠の右端（PR＋１）に一致するまでＳ
５０３からの処理を繰り返す。当該走査でｘカウ
ンタが右端と一致した場合は、走査の方向を下辺
から上辺の方向とし、前記処理を繰り返す。この
ときＳ５１１でＵ／Ｄを４とする。さらに、ｘ，
ｙカウンタを初期化する。Ｕ／Ｄ＝４の走査で同
様の処理を行い全て終了したら、第８図のフロー
チヤートに示した処理を行う。 FIG. 7 shows the flow for classifying the character pattern regions and detecting the points of change from white points to black points. In S500, the character pattern data is input. S501 and S502 are initialization: the x and y coordinates of the pattern memory are set to the left end of the upper side of the character circumscribing frame, and the value U/D indicating the scanning direction is set to 2, because scanning proceeds from the upper side toward the lower side. In S503, the content of the pattern memory is examined. If PM(x, y) = 1 (black point), processing moves to S507; if PM(x, y) ≠ 1 (white point), S504 sets the pattern memory content to PM(x, y) = (PM(x, y) .OR. U/D). In S505, the y counter is incremented or decremented according to the scanning direction. S506 tracks the column, and processing returns to S503 and repeats until the column has been processed. When PM(x, y) = 1 is detected or the scan of one column is finished, S507 sets the y counter to the scan start point (on the upper side or on the lower side), S508 increments the x counter, and the processing from S503 is repeated until the x counter matches the right end of the character circumscribing frame (PR+1). When the x counter matches the right end, the scanning direction is changed to run from the lower side toward the upper side, and the above processing is repeated. At this time, U/D is set to 4 in S511, and the x and y counters are initialized. The same processing is performed for the scan with U/D = 4, and when it is complete, the processing shown in the flowchart of FIG. 8 is performed.
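
The two passes of FIG. 7 can be sketched in software as follows. This is a sketch under stated assumptions rather than the circuit itself: the function name and the `pm[y][x]` list layout are hypothetical, while the values 1 (black), 2 (top-to-bottom pass), and 4 (bottom-to-top pass) are taken from the text. Applying the OR of S504 literally, a white point reached by both passes ends up as 6 and a white point enclosed between two black points of the same column stays 0; that reading of the resulting codes is an interpretation of the steps above.

```python
from typing import List

BLACK = 1        # character (black) point
FROM_TOP = 2     # U/D value for the pass from the upper side
FROM_BOTTOM = 4  # U/D value for the pass from the lower side


def classify_background(pm: List[List[int]]) -> List[List[int]]:
    """Return a copy of pm (indexed pm[y][x]) with white points tagged by scan direction."""
    out = [row[:] for row in pm]
    height = len(out)
    width = len(out[0]) if out else 0
    passes = (
        (FROM_TOP, range(height)),                 # upper side -> lower side
        (FROM_BOTTOM, range(height - 1, -1, -1)),  # lower side -> upper side
    )
    for ud, ys in passes:
        for x in range(width):
            for y in ys:
                if out[y][x] == BLACK:
                    break          # abort this column at the first black point
                out[y][x] |= ud    # PM(x, y) = PM(x, y) OR U/D
    return out


if __name__ == "__main__":
    frame = [
        [0, 0, 1],
        [1, 0, 0],
        [0, 0, 1],
        [0, 0, 0],
    ]
    for row in classify_background(frame):
        print(row)
```

In the small frame above, the middle column contains no black point and is tagged by both passes, while the white point at the right of the second row lies between two black points and is reached by neither pass.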

第８図はパターン領域の変化点検出と切出し領
域の決定についての流れを示すものであり、パタ
ーンメモリの文字外接枠内の水平走査（行走査）
を上辺左端より行い切出し領域の決定をする。Ｓ
６００，Ｓ６０１ではそれぞれｙカウンタ，ｘカ
ウンタを初期化する。Ｓ６０２では行走査中の領
域の変化を保持するための状態レジスタST１〜
ST３の初期化を行う。現在位置の状態を示すも
のはST３であり、走査中現在の領域前の領域を
示すものはST２であり、ST２前の領域を示すも
のはST１である。Ｓ６０３では、パターンメモ
リの内容PM（ｘ，ｙ）がST３と比較し一致して
いればＳ６０５に進み、一致していなければ該座
標は変化点であるので、Ｓ６０４でST２，ST３
の内容をそれぞれST１，ST２へシフトする。Ｓ
６０５においてはPM（ｘ，ｙ）の内容をST３に
シフトする。Ｓ６０６では状態レジスタST３が
Ｃ点あるいはＤ点であるか判定し、Ｃ点あるいは
Ｄ点のときは、ｘカウンタの内容を（xREG
）に格納する。Ｓ６０８では状態レジスタST
１，ST２，ST３の状態の組合せを判定し、Ｓ６
１６に示す組合せと一致する場合にはＳ６１１〜
Ｓ６１３により、x₁に（xREG ）を、x₂にｘ
カウンタの内容を、y₁にｙカウンタの内容を格納
する。Ｓ６１４，Ｓ６１５において、ｙカウンタ
をインクリメントし、文字外接枠の下辺と一致す
るまでＳ６０１に戻り同様の処理を行う。ただ
し、Ｓ６０８で組合せがＳ６１６の後半の２項で
ある場合には、Ｓ６１１では、x₁にxREG の
内容から−１を加えたものを格納する。 FIG. 8 shows the flow for detecting change points of the pattern regions and determining the cutout region. Horizontal scanning (row scanning) within the character circumscribing frame of the pattern memory is performed starting from the left end of the upper side, and the cutout region is determined. S600 and S601 initialize the y counter and the x counter, respectively. S602 initializes the state registers ST1 to ST3, which hold the changes of region during the row scan. ST3 indicates the state at the current position, ST2 indicates the region preceding the current region in the scan, and ST1 indicates the region preceding that of ST2. In S603, the pattern memory content PM(x, y) is compared with ST3. If they match, processing proceeds to S605; if they do not match, the coordinate is a change point, so S604 shifts the contents of ST2 and ST3 into ST1 and ST2, respectively. In S605, the content of PM(x, y) is shifted into ST3. In S606, it is determined whether the state register ST3 indicates point C or point D; if so, the content of the x counter is stored in (xREG). In S608, the combination of the states of ST1, ST2, and ST3 is evaluated. If it matches a combination shown in S616, S611 to S613 store (xREG) in x₁, the content of the x counter in x₂, and the content of the y counter in y₁. In S614 and S615, the y counter is incremented, and processing returns to S601 and repeats until the y counter matches the lower side of the character circumscribing frame. However, if the combination found in S608 is one of the latter two items of S616, S611 stores in x₁ the content of xREG minus 1.

Ｓ６０８において、状態レジスタST１〜ST３
がＳ６１６に示す組合せと一致しない場合は、Ｓ
６０９，Ｓ６１０において、ｘカウンタをインク
リメントし、文字外接枠の右辺と一致するまでＳ
６０３に戻り前記処理を繰り返す。Ｓ６１５でｙ
カウンタの値が文字外接枠の下辺と一致した場合
は、x₁，x₂，y₁の値に切出し点を決定する。第９
図は、本実施例により切出しを行つた場合のパタ
ーン例であり、Ａ−A′はパターンの分割位置を
示している。 If, in S608, the state registers ST1 to ST3 do not match any combination shown in S616, S609 and S610 increment the x counter, and processing returns to S603 and repeats until the x counter matches the right side of the character circumscribing frame. When the y counter value matches the lower side of the character circumscribing frame in S615, the cutout point is determined from the values of x₁, x₂, and y₁. FIG. 9 shows an example of a pattern cut out according to this embodiment; A-A′ indicates the position at which the pattern is divided.
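
The state-register handling of S602 to S608 can be sketched as follows. This is an illustrative sketch only: the function names are hypothetical, the initial register value of -1 is an assumption (the text says the registers are initialized but does not give the value), and the set of matching combinations is supplied by the caller because the combinations of S616 are defined in the flowchart, not in this passage. The xREG bookkeeping of S606 is omitted.

```python
from typing import Iterator, List, Optional, Set, Tuple

Triple = Tuple[int, int, int]


def row_transitions(pm: List[List[int]],
                    initial: int = -1) -> Iterator[Tuple[int, int, Triple]]:
    """Yield (x, y, (ST1, ST2, ST3)) at every change point of each row of pm[y][x]."""
    for y, row in enumerate(pm):
        st1 = st2 = st3 = initial      # registers are re-initialized for every row
        for x, value in enumerate(row):
            if value != st3:           # change point: PM(x, y) differs from ST3
                st1, st2 = st2, st3    # shift ST2, ST3 into ST1, ST2
                st3 = value            # current region goes into ST3
                yield x, y, (st1, st2, st3)


def first_match(pm: List[List[int]],
                combos: Set[Triple]) -> Optional[Tuple[int, int, Triple]]:
    """Return the first change point whose register triple is in the caller's set."""
    for x, y, triple in row_transitions(pm):
        if triple in combos:
            return x, y, triple
    return None


if __name__ == "__main__":
    classified = [[2, 2, 1, 1, 4]]
    print(list(row_transitions(classified)))
    print(first_match(classified, {(2, 1, 4)}))
```

Keeping only the last three distinct region codes is what lets the comparison in S608 be a fixed-size lookup, which is consistent with the later remark that the method needs only a simple circuit.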

以上説明したように、本実施例によれば、前後
の文字パターンが当該文字パターンに重なつた場
合でも、当該文字パターンが欠落したり、前後の
文字パターンの一部が混入することなく文字パタ
ーンの切出しを行うことが出来る。 As explained above, according to this embodiment, even when the preceding or following character pattern overlaps the character pattern in question, that character pattern can be cut out without any part of it being lost and without any part of the neighboring character patterns being mixed in.

さらに、本実施例においては、２文字が重なり
合つた場合を示したが、３文字以上重なり合つた
場合においても、重なり合つた文字の先頭より２
文字を基準に順次切出し点を決定することにより
同様な効果を得ることが出来る。 Furthermore, although this embodiment shows the case in which two characters overlap, a similar effect can be obtained when three or more characters overlap by determining the cutout points sequentially, taking two characters at a time from the head of the overlapping characters.

（発明の効果）以上説明したように、本発明によれば、文字パ
ターンの外接枠の上下の辺から各々対辺に向つて
列走査を行うことにより背景部分を走査方向別の
領域に分類し、その分類結果により外接枠内の行
走査を行つて切出し領域を検出し、切出し位置を
決定するので、精度の高い文字切出しを行うこと
ができる。また、パターンの外接枠内を走査し
て、変化点の検出を行うことにより実現している
ので簡単な回路構成で実施することが可能であ
る。さらに、本発明を用いることにより、隣接し
た文字が重なり合つた場合でも切出しが可能であ
るので、文字記入枠の間隔を小さくすることがで
き一行当りの読取可能文字数を増やすことができ
る。従つて、多くの種類の帳票に対応でき、帳票
設計の自由度が大きく、従つて性能のよいOCR
が実現出来るという効果がある。(Effects of the Invention) As explained above, according to the present invention, column scans are performed from the upper and lower sides of the circumscribing frame of the character pattern toward the respective opposite sides, so that the background portion is classified into regions by scanning direction; row scanning within the circumscribing frame then uses the classification results to detect the cutout region and determine the cutout position. Highly accurate character cutout is therefore possible. Because the method is realized by scanning within the circumscribing frame of the pattern and detecting change points, it can be implemented with a simple circuit configuration. Furthermore, since cutout remains possible even when adjacent characters overlap, the spacing between character entry boxes can be reduced and the number of readable characters per line can be increased. The invention can therefore handle many types of forms and gives greater freedom in form design, with the effect that a high-performance OCR can be realized.

[Brief explanation of drawings]

第１図はこの発明の一実施例を示すブロツク
図、第２図は本実施例の列走査による具体例を示
す図、第３図は第２図の列走査の処理結果を示す
図、第４図は第２図における切出し位置が決定さ
れたパターン例を示す図、第５図は本実施例にお
けるパターンメモリの構成を示す図、第６図，第
７図及び第８図は本実施例の処理の流れを示すフ
ローチヤート、第９図は本実施例により切出しを
行なつた場合のパターン例を示す図、第１０図は
従来の黒点ヒストグラムを用いたパターン例を示
す図である。２００…画像信号、２０１…ラインバツフア、
２０２…外接枠作成回路、２０３…切換回路、２
０４…パターンメモリ、２０５…ｘカウンタ、２
０６…ｙカウンタ、２０７…制御回路、２０８…
パターン領域分類回路、２０９…白点→黒点変化
点検出回路、２１０…パターン領域変化点検出回
路、２１１…切出し領域検出回路、２１２…
yREG、２１３…xREG 、２１４…xREG
。 FIG. 1 is a block diagram showing an embodiment of the present invention; FIG. 2 is a diagram showing a specific example of the column scanning of this embodiment; FIG. 3 is a diagram showing the processing results of the column scanning of FIG. 2; FIG. 4 is a diagram showing an example of the pattern of FIG. 2 for which the cutout position has been determined; FIG. 5 is a diagram showing the configuration of the pattern memory in this embodiment; FIGS. 6, 7, and 8 are flowcharts showing the processing flow of this embodiment; FIG. 9 is a diagram showing an example of a pattern cut out according to this embodiment; and FIG. 10 is a diagram showing an example of a pattern using the conventional black-point histogram. 200: image signal, 201: line buffer, 202: circumscribing frame creation circuit, 203: switching circuit, 204: pattern memory, 205: x counter, 206: y counter, 207: control circuit, 208: pattern region classification circuit, 209: white point to black point change point detection circuit, 210: pattern region change point detection circuit, 211: cutout region detection circuit, 212: yREG, 213: xREG, 214: xREG.

Claims

[Claims]

1. A character extraction method in which a quantized character pattern string, obtained by photoelectrically converting a character string written on a form, is separated and extracted character by character, characterized by comprising: means for storing the character pattern string in a line buffer memory, creating a black-point histogram in the column direction by scanning the line buffer memory one column at a time in the column direction of the character string, performing the scanning sequentially for each column, and detecting the length in the row direction over which black-point histograms whose width in the column direction is larger than a threshold value α (α is a constant) continue for a threshold value β (β is a constant) or more columns; means for comparing the width of the black-point histogram with a predetermined threshold value to detect how many characters the width corresponds to, and determining the area in which character cutout processing is to be performed based on that number of characters; means for creating a black-point histogram in the row direction by scanning the character pattern string in the area one row at a time in the row direction; storage means for detecting a character circumscribing frame of the character pattern string from the black-point histograms in the column direction and the row direction, and holding the character pattern string within the character circumscribing frame; detection means for scanning the character circumscribing frame from its upper and lower sides toward the respective opposite sides, reading out the contents of the character pattern string from the storage means along with the scanning, and detecting whether the contents are a character portion or a background portion; classification means for aborting the scanning of a column when a character portion is detected by the scanning, and thereby classifying the character pattern string within the character circumscribing frame into four types, namely a background portion detected by the scanning from the upper side, a background portion detected by the scanning from the lower side, a character portion, and a background portion on which the scanning was not performed; change point detecting means for performing row scanning to detect change points at which the classification by the classification means changes, storing them sequentially while retaining the states (classifications) before and after each change point, comparing the state transition with predetermined combinations of classification transitions, and detecting the change point that matches; and means for determining a character cutout position based on the change point detected by the change point detecting means.