JP2844618B2

JP2844618B2 - Character segmentation device

Info

Publication number: JP2844618B2
Application number: JP63280564A
Authority: JP
Inventors: 英裕渡辺; 正雄赤羽; 晃持田; 雄二中島
Original assignee: Seiko Epson Corp
Current assignee: Seiko Epson Corp
Priority date: 1988-11-07
Filing date: 1988-11-07
Publication date: 1999-01-06
Anticipated expiration: 2014-01-06
Also published as: JPH02126382A

Description

DETAILED DESCRIPTION OF THE INVENTION

[Industrial Application Field] The present invention relates to a character segmentation device, which is one of the main components of a character recognition system.

[Prior Art] Conventionally, characters have been cut out of a character string one at a time by tracing the connectivity of character pixels, one pixel at a time, in the character image data obtained by reading the character string, thereby obtaining each set of connected pixels, and then reading the coordinates of the character circumscribed frame that circumscribes each pixel set. For example, "The Story of Image Recognition" (Yuji Kiuchi, published by Nikkan Kogyo Shimbun, p. 90) describes character segmentation as a preprocessing step of character recognition.
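The conventional approach described above can be sketched as follows. This is only an illustrative sketch, not the method of the cited book: the function name `bounding_boxes`, the list-of-rows image representation, and the choice of 8-connectivity are all assumptions made for the example.

```python
from collections import deque

def bounding_boxes(image):
    """Return one (x_min, y_min, x_max, y_max) box per connected set of 1-pixels.

    Assumption: pixels are 8-connected; the source does not say which
    connectivity the conventional method uses.
    """
    height = len(image)
    width = len(image[0]) if height else 0
    seen = [[False] * width for _ in range(height)]
    boxes = []
    for y0 in range(height):
        for x0 in range(width):
            if image[y0][x0] != 1 or seen[y0][x0]:
                continue
            # Trace every pixel connected to (x0, y0), one pixel at a time.
            queue = deque([(x0, y0)])
            seen[y0][x0] = True
            x_min = x_max = x0
            y_min = y_max = y0
            while queue:
                x, y = queue.popleft()
                x_min, x_max = min(x_min, x), max(x_max, x)
                y_min, y_max = min(y_min, y), max(y_max, y)
                for dy in (-1, 0, 1):
                    for dx in (-1, 0, 1):
                        nx, ny = x + dx, y + dy
                        if (0 <= nx < width and 0 <= ny < height
                                and image[ny][nx] == 1 and not seen[ny][nx]):
                            seen[ny][nx] = True
                            queue.append((nx, ny))
            boxes.append((x_min, y_min, x_max, y_max))
    return boxes

# A character made of two separate strokes yields two frames.
two_strokes = [
    [1, 0, 0, 0, 1],
    [1, 0, 0, 0, 1],
    [1, 1, 0, 0, 0],
]
print(bounding_boxes(two_strokes))
```

Every character pixel is visited individually, and a character drawn with separate strokes produces one frame per stroke, which illustrates the drawbacks discussed in the next section.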

[Problems to be Solved by the Invention] In the conventional character segmentation device, tracing pixels takes processing time, and a single character may be split into many character circumscribed frames, so processing becomes complicated when the positions and sizes of characters vary widely. In particular, when character recognition is performed in real time, this significantly lengthens the recognition time, and a device capable of performing segmentation in a short time has been desired.

To solve this problem, an object of the present invention is to provide a character segmentation device that, rather than obtaining character circumscribed frame coordinates directly, can segment characters at high speed from documents with a high degree of freedom.

[Means for Solving the Problems] To solve the above problems, the invention according to claim 1 comprises: character image storage means for storing an image of a read character string as character image data consisting of character pixels and background pixels; character area image data creating means for forming, from the character image data, a right-angled polygon that contains a set of mutually connected character pixels, and for creating character area image data representing a character area image of the right-angled polygon; character area image data storage means for storing the character area image data; feature point reading means for extracting feature points at the vertices of the character area image and reading their feature point coordinate values; feature point storage means for storing the feature point coordinate values; character circumscribed frame forming means for obtaining a character circumscribed frame from the feature point coordinate values; and character circumscribed frame storage means for storing the coordinate values of the character circumscribed frame. The character area image data creating means includes means for generating the character area image so that the coordinate values (X, Y) in the character area image and the pixel values G(X, Y) in the character area image satisfy the following expressions (a) and (b).

$$Y_{\min}(X) = Y_s \quad \text{for any } X \text{ with } X_s \le X \le X_e \qquad \text{(a)}$$

$$G(X, k) \ne G_0 \quad \text{for any } X \text{ with } X_s \le X \le X_e \text{ and any } k \text{ with } Y_{\min}(X) \le k \le Y_{\max}(X) \qquad \text{(b)}$$

where $X_s$ and $X_e$ bound the domain of the character area image in the X coordinate, $Y_s$ and $Y_e$ bound its domain in the Y coordinate, $Y_{\min}(X)$ and $Y_{\max}(X)$ are the minimum and maximum Y coordinate values of the character pixel group at coordinate X, and $G_0$ is the pixel value of a background pixel.

Here, a right-angled polygon is a polygon in which the angle at every vertex is a right angle.

In the character segmentation device according to claim 2, the feature point reading means has means for extracting a plurality of types of feature points by applying mask processing to the character area image data, and the character circumscribed frame forming means has means for fitting combinations of the feature points to a plurality of basic patterns, thereby finding a combination of feature points that matches one of the basic patterns, and for obtaining the coordinates of the character circumscribed frame from the combination found.

[Operation] In the character segmentation device of claim 1, the character area image is generated so as to satisfy expressions (a) and (b) above, so a right-angled polygonal character area image that reflects the connectivity of the character pixels can be generated. Because the feature points at the vertices of this character area image are extracted and the character frame is then obtained from these feature points, an appropriate character frame can be obtained according to the connectivity of the character pixels.

In the character segmentation device of claim 2, the character frame is obtained by fitting combinations of feature points to a plurality of basic patterns, so an appropriate character frame can be obtained from the combination of feature points.

[Embodiment] FIG. 1 is a block diagram showing the configuration of a character segmentation device as one embodiment of the present invention.

The character image storage means 1 stores the character image data of a character string sent from the scanner unit, binarized into character portions and background portions.

The character area image data creating means 2 creates character area image data from the character image data and stores it in the character area image data storage means 3.

The feature point reading means 4 obtains feature points and their coordinate values from the character area image data and stores them in the feature point storage means 5.

The character circumscribed frame forming means 6 obtains character circumscribed frames from the feature point data and stores them in the character circumscribed frame storage means 7.

With the above configuration, character area image data is created from the character image data, feature points that characterize the connectivity of the pixels are extracted, their coordinates are obtained, and the character circumscribed frames can then be derived theoretically.

The character segmentation device is described below. First, the features of the character area image data creating means are explained using an example (the hiragana character 'い') of character image data and the character area image data created from it by this means.

FIG. 3(a) shows the input character image data; FIG. 3(b) shows data partway through processing (corresponding to the data at the stage where the processing of step 112 in the flowchart of FIG. 7, described later as an embodiment, has finished); and FIG. 3(c) shows the character area image data at the stage where character area image processing has finished.

Each individual image in the character area image data created by the character area image data creating means 2 is given the following properties.

$$Y_{\min}(x) = Y_s \quad \text{for any } x \text{ with } X_s \le x \le X_e$$

$$G(x, k) \ne 0 \quad \text{for any } x \text{ with } X_s \le x \le X_e \text{ and any } k \text{ with } Y_{\min}(x) \le k \le Y_{\max}(x)$$

Here, $X_s$, $X_e$, $Y_s$, and $Y_e$ bound the domain of the character area image in the X and Y coordinates, as illustrated in FIG. 3(c). $G(X, Y)$ is the value of the pixel at coordinates $(X, Y)$ and takes the value 0 (background pixel) or 1 (character pixel, or a pixel replaced so as to be treated as a character pixel). $Y_{\min}(X)$ and $Y_{\max}(X)$ are the minimum and maximum Y coordinates of the character pixel group at a given X coordinate. The first expression indicates that the upper side of the character area image has no chipping, and the second expression indicates that the character area image has no hollow regions and no dents in the horizontal direction (parallel to the X axis).
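The two properties above can be checked mechanically. The following is a minimal sketch under stated assumptions: the function name `satisfies_region_properties` is hypothetical, the input is assumed to contain exactly one character area image as a list of rows with `region[y][x]` playing the role of G(x, y), and a column with no set pixels inside [X_s, X_e] is treated as a violation.

```python
def satisfies_region_properties(region, background=0):
    """Check the flat-top and no-hollow properties for one character area image."""
    # Collect the Y coordinates of the non-background pixels in each column.
    columns = {}
    for y, row in enumerate(region):
        for x, value in enumerate(row):
            if value != background:
                columns.setdefault(x, []).append(y)
    if not columns:
        return False  # assumption: an empty image is not a valid region

    x_s, x_e = min(columns), max(columns)
    y_s = min(min(ys) for ys in columns.values())

    for x in range(x_s, x_e + 1):
        ys = columns.get(x)
        if not ys:
            return False  # assumption: an empty column breaks the region
        if min(ys) != y_s:
            return False  # first property: Y_min(x) must equal Y_s
        if len(ys) != max(ys) - min(ys) + 1:
            return False  # second property: no background between Y_min and Y_max
    return True

staircase = [[1, 1, 1],
             [1, 1, 0],
             [1, 0, 0]]
hollow = [[1, 1, 1],
          [1, 0, 1],
          [1, 1, 1]]
print(satisfies_region_properties(staircase), satisfies_region_properties(hollow))
```

A region that passes this check is fully described by its top edge and the depth of each column, which is what allows a small number of corner pixels to stand in for the whole shape.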

To obtain character circumscribed frames from the character area image data, it suffices to know the coordinate values of at least three types of pixels, although methods using four or more types are also possible. These pixels are called feature points. FIG. 4(a) shows the 3 × 3 mask used to extract feature points, and FIG. 4(b) shows an example of the conditions on the pixels surrounding a feature point when each feature point is extracted with the 3 × 3 mask. Specifically, feature point P is the pixel at the upper-left corner of a character area image, and feature point Q is a pixel at a lower-left corner. Feature point R is the pixel at the corner of a concavity that is recessed toward the lower right. Other combinations of pixels could serve as feature points, but the following description deals with the feature points obtained under the conditions shown in FIG. 4(b).

Because a character area image has the properties expressed by the expressions above, only one feature point P can exist per character area image. By contrast, multiple feature points Q exist when the lower side of the character area image is chipped; when only the lower left is chipped, there is just one. When multiple feature points Q exist, feature points R exist. Owing to the properties expressed by the expressions, every feature point R is necessarily paired with a feature point Q that has the same X coordinate value. Because paired points share the same X coordinate value, the rightmost feature point Q (the one with the largest X coordinate value) has no paired feature point R (if such a feature point R existed, a feature point Q with a still larger X coordinate value would have to exist). Hence the number of feature points R equals the number of feature points Q minus one.

As an example, FIG. 5 shows the three types of feature points in the character area image data of FIG. 3(c): one point of type P, two of type Q, and one of type R.

The properties of the feature points can be summarized as follows.

(i) Exactly one type P feature point exists in each character area image.

(ii) One or more type Q feature points exist in each character area image.

(iii) The number of type R feature points is exactly one less than the number of type Q feature points; every type R feature point has a paired type Q feature point, and the two have the same X coordinate value.

As explained below, properties (i), (ii), and (iii) make it possible for the character circumscribed frame forming means 6 to derive the character circumscribed frame coordinates theoretically from the feature point coordinate values.

Accordingly, the feature point reading means 4 only needs to extract each feature point and its coordinate values locally, and need not extract any information about which character rectangular image each feature point belongs to. This is why processing time can be shortened significantly compared with the conventional approach of obtaining the character circumscribed frame coordinates directly from the image data one character at a time.

The functions of the character circumscribed frame forming means 6 and the reasoning behind them are described below. FIG. 6 shows the basic patterns of character area images together with their feature points and character circumscribed frames. Any character area image, however complex, can be reduced to the patterns of FIG. 6, so the principle is explained using these basic patterns.

When a feature point A of type P and a feature point B of type Q satisfy the following conditions, the two feature points belong to the same character area image.

(1) The x coordinate value of point A is less than the x coordinate value of point B.

(2) The y coordinate value of point A is less than the y coordinate value of point B.

(3) Considering the rectangle whose diagonal corners are points A and B, no feature points other than paired type Q and type R feature points exist inside the rectangle.

(4) No type P feature point exists that has an x coordinate value larger than that of point A and satisfies conditions (1), (2), and (3).

(5) No type Q feature point exists that has a y coordinate value smaller than that of point B and does not yet belong to any character area image.
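Conditions (1) to (3) are local tests on coordinates and can be written as a predicate. The sketch below is an interpretation, not the patent's implementation: the name `same_region_candidate`, the (x, y) tuple representation, inclusive rectangle bounds, and the reading of "paired" as "a type Q point sharing its x coordinate with a type R point inside the rectangle" are all assumptions. Conditions (4) and (5) concern the order in which candidates are tried and are left out.

```python
def same_region_candidate(a, b, table_p, table_q, table_r):
    """Return True if type P point `a` and type Q point `b` meet conditions (1)-(3)."""
    # Conditions (1) and (2): a must lie strictly above-left of b in coordinate terms.
    if not (a[0] < b[0] and a[1] < b[1]):
        return False

    def inside(pt):
        # Assumption: the rectangle spanned by a and b includes its border.
        return a[0] <= pt[0] <= b[0] and a[1] <= pt[1] <= b[1]

    # x coordinates of type R points inside the rectangle; a type Q point
    # at one of these x values is treated as "paired".
    paired_xs = {r[0] for r in table_r if inside(r)}

    # Condition (3): no other type P point inside the rectangle ...
    if any(p != a and inside(p) for p in table_p):
        return False
    # ... and no unpaired type Q point inside it.
    if any(q != b and inside(q) and q[0] not in paired_xs for q in table_q):
        return False
    return True

print(same_region_candidate((1, 1), (8, 7), [(1, 1)], [(8, 7), (5, 7)], [(5, 3)]))
```

Because the test only compares coordinates, it needs no knowledge of which pixels connect the two points.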

In the explanation of FIG. 6, conditions (4) and (5) are assumed to be satisfied. (Condition (5) is satisfied automatically by deciding the character area to which each feature point belongs in ascending order of the y coordinate values of the type Q feature points. Condition (4) is satisfied by choosing, among the type P feature points that satisfy conditions (1), (2), and (3), the one with the largest x coordinate value.) It is also clear that conditions (1), (2), and (3) are satisfied. Therefore the feature points A and B shown in FIGS. 6(a), 6(b), and 6(c) belong to the same character area image.

In the case of FIG. 6(a), the rectangle whose diagonal corners are feature points A and B contains no other feature points, so feature points A and B directly give the coordinates of the character circumscribed frame.

In the case of FIG. 6(b), if feature points A and B were taken as the character circumscribed frame, the type R feature point D would lie inside it, which shows that this is not the complete character circumscribed frame. A type R feature point always has a paired type Q feature point C, which belongs to the same character rectangular image. Feature point C is therefore searched for, a new type Q feature point E is created from the coordinate values of feature points B and C, and feature points D, C, and B are deleted. The combination of feature points A and E is then examined to see which basic pattern it matches.

In the case of FIG. 6(c), feature point B itself is paired with the type R feature point D. From the definition of a type R feature point, the character area image also extends to the right of feature point D, and consequently a feature point E must exist. Feature points D and B are therefore deleted, feature point E is searched for, and the combination of feature points A and E is examined to see which basic pattern it matches.

By repeating the processing described above, the complete character area can be obtained. Accordingly, the circumscribed frame coordinates of every character can be derived logically from the coordinates of at least three types of feature points.

FIGS. 2 and 3 show one embodiment of the present invention. FIG. 2 is a control system diagram; pointers 12, 13, and 14 point to pixels in the character image storage means 8 and the character rectangular image data storage means 9. FLAG 15 is a condition selection flag that selects the condition used to determine the value of pixel B, described later.

FIG. 7(a) is a flowchart showing one embodiment of the character area image data creating means, and FIG. 7(b) shows the subroutine of step 103 in FIG. 7(a). In step 101, the character area image data storage means 9 is cleared, and the pointers are set up so that pointer 12 points to pixel A(x, y) in the character image storage means, pointer 13 points to pixel B(x, y) in the character area image data storage means, and pointer 14 points to pixel C(x, y − 1), the pixel immediately above pixel B. Here x and y are variables indicating the pixel position in both the character image storage means and the character area image data storage means, and they are initialized to x = 0, y = 1. In step 102, the addresses of the respective pixels are set in pointers 12, 13, and 14, and FLAG is cleared. Step 103 calls the subroutine shown in the flowchart of FIG. 7(b), which determines whether pixel B is to be set.

The flowchart of the subroutine is described below. In step 201, FLAG is examined. If FLAG is cleared, step 202 examines the state of pixel A, to which pointer 12 points, and the subroutine ends if pixel A is 0. If pixel A is 1, step 203 sets pixel B, to which pointer 13 points, to 1; step 204 then examines pixel C, to which pointer 14 points, and the subroutine ends if pixel C is 0, whereas if pixel C is 1, step 205 sets FLAG and the subroutine ends. If FLAG is found to be set in step 201, step 206 examines pixel C. If pixel C is 1, step 207 sets pixel B to 1 and the subroutine ends; if pixel C is 0, step 208 clears FLAG and control transfers to step 201.
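The subroutine can be sketched as a small function plus a helper that applies it along one row. This is a reading of the text rather than a transcription of FIG. 7(b): the names `update_pixel` and `scan_row` are hypothetical, and the sketch assumes that FLAG is cleared once at the start of each scan direction and then carried from pixel to pixel, since a flag cleared before every call would have no effect.

```python
def update_pixel(a, c, flag):
    """One call of the subroutine.

    a: pixel A (input image), c: pixel C (region image, one row up),
    flag: current FLAG state. Returns (set_b, new_flag).
    """
    if flag:                   # step 201: FLAG is set
        if c == 1:             # step 206
            return True, True  # step 207: set pixel B, FLAG stays set
        flag = False           # step 208: clear FLAG, continue from step 201
    if a == 0:                 # step 202
        return False, flag
    return True, c == 1        # steps 203-205: set B, set FLAG if C is 1

def scan_row(a_row, b_above, b_row, xs):
    """Apply the subroutine to the x positions in `xs`, updating `b_row` in place."""
    flag = False  # assumption: FLAG is cleared once per scan direction
    for x in xs:
        set_b, flag = update_pixel(a_row[x], b_above[x], flag)
        if set_b:
            b_row[x] = 1
    return b_row

a_row = [0, 0, 0, 1, 0]      # character pixels in the current row
b_above = [0, 1, 1, 1, 0]    # region pixels already set in the row above
b_row = [0] * 5
scan_row(a_row, b_above, b_row, range(5))              # one direction
scan_row(a_row, b_above, b_row, range(4, -1, -1))      # the opposite direction
print(b_row)
```

Under these assumptions, a character pixel that touches the region in the row above extends the current row across the full width of that region, which is what removes horizontal dents.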

After returning from the subroutine described above, step 104 checks whether x is at the end point (the left end of the image area). If it is not, step 105 increments x by one and the process returns to step 102. If x is at the end point, step 106 sets a value in each pointer and then clears FLAG. Step 107 calls the same subroutine as step 103 to determine whether pixel B is to be set. Step 108 checks whether x is 0 (the right end of the image area); if it is not 0, step 109 decrements x by one and the process returns to step 106. If x is 0, step 110 increments y by one, and step 111 checks whether y is at the end point (the lower-right point of the image area); if it is not, the pointers are updated and the process returns to step 102. If y is at the end point, step 112 resets the address so that pointer 12 points to pixel D(x, y + 1), the pixel immediately below pixel B in the character area image data area, and the process proceeds to step 113. Steps 113 to 120 are the same as steps 102 to 109, respectively. After step 121 decrements y by one, step 122 checks whether y is 0; if it is not 0, the pointer values are recalculated and the process returns to step 113. If y is 0, the process ends.

Next, an embodiment of the feature point reading means 4 is described. Feature points are read by scanning the pixels of the character area image data in order with the 3 × 3 mask shown in FIG. 4(a) and extracting the pixels that match the conditions of each type shown in FIG. 4(b). The coordinate values are stored, as they are found, in tables P, Q, and R of the feature point coordinate storage means 10 according to feature point type, so the feature points in each table end up arranged in ascending order of y coordinate value.
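The scanning step can be sketched as a generic 3 × 3 mask matcher. The conditions of FIG. 4(b) are not reproduced in this text, so the two masks below are purely hypothetical stand-ins chosen to show the mechanism; they should not be read as the patent's conditions for types P, Q, or R. The function name `find_feature_points`, the use of `None` for "don't care", and treating pixels outside the image as background are likewise assumptions.

```python
def find_feature_points(region, masks):
    """Scan `region` row by row and return {mask name: [(x, y), ...]}."""
    height = len(region)
    width = len(region[0]) if height else 0

    def pixel(x, y):
        # Assumption: anything outside the image counts as background (0).
        if 0 <= x < width and 0 <= y < height:
            return region[y][x]
        return 0

    tables = {name: [] for name in masks}
    for y in range(height):          # raster order keeps each table sorted by y
        for x in range(width):
            for name, mask in masks.items():
                if all(mask[j][i] is None or mask[j][i] == pixel(x + i - 1, y + j - 1)
                       for j in range(3) for i in range(3)):
                    tables[name].append((x, y))
    return tables

# Hypothetical masks (None = don't care); the centre cell is the candidate pixel.
EXAMPLE_MASKS = {
    "upper_left": [[None, 0, None],
                   [0,    1, None],
                   [None, None, None]],
    "lower_left": [[None, None, None],
                   [0,    1, None],
                   [None, 0, None]],
}

block = [[0, 0, 0, 0],
         [0, 1, 1, 0],
         [0, 1, 1, 0],
         [0, 0, 0, 0]]
print(find_feature_points(block, EXAMPLE_MASKS))
```

Each match depends only on a pixel's immediate neighbours, so the scan never has to decide which character a pixel belongs to.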

The character circumscribed frame forming means 6 obtains the character circumscribed frame coordinates from the coordinate values of the character area images and stores them in the character circumscribed frame storage means 11. FIG. 8 is a flowchart showing the procedure for forming character circumscribed frames.

Step 301 checks whether table Q contains any feature point data; if not, the process ends. If it does, step 302 takes the first feature point in table Q as feature point B and reads its coordinate values into registers X1 and Y1. Step 303 searches table P for feature points inside the rectangular area whose diagonal corners are (0, 0) and (X1, Y1). If any are found, the one with the largest X coordinate value is taken as feature point A and its coordinate values are read into registers X2 and Y2 (step 305); if none is found, 0 is loaded into registers X2 and Y2 (step 306).

Step 307 searches table R for feature points within the range enclosed by the rectangle whose diagonal corners are (X1, Y1) and (X2, Y2). If there are none (corresponding to the case of FIG. 6(a)), step 309 writes (X1, Y1) and (X2, Y2) to the character circumscribed frame storage means 11 as the character frame coordinate values, step 310 deletes feature point B from table Q and feature point A from table P, and the process returns to step 301.

If there are such feature points, step 311 takes the one with the largest X coordinate value among those found in table R as feature point D and reads its coordinate values into registers X3 and Y3. Step 312 checks whether X3 and X1 are equal; if they are, the process proceeds to step 315 (corresponding to FIG. 6(c)). If they are not equal, step 313 takes, as feature point C, the feature point in table Q whose x coordinate equals X3 and whose y coordinate value is smallest, and reads its coordinate values into registers X4 and Y4 (corresponding to FIG. 6(b)). Step 314 compares registers X1 and X4 and assigns the larger value to register X4, then compares registers Y1 and Y4 and assigns the larger value to register Y4, and rewrites the entry in table Q so that the values of registers X4 and Y4 become the coordinate values of feature point C. Step 315 deletes feature point D from table R and feature point B from table Q, and the process returns to step 301.

The above demonstrates that the character segmentation device according to the present invention can be realized.
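The steps of FIG. 8 as described above can be followed in a short loop. This is a sketch under stated assumptions rather than a definitive implementation: the name `build_frames` and the (x, y) tuple representation are hypothetical, rectangle tests are assumed to include the border, table Q is re-sorted by y after an entry is rewritten, and a missing partner in step 313 is reported as an error because the text does not say what should happen in that case. The coordinates in the example are made up for illustration.

```python
def build_frames(table_p, table_q, table_r):
    """Return a list of ((X2, Y2), (X1, Y1)) character frames."""
    p = list(table_p)
    q = sorted(table_q, key=lambda pt: pt[1])   # table Q in ascending y order
    r = list(table_r)
    frames = []
    while q:                                    # step 301
        x1, y1 = q[0]                           # step 302: feature point B
        # Step 303: type P points inside the rectangle (0, 0)-(X1, Y1).
        candidates = [pt for pt in p if 0 <= pt[0] <= x1 and 0 <= pt[1] <= y1]
        if candidates:                          # step 305: feature point A
            a = max(candidates, key=lambda pt: pt[0])
            x2, y2 = a
        else:                                   # step 306
            a = None
            x2, y2 = 0, 0
        # Step 307: type R points inside the rectangle (X2, Y2)-(X1, Y1).
        inside = [pt for pt in r if x2 <= pt[0] <= x1 and y2 <= pt[1] <= y1]
        if not inside:
            frames.append(((x2, y2), (x1, y1)))  # step 309
            q.pop(0)                             # step 310
            if a is not None:
                p.remove(a)
            continue
        d = max(inside, key=lambda pt: pt[0])    # step 311: feature point D
        if d[0] != x1:                           # step 312
            partners = [pt for pt in q if pt[0] == d[0]]
            if not partners:
                raise ValueError("no type Q point shares x with the type R point")
            c = min(partners, key=lambda pt: pt[1])          # step 313
            q[q.index(c)] = (max(x1, c[0]), max(y1, c[1]))   # step 314
        r.remove(d)                              # step 315
        q.pop(0)
        q.sort(key=lambda pt: pt[1])             # assumption: keep Q ordered by y
    return frames

# One region whose lower edge has a step (hypothetical coordinates).
print(build_frames([(1, 1)], [(8, 4), (5, 7)], [(5, 3)]))
```

The loop only compares stored coordinates, so its cost depends on the number of feature points rather than on the number of pixels in the image.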

[Effects of the Invention] As described above, according to the invention of claim 1, a right-angled polygonal character area image that reflects the connectivity of the character pixels can be generated, and an appropriate character frame can be obtained from this character area image.

Furthermore, according to the invention of claim 2, an appropriate character frame can be obtained from the combination of feature points.

[Brief description of the drawings]

FIG. 1 is a block diagram showing the configuration of the character segmentation device according to the present invention.
FIG. 2 is a control system diagram.
FIGS. 3(a) to 3(c) are diagrams showing examples of the input data (a), the data partway through processing (b), and the output data (c) of the character area image data creating means 2.
FIGS. 4(a) and 4(b) are diagrams showing the 3 × 3 mask used by the feature point reading means 4 to search for feature points, and the conditions on the surrounding pixels.
FIG. 5 is a diagram showing an example of the feature points in the character area image data shown in FIG. 3.
FIGS. 6(a) to 6(c) are diagrams showing the three basic patterns to which character area image patterns reduce, together with their feature points.
FIGS. 7(a) and 7(b) are flowcharts showing an embodiment of the character area image data creating means 2.
FIG. 8 is a flowchart showing an embodiment of the character circumscribed frame forming means 6.

Continuation of the front page
(72) Inventor: Yuji Nakajima, c/o Seiko Epson Corporation, 3-3-5 Owa, Suwa-shi, Nagano
(56) References cited: JP-A-63-37489 (JP, A)
(58) Field searched (Int. Cl.6, DB name): G06K 9/34

Claims

(57) [Claims]

1. A character segmentation device comprising: character image storage means for storing an image of a read character string as character image data consisting of character pixels and background pixels; character area image data creating means for forming, from the character image data, a right-angled polygon that contains a set of mutually connected character pixels, and for creating character area image data representing a character area image of the right-angled polygon; character area image data storage means for storing the character area image data; feature point reading means for extracting feature points at the vertices of the character area image and reading their feature point coordinate values; feature point storage means for storing the feature point coordinate values; character circumscribed frame forming means for obtaining a character circumscribed frame from the feature point coordinate values; and character circumscribed frame storage means for storing the coordinate values of the character circumscribed frame, wherein the character area image data creating means includes means for generating the character area image so that the coordinate values (X, Y) in the character area image and the pixel values G(X, Y) in the character area image satisfy the following expressions (a) and (b):

$$Y_{\min}(X) = Y_s \quad \text{for any } X \text{ with } X_s \le X \le X_e \qquad \text{(a)}$$

$$G(X, k) \ne G_0 \quad \text{for any } X \text{ with } X_s \le X \le X_e \text{ and any } k \text{ with } Y_{\min}(X) \le k \le Y_{\max}(X) \qquad \text{(b)}$$

where $X_s$ and $X_e$ bound the domain of the character area image in the X coordinate, $Y_s$ and $Y_e$ bound its domain in the Y coordinate, $Y_{\min}(X)$ and $Y_{\max}(X)$ are the minimum and maximum Y coordinate values of the character pixel group at coordinate X, and $G_0$ is the pixel value of a background pixel.

2. The character segmentation device according to claim 1, wherein the feature point reading means has means for extracting a plurality of types of feature points by applying mask processing to the character area image data, and the character circumscribed frame forming means has means for fitting combinations of the feature points to a plurality of basic patterns, thereby finding a combination of feature points that matches one of the plurality of basic patterns, and for obtaining the coordinates of the character circumscribed frame from the combination of feature points found.