JPH0581474A

JPH0581474A - Character string extracting method and character area detecting method

Info

Publication number: JPH0581474A
Application number: JP3241486A
Authority: JP
Inventors: Hisafumi Saika (斎鹿尚史); Yoshihiro Kitamura (北村義弘); Yasuhisa Nakamura (中村安久); Minako Kuwata (桑田みな子); Kazuhiro Takehara (竹原和宏)
Original assignee: Sharp Corp
Current assignee: Sharp Corp
Priority date: 1991-09-20
Filing date: 1991-09-20
Publication date: 1993-04-02
Anticipated expiration: 2015-07-04
Also published as: JP3058489B2

Abstract

PURPOSE: To perform stable character string extraction with a small storage capacity. CONSTITUTION: A document is scanned in S1, and the character elements of the document image are extracted in S2. In S3 to S5, every pair of character elements is checked for whether the relation of 'mutual closeness' holds, which defines the equivalence relation 'same level', and the character elements are classified into equivalence classes. In S5, the character elements contained in each equivalence class are extracted as the elements constituting one character string. In S6 and the following steps, each extracted character string is divided into characters, which are matched against standard character patterns and recognized.

Description

Detailed Description of the Invention

[0001]

[Field of the Invention] The present invention relates to a method for recognizing characters in an image. More particularly, it relates to a method for extracting individual character areas and character strings from an input image.

[0002]

[Description of the Related Art] A conventional character extraction method extracts characters from an image input through a means such as a scanner (hereinafter the "original image") as follows.

First, black dots are accumulated along one of the two axes of the original image, horizontal or vertical. The axis chosen is the one closer to the direction in which the actual character strings run (hereinafter the "character string direction"). A black dot is a point that forms part of a character area.

Portions with small cumulative values are treated as blank space between character strings. Portions with large cumulative values are treated as areas where character strings exist. The positions of the character strings are determined in this way. Characters are then extracted by dividing each character string obtained.
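The conventional projection approach can be sketched as follows. This is a minimal illustration, not code from the patent. It assumes a binary image stored as a list of rows of 0/1 values and horizontal text, so the row direction is the character string direction. The function names and the `threshold` parameter are hypothetical.

```python
def row_profile(image):
    """Accumulate black dots (1s) along each row of a binary image."""
    return [sum(row) for row in image]


def find_text_lines(image, threshold=0):
    """Return (first_row, last_row) spans whose cumulative value exceeds
    `threshold`; rows at or below it are treated as inter-line blanks."""
    profile = row_profile(image)
    lines, start = [], None
    for y, count in enumerate(profile):
        if count > threshold and start is None:
            start = y                      # a character string begins
        elif count <= threshold and start is not None:
            lines.append((start, y - 1))   # a blank row ends the string
            start = None
    if start is not None:                  # string touches the bottom edge
        lines.append((start, len(profile) - 1))
    return lines


image = [[0, 0, 0], [1, 1, 0], [0, 1, 0], [0, 0, 0], [1, 0, 1]]
print(find_text_lines(image))  # [(1, 2), (4, 4)]
```

Each detected span would then be divided further to obtain individual characters.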

[0003] However, this method breaks down when the character strings in the original image are strongly inclined. The contrast between large and small cumulative black-dot values then becomes weak in both the horizontal and the vertical direction. It is therefore difficult to extract character strings accurately, and equally difficult to extract individual characters accurately.
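A small synthetic experiment shows why inclination hurts the projection approach. The image is built by the hypothetical helper `draw_bars`, which draws 1-pixel "text lines" as an assumed stand-in for real text. With straight lines, the profile contains blank rows between the strings. With lines descending one row every four columns, every row between the strings is non-empty, so no blank gap separates them.

```python
def row_profile(image):
    """Black-dot count per row (projection onto the vertical axis)."""
    return [sum(row) for row in image]


def draw_bars(width, height, start_rows, slope):
    """Hypothetical test image: 1-pixel-thick 'text lines' that begin at
    the given rows and descend by `slope` rows per column."""
    image = [[0] * width for _ in range(height)]
    for r in start_rows:
        for x in range(width):
            y = r + int(slope * x)
            if 0 <= y < height:
                image[y][x] = 1
    return image


straight = row_profile(draw_bars(20, 12, [2, 6], 0.0))
skewed = row_profile(draw_bars(20, 12, [2, 6], 0.25))
print(straight)  # [0, 0, 20, 0, 0, 0, 20, 0, 0, 0, 0, 0]
print(skewed)    # [0, 0, 4, 4, 4, 4, 8, 4, 4, 4, 4, 0]
```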

[0004] An alternative is to work from the parts of characters rather than from whole lines. Element regions that appear to make up individual characters (hereinafter "character elements") are extracted first. These are then integrated to obtain characters or character strings. In this approach, the character elements must be extracted in advance, as a step that precedes character extraction.

[0005] One known choice of character element is a connected region of black dots. A connected region is a region made up of a set of black dots in the image. Any two points in it can be joined by a curve that passes only through the region.

Conventionally, these connected regions were first extracted from the image. The coordinates of the rectangle circumscribing each region were then computed, and this detected the character elements.

The conventional method must hold, throughout processing, information on which connected region each black dot of the image belongs to. It therefore requires a large-capacity memory. Dedicated hardware for this processing exists, but it makes the overall system expensive.
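For comparison, the conventional approach can be sketched as a breadth-first labeling pass. This is an illustrative sketch, not the patent's code. It assumes 4-connectivity and a binary image given as a list of rows. The `labels` array, with one entry per pixel, is the whole-image bookkeeping that the paragraph above identifies as the memory cost.

```python
from collections import deque


def label_bounding_boxes(image):
    """Conventional labeling: mark every pixel with a region label, then
    report each region's circumscribing rectangle (sx, sy, ex, ey)."""
    n = len(image)
    m = len(image[0]) if n else 0
    labels = [[0] * m for _ in range(n)]   # n * m entries, whole image
    boxes = []
    for y in range(n):
        for x in range(m):
            if image[y][x] and not labels[y][x]:
                label = len(boxes) + 1
                labels[y][x] = label
                sx, sy, ex, ey = x, y, x, y
                queue = deque([(x, y)])
                while queue:
                    cx, cy = queue.popleft()
                    sx, sy = min(sx, cx), min(sy, cy)
                    ex, ey = max(ex, cx), max(ey, cy)
                    # 4-connectivity: right, left, down, up neighbours.
                    for nx, ny in ((cx + 1, cy), (cx - 1, cy),
                                   (cx, cy + 1), (cx, cy - 1)):
                        if (0 <= nx < m and 0 <= ny < n
                                and image[ny][nx] and not labels[ny][nx]):
                            labels[ny][nx] = label
                            queue.append((nx, ny))
                boxes.append((sx, sy, ex, ey))
    return boxes


print(label_bounding_boxes([[1, 1, 0, 0], [0, 1, 0, 1], [0, 0, 0, 1]]))
# [(0, 0, 1, 1), (3, 1, 3, 2)]
```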

[0006] This approach has a further difficulty. If the character elements are integrated too little, a single character or character string is split apart. If they are integrated too much, different characters or character strings may be merged into one by mistake. Finding the optimum degree of integration is therefore not easy.

[0007]

[Problems to be Solved by the Invention] It is therefore an object of the present invention to provide a character string extracting method and a character area detecting method that minimize the influence of the inclination of the original image and produce stable processing results.

[0008]

[Means for Solving the Problems] The character string extracting method according to claim 1 includes the following steps:

(a) reading an image and converting it into a plurality of consecutive binary scanning line signals;

(b) sequentially analyzing the binary scanning line signals as they are supplied, and obtaining and holding circumscribing frame information, which specifies the shape of the frame circumscribing each connected region represented by the binary scanning line signals up to a preceding first binary scanning line signal;

(c) detecting the connected line segment regions contained in a second binary scanning line signal that immediately follows the first one, updating the circumscribing frame information of each connected region connected to a connected line segment region, and detecting any connected region connected to none of the connected line segment regions and outputting its circumscribing frame information;

(d) detecting the end of the image and outputting all circumscribing frame information still held;

(e) classifying the connected regions inscribed in the circumscribing frames into equivalence classes, by checking whether a predetermined equivalence relation holds between the pieces of circumscribing frame information; and

(f) extracting the connected regions that belong to the same equivalence class as the constituent elements of one character string.

[0009] The character string extracting method according to claim 2 includes the following steps:

(a) extracting character elements from an image;

(b) classifying the character elements into equivalence classes by checking whether a predetermined equivalence relation holds between the extracted character elements; and

(c) extracting the character elements that belong to the same equivalence class as the constituent elements of one character string.

[0010] The character area detecting method according to claim 3 includes the following steps:

(a) reading an image and converting it into a plurality of consecutive binary scanning line signals;

(b) sequentially analyzing the binary scanning line signals as they are supplied, and obtaining and holding circumscribing frame information, which specifies the shape of the frame circumscribing each connected region represented by the binary scanning line signals up to a preceding first binary scanning line signal;

(c) detecting the connected line segment regions contained in a second binary scanning line signal that immediately follows the first one, updating the circumscribing frame information of each connected region connected to a connected line segment region, and detecting any connected region connected to none of the connected line segment regions and outputting its circumscribing frame information; and

(d) detecting the end of the image and outputting all circumscribing frame information still held.

[0011]

[Operation] In the character string extracting method according to claim 1, obtaining the circumscribing frame information of a connected region requires only one check. It suffices to examine the connection relationships of connected line segment regions between two adjacent binary scanning line signals. There is no need to hold, for the whole image, the connected region to which each pixel belongs.

The circumscribing frame information obtained in this way is then classified by the technique known mathematically as equivalence class decomposition. This extracts the constituent elements of each character string. Whether the predetermined equivalence relation holds can be determined without much influence from the inclination of the original image. In addition, because the equivalence relation is examined between the character elements themselves, the result does not depend on the processing order.

[0012] In the character string extracting method according to claim 2, the method checks whether the equivalence relation holds between the extracted character elements. The character elements are then extracted as the constituent elements of each character string by the technique known as equivalence class decomposition. Whether the equivalence relation holds can be checked regardless of the inclination of the original image. Moreover, the result of equivalence class decomposition does not depend on the processing order.

[0013] The character area detecting method according to claim 3 must determine connections between two kinds of region. The first is a connected region represented by the scanning line signals up to the preceding first binary scanning line signal. The second is a connected line segment region of the following binary scanning line signal. This determination can be made by examining the connection relationships of connected line segment regions between the first and second binary scanning line signals. Therefore, there is no need to store, for every pixel in the image, information indicating which connected region it belongs to.

[0014]

[Embodiment] FIG. 13 is a block diagram of a character recognition device for carrying out the present invention. Referring to FIG. 13, the device includes the following parts:

- an image input unit 1, which scans an image and converts it into a plurality of consecutive digital scanning line signals;
- a character element extraction unit 2, which extracts character elements from the image input by the image input unit 1;
- a character string extraction unit 3, which analyzes the extracted character elements and extracts character strings;
- an individual character extraction unit 4, which separates each extracted character string into individual characters;
- character standard patterns 8, prepared in advance;
- a character matching unit 5, which recognizes the individual characters in the input image by matching the extracted characters against the character standard patterns 8;
- a recognition result post-processing unit 6, which applies any necessary correction and formatting to the recognition results obtained by character matching; and
- a final result output unit 7, which converts the post-processed recognition results into a form the user can read and outputs them.

[0015] This character recognition device is implemented by a computer. The image input unit 1 is implemented by a scanner, and the final result output unit 7 by a display. The character element extraction unit 2, character string extraction unit 3, individual character extraction unit 4, character matching unit 5, character standard patterns 8, and recognition result post-processing unit 6 are all implemented by programs executed by a central processing unit.

[0016] Referring to FIG. 14, the character element extraction unit 2 includes the following parts:

- an image memory 21, which stores at least one line of the image input from the image input unit 1;
- a current-line connected region extraction unit 22, which detects the connected regions on the line currently being processed (as stored in the image memory 21) and outputs information about them;
- a previous-line connected region memory 24, which stores the information extracted by unit 22 and holds it temporarily for processing the next line;
- a connection relation checking unit 23, which checks the connection relationships between connected regions on the previous and current lines and integrates the rectangles circumscribing connected regions that are connected to each other; it works from the current-line information extracted by unit 22 and the preceding-line information stored in memory 24; and
- a circumscribed rectangle coordinate memory 25, which stores the coordinates specifying the rectangles circumscribing connected regions in the image, as detected by unit 23. During processing, it also outputs, as a character element, the information for any connected region judged unable to connect to any further region.

[0017] The character string extraction unit 3 includes the following parts:

- a character element memory 31, which sequentially stores the character element information supplied from the circumscribed rectangle coordinate memory 25;
- a label memory 35, in which an integer array can be prepared with as many elements as there are character elements stored in memory 31;
- a label initialization unit 33, which initializes the values of the integer array {Sn} in the label memory 35 according to the number of character elements stored in memory 31;
- an equivalence relation checking unit 32, which takes each pair of character elements stored in memory 31 and checks whether a predetermined equivalence relation holds between them;
- a label updating unit 36, which updates the corresponding label values in the label memory 35, according to a predetermined method, for character elements that unit 32 finds to satisfy the equivalence relation; and
- an equivalent-label character element extraction unit 34. After the equivalence relation has been checked for all pairs of character elements in memory 31, this unit judges character elements whose labels in memory 35 have the same value to belong to the same character string, and extracts them from memory 31.

[0018] In principle, the device shown in FIGS. 13 to 15 operates as follows. The image input unit 1 scans the input image, converts it into an image signal consisting of a plurality of scanning lines, and supplies the signal sequentially to the image memory 21. The image memory 21 sequentially stores the signal for at least one scanning line of the input image signal.

From the one line of image data stored in the image memory 21, the current-line connected region extraction unit 22 extracts the connected regions on the line being processed. It supplies information representing these regions to the connection relation checking unit 23 and to the previous-line connected region memory 24. The previous-line connected region memory 24 holds information representing the connected regions of the preceding line.

[0019] The connection relation checking unit 23 checks whether any connected region on the current line is connected to a connected region on the previous line. It bases this check on two inputs: the current-line information from the current-line connected region extraction unit 22, and the previous-line information stored in the previous-line connected region memory 24.

If a connection is found, the coordinates of the rectangle circumscribing the previous line's connected region, stored in the circumscribed rectangle coordinate memory 25, are processed in a predetermined manner. This integrates the current line's connected region with the connected region extending up to the previous line. The integration processing is described in detail later.

During processing, memory 25 may hold a connected region, extending up to the previous line, that is connected to no connected region on the current line. Memory 25 treats such a region as one character element and supplies its information to the character element memory 31. When the entire image has been read and processed, memory 25 outputs the circumscribed rectangle coordinates still stored at that point as character element information.
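The line-by-line flow described in [0016] to [0019] can be sketched in Python. This is a minimal sketch under stated assumptions, not the patent's program:

- each scan line is a list of 0/1 values;
- rectangles are (sx, sy, ex, ey) tuples;
- merged rectangles are tracked with a small union-find map (`parent`), which is a substitution for the bookkeeping of coordinate memory 25.

The sketch keeps only the previous line's runs, the open rectangles, and those merge links. It never holds a per-pixel label.

```python
def runs(row):
    """Connected line-segment regions: maximal runs of 1s as (start, end)."""
    found, start = [], None
    for x, v in enumerate(row):
        if v and start is None:
            start = x
        elif not v and start is not None:
            found.append((start, x - 1))
            start = None
    if start is not None:
        found.append((start, len(row) - 1))
    return found


def extract_boxes(rows, eight_connected=False):
    """Yield (sx, sy, ex, ey) for each connected region of a binary image
    supplied one scan line at a time (rows may be any iterable)."""
    slack = 1 if eight_connected else 0
    rects = {}    # open rectangles: id -> [sx, sy, ex, ey]
    parent = {}   # merge links: id -> id it was merged into (self if root)
    next_id = 0
    prev = []     # runs of line i-1 as (start, end, rect_id)

    def find(r):
        while parent[r] != r:
            r = parent[r]
        return r

    for i, row in enumerate(rows):
        cur = []
        for s, e in runs(row):
            # Rectangles of runs on line i-1 that touch this run on line i.
            roots = {find(r) for ps, pe, r in prev
                     if ps <= e + slack and s <= pe + slack}
            if not roots:                       # a new connected region
                rid, next_id = next_id, next_id + 1
                parent[rid] = rid
                rects[rid] = [s, i, e, i]
            else:                               # integrate rectangles
                rid = min(roots)
                box = rects[rid]
                for r in roots - {rid}:
                    other = rects.pop(r)
                    parent[r] = rid
                    box[0], box[1] = min(box[0], other[0]), min(box[1], other[1])
                    box[2], box[3] = max(box[2], other[2]), max(box[3], other[3])
                box[0], box[2], box[3] = min(box[0], s), max(box[2], e), i
            cur.append((s, e, rid))
        # Rectangles of line i-1 that no run on line i extended are final.
        live = {find(r) for _, _, r in cur}
        for r in {find(r) for _, _, r in prev} - live:
            yield tuple(rects.pop(r))
        prev = cur
    # End of image: output every rectangle still held.
    for r in {find(r) for _, _, r in prev}:
        yield tuple(rects.pop(r))
```

A rectangle is yielded as soon as no run on the next line extends it, which mirrors the early output from memory 25. The final loop corresponds to outputting the remaining rectangles at the end of the image. Yield order within one line follows set iteration, so callers that need a fixed order should sort the results.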

[0020] The character element memory 31 sequentially stores the character element information supplied from the circumscribed rectangle coordinate memory 25. Before character string extraction begins, the label initialization unit 33 initializes the values of the integer array prepared in the label memory 35, by a method described later.

For every pair of character elements stored in memory 31, the equivalence relation checking unit 32 checks whether the predetermined equivalence relation holds between them. When it holds, the label updating unit 36 rewrites, by a predetermined method, the label values in the label memory 35 that correspond to those equivalent character elements. This method is described in detail later.

[0021] This step begins once the equivalence relation has been checked for every pair of character elements stored in the character element memory 31. The equivalent-label character element extraction unit 34 then judges character elements whose labels in the label memory 35 have the same value to belong to the same character string. It extracts the corresponding character elements from the character element memory 31 and supplies them to the individual character extraction unit 4.
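The pairwise classification of [0020] and [0021] can be sketched with a union-find label array. The patent's own initialization and label-update rules are described later in the document; union-find is substituted for them here. The closeness test `is_close` and its `gap` parameter are hypothetical stand-ins for the predetermined equivalence relation. Every pair is examined and classes are merged transitively, so the result is the partition given by the transitive closure of the closeness test.

```python
from itertools import combinations


def is_close(a, b, gap=2):
    """Hypothetical 'mutual closeness' test for boxes (sx, sy, ex, ey):
    true when the boxes lie within `gap` pixels in both x and y."""
    dx = max(0, b[0] - a[2], a[0] - b[2])
    dy = max(0, b[1] - a[3], a[1] - b[3])
    return dx <= gap and dy <= gap


def group_into_strings(boxes, close=is_close):
    """Partition boxes into equivalence classes of the transitive closure
    of `close`, using one integer label per element."""
    labels = list(range(len(boxes)))     # plays the role of the array {Sn}

    def find(k):
        while labels[k] != k:
            labels[k] = labels[labels[k]]  # path halving
            k = labels[k]
        return k

    for p, q in combinations(range(len(boxes)), 2):  # every pair once
        if close(boxes[p], boxes[q]):
            rp, rq = find(p), find(q)
            if rp != rq:
                labels[max(rp, rq)] = min(rp, rq)

    classes = {}
    for k, box in enumerate(boxes):
        classes.setdefault(find(k), []).append(box)
    return list(classes.values())


boxes = [(0, 0, 3, 5), (5, 0, 8, 5), (10, 0, 13, 5), (0, 20, 3, 25), (5, 20, 8, 25)]
print(group_into_strings(boxes))
# [[(0, 0, 3, 5), (5, 0, 8, 5), (10, 0, 13, 5)], [(0, 20, 3, 25), (5, 20, 8, 25)]]
```

In the sample, the first and third boxes are not close to each other, yet they fall into the same class through the middle box. Reversing the input order yields the same partition, which is consistent with the order independence noted in [0011].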

[0022] The individual character extraction unit 4 divides the given character string area as appropriate, takes out the individual characters, and supplies them to the character matching unit 5. The character matching unit 5 uses matching to check the similarity between each standard pattern in the character standard patterns 8 and the character being processed. It recognizes the standard pattern with the highest similarity as the input character.

The recognition result post-processing unit 6 applies any necessary correction and formatting to the recognition result and supplies it to the final result output unit 7. The final result output unit 7 outputs the recognized characters in a form the user can read.

[0023] The character string extracting method according to the present invention is performed by the character string extraction unit 3. The character area detecting method is performed by the character element extraction unit 2.

[0024] As noted above, the character element extraction unit 2 and the character string extraction unit 3 are each implemented by a computer program. FIG. 1 is a flowchart of the program that implements the character string extracting method. FIG. 2 is a flowchart of the part of the program in FIG. 1 that extracts the character elements in an image.

[0025] Referring to FIG. 1, the original document is scanned in step S1. Control proceeds to step S2.

[0026] In step S2, the character elements in the input original image are extracted. This processing is implemented by the program shown in FIG. 2.

[0027] Referring to FIG. 2, the program that extracts character elements from an image has the following control structure. In the description below, the input image is a rectangle m pixels wide and n pixels high. The x axis runs horizontally and the y axis vertically.

A rectangle in this image whose sides are parallel to the x and y axes is represented by a pair of points. The first is (minimum x coordinate, minimum y coordinate), written (sx, sy). The second is (maximum x coordinate, maximum y coordinate), written (ex, ey). Coordinates are measured so that each side of one pixel, in both the x and y directions, has length 1.

[0028] In the image described above, consider the straight line, one pixel wide and parallel to the x axis, that connects the points (0, i) and (m − 1, i). This line is called line i.

[0029] A connected region on line i is defined as a run of adjacent black dots joined together on line i. The claims call such a region a "connected line segment region". In the rest of this description, a connected region in the image means, as stated above, a region made up of a set of black dots in the image. Any two black dots in it can be joined to each other by following other black dots contained in the region.

[0030] (1) Referring to FIG. 2, step S11 extracts each connected region on line 0 of the image, denoted $\mathrm{CL}_{0j}$. The rectangle that circumscribes $\mathrm{CL}_{0j}$, with one pair of sides parallel to the x axis and the other pair parallel to the y axis, is taken as the rectangle to which $\mathrm{CL}_{0j}$ belongs. Its coordinates are stored in a predetermined storage area. That is,
\[
\begin{aligned}
\mathrm{sx} &= \text{x coordinate of the start point of } \mathrm{CL}_{0j}, \\
\mathrm{ex} &= \text{x coordinate of the end point of } \mathrm{CL}_{0j}, \\
\mathrm{sy} &= \text{y coordinate of the start point of } \mathrm{CL}_{0j} \;(= 0), \\
\mathrm{ey} &= \text{y coordinate of the end point of } \mathrm{CL}_{0j} \;(= 0).
\end{aligned}
\]
Control proceeds to step S12.

[0031] (2) In step S12, the variable i that controls the following iteration is set to the initial value 1. Control proceeds to step S13.

[0032] Steps S13 to S17 are then performed in turn for each i satisfying 1 ≤ i ≤ n − 1. Lines are indexed by their y coordinate, and the image is n pixels high.

[0033] (3) Step S13 extracts each connected region on line i, denoted $\mathrm{CL}_{ij}$. Each $\mathrm{CL}_{ij}$ is itself taken as the rectangle to which it belongs. That is, the coordinates specifying this rectangle are as follows.

[0034]
\[
\begin{aligned}
\mathrm{sx} &= \text{x coordinate of the start point of } \mathrm{CL}_{ij}, \\
\mathrm{ex} &= \text{x coordinate of the end point of } \mathrm{CL}_{ij}, \\
\mathrm{sy} &= \text{y coordinate of the start point of } \mathrm{CL}_{ij} \;(= i), \\
\mathrm{ey} &= \text{y coordinate of the end point of } \mathrm{CL}_{ij} \;(= i).
\end{aligned}
\]
Control proceeds to step S14.
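Steps S11 and S13 come down to two operations on one scan line. First, find the runs of black dots. Second, turn each run into a degenerate rectangle one pixel high. The sketch below assumes a scan line given as a list of 0/1 values; `line_runs` and `line_rects` are hypothetical names.

```python
def line_runs(row):
    """Connected regions on one line: maximal runs of 1s as (start, end)."""
    runs, start = [], None
    for x, v in enumerate(row):
        if v and start is None:
            start = x                   # run starts at this black dot
        elif not v and start is not None:
            runs.append((start, x - 1)) # white dot closes the run
            start = None
    if start is not None:               # run reaches the right edge
        runs.append((start, len(row) - 1))
    return runs


def line_rects(row, i):
    """Steps S11/S13: each run on line i becomes the rectangle
    (sx, sy, ex, ey) = (run start, i, run end, i)."""
    return [(s, i, e, i) for s, e in line_runs(row)]


print(line_rects([0, 1, 1, 0, 1], 3))  # [(1, 3, 2, 3), (4, 3, 4, 3)]
```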

[0035] (4) Step S14 checks the connection relationship between the connected regions on line (i − 1) and those on line i. Whether two connected regions are connected is determined as follows. Let the start and end points of a connected region $\mathrm{CL}_0$ on line (i − 1) be $(x_{00}, i-1)$ and $(x_{01}, i-1)$. Let those of a connected region $\mathrm{CL}_1$ on line i be $(x_{10}, i)$ and $(x_{11}, i)$. $\mathrm{CL}_0$ and $\mathrm{CL}_1$ may then be judged to be connected when the following relation holds.

[0036]
\[
x_{00} \le x_{11} \quad\text{and}\quad x_{10} \le x_{01}
\]
This condition, however, applies only to so-called 4-connectivity. Under 4-connectivity, the only pixels that can be connected to a given pixel are those directly above, below, left, and right of it. Under so-called 8-connectivity, diagonally adjacent pixels can also be connected to a pixel, and the following condition must be used instead.

[0037] x00 ≤ x11 + 1 and x10 ≤ x01 + 1

In short, the condition is that the x coordinate of the start point of the connected region on either line must not exceed the x coordinate of the end point of the connected region on the other line. For 8-connectivity, a margin of one pixel is allowed. If this condition holds, the two connected regions are connected to each other; otherwise they are not.

For each pair of connected regions on lines (i-1) and i found to be connected, the rectangles to which they belong are integrated. Integrating several rectangles means determining the coordinates of a new rectangle as follows.

[0038]
sx = min{sx of the rectangles being integrated}
sy = min{sy of the rectangles being integrated}
ex = max{ex of the rectangles being integrated}
ey = max{ey of the rectangles being integrated}

The rectangle R specified by the coordinates (sx, sy) and (ex, ey) determined in this way is the integrated rectangle. The connected regions involved are taken to belong to this common rectangle R. Control proceeds to step S15.

[0039] (5) In step S15, the method detects rectangles that meet two conditions:

- they contain connected regions on line (i-1);
- they did not change during step S14.

If such a rectangle exists, the connected region of the image that it circumscribes does not continue onto line i. No connected region on a later line can be connected to it in subsequent processing, so the coordinates specifying this rectangle will not change. They can therefore be output as a final result. On output, the value i-1 may be assigned to the y-direction boundary coordinate ey of this connected region.

If the line i currently being processed contains no black dots at all, every rectangle containing connected regions on line (i-1) satisfies this condition, and all of their coordinates can be output. From then on, the coordinates of these rectangles need not be stored at all for the rest of the bounding-rectangle computation. Control proceeds to step S16.

[0040] In step S16, the variable i is incremented by 1. Control proceeds to step S17.

[0041] In step S17, it is determined whether the value of i equals m, the number of pixels in the vertical direction (the total number of lines). If it does, control proceeds to step S18. Otherwise, control returns to step S13, and processes (3) to (5) are repeated for the new value of i.

[0042] (6) In step S18, as the final step, the coordinates of the rectangles to which black dots on line (m-1) belong are output. For these rectangles, the coordinate ey is m-1.

[0043] As described above, a program that operates according to the flow shown in FIG. 2 obtains the coordinates of the rectangle circumscribing each connected region in the image. This processing needs image information about only two lines: the line currently of interest and the line immediately before it. No information about points on lines two or more lines back is required. This method therefore needs far less storage to detect character areas than the conventional approach of recording information about the entire image.

Furthermore, the circumscribed-rectangle coordinates obtained as final results are output one after another during processing. Other processing that takes these results as input can therefore run in parallel with this processing, which can improve the overall processing speed.

[0044] FIGS. 4 to 6 are schematic views of an image showing the progress of the processing shown in FIG. 2. Referring to FIG. 4, connected regions (connected line-segment regions) A and B lie on line (i-1), and connected regions C, D, and E lie on line i. By the definition of connectivity, regions A and C are connected to each other, and regions B, D, and E are also connected to each other.

[0045] FIGS. 5 and 6 show how rectangles containing separate connected regions on line (i-1) are integrated during the processing of line i. Referring to FIG. 5, connected regions A and B on line (i-1) belong to a common rectangle R1. Connected region C on line (i-1) belongs to a rectangle R2, which differs from R1.

[0046] Referring to FIG. 6, assume that a connected (line-segment) region D exists on line i. Region D is connected to regions B and C on line (i-1). At this point, the rectangle to which D belongs has the same shape as D itself.

Following the processing described above, the rectangles R1 and R2 (to which B and C belong) are integrated with the rectangle to which D belongs (D itself). This yields the rectangle R0 shown in FIG. 6. Since A and B on line (i-1) share the rectangle R1, region A also belongs to R0.

[0047] In this way, the method examines, line by line, how the connected regions on each line connect to those on the preceding line, and successively integrates the rectangles to which they belong. The regions of the image containing character elements are thereby separated and combined, with each connected region grouped according to the rectangle to which it belongs.

[0048] FIG. 7 shows the result of separating and combining a character image in this way into connected regions, which serve as character elements. As FIG. 7 shows, the character image "システ" (the first characters of "システム", "system") is divided into its individual character elements.

[0049] Referring again to FIG. 1, in steps S3 to S5, character strings are extracted according to the character string extracting method of the present invention. In this processing, the character elements extracted in step S2 are classified into a number of equivalence classes, using what mathematics calls equivalence class decomposition.

For this decomposition, the present embodiment defines the relationship "close to each other" between two character elements as follows. Note that this relationship is not necessarily an equivalence relation by itself.

[0050] Assume that rectangles a and b circumscribe two character elements, and define the following quantities:

- L is the distance in the character string direction between the center points of a and b;
- s_a and s_b are the lengths of a and b in the character string direction;
- H is the distance between the center points of a and b in the direction perpendicular to the character string direction;
- h_a and h_b are the lengths of a and b in that perpendicular direction.

The two character elements are defined to be "close to each other" when both of the following expressions hold.

[0051]
$$L \le \max\{s_a, s_b\} \times c_0, \qquad H \le \max\{h_a, h_b\} \times c_1$$

Here c_0 and c_1 are constants. In this embodiment, c_0 ≈ 4 and c_1 ≈ 0.5 were used; these values are only an example.

[0052] In step S3, an integer array {Sn} is prepared with as many elements as there are character elements. The elements of {Sn} are initialized to mutually different values, so that S_i ≠ S_j whenever i ≠ j. For example, the i-th element Si may be set to i (i = 0, 1, 2, ..., n). Control proceeds to step S4.

[0053] In step S4, the processing shown in the flowchart of FIG. 3 is performed.

In step S4A, a pair of two different character elements is taken from the character elements extracted from the image in step S2, with each pair taken exactly once. Denoting the i-th character element by Ci, a pair is written (Ci, Cj), where i, j = 0, 1, 2, ..., n and i ≠ j.

In step S4B, it is determined whether the pair (Ci, Cj) satisfies the "close to each other" relationship defined above. If it does, control proceeds to step S4C; otherwise it proceeds to step S4D.

In step S4C, the values of the elements Si and Sj of the integer array, which correspond to Ci and Cj, are rewritten as follows.

[0054] (a) If Si < Sj, every element of {Sn} whose value equals Sj is rewritten to the value of Si. This includes Sj itself.

[0055] (b) If Si > Sj, every element of {Sn} whose value equals Si is rewritten to the value of Sj. This includes Si itself.

[0056] (c) If Si = Sj, nothing is done.

Next, in step S4D, it is determined whether all pairs (Ci, Cj) of character elements have been processed. If not, control returns to step S4A. Otherwise, this part of the processing ends, and control proceeds to step S5 in FIG. 1.

[0057] In step S4, processing according to the cases above is performed for every pair of character elements. Once all pairs have been processed, character elements whose corresponding elements of {Sn} share the same value may be judged to belong to the same character string. Therefore, in step S5, character strings can be extracted by collecting the character elements whose array elements have the same value.

[0058] Some further explanation of the above relationship follows. Write Ci ~ Cj when two character elements Ci and Cj are "close to each other." Suppose that, for some integers n1, n2, ..., nk and m1, m2, the following relationship holds.

[0059] Cm1 ~ Cn1 ~ Cn2 ~ ... ~ Cnk ~ Cm2

This means the following. Character elements Cm1 and Cm2 are not necessarily "close to each other" directly. However, one can be reached from the other by successively following character elements, each of which is "close to" the previous one. The case Cm1 ~ Cm2 is of course included. Cm1 and Cm2 in such a relationship are said to "have the same label."

[0060] By definition, any two of the character elements Cn1, Cn2, ..., Cnk appearing on the path from Cm1 to Cm2 then also have the same label.

[0061] In the description above, the relationship "Cm1 and Cm2 have the same label" is represented as follows: when the processing of step S4 has finished, the elements Sm1 and Sm2 of {Sn}, which correspond to Cm1 and Cm2, have the same value.

[0062] Suppose further that the following two relationships hold for sets of integers i1, i2, ..., im, j1, j2, ..., jn and i, j.

[0063]
Ci1 ~ Ci2 ~ ... ~ Cim ~ Ci
Cj ~ Cj1 ~ Cj2 ~ ... ~ Cjn

This corresponds to a state partway through step S4 in which two things hold:

- the elements of {Sn} corresponding to Ci1, Ci2, ..., Cim all equal the element Si corresponding to Ci;
- the elements corresponding to Cj1, Cj2, ..., Cjn all equal the element Sj corresponding to Cj.

[0064] If it is now found that Ci ~ Cj, the following relationship is obtained:

Ci1 ~ Ci2 ~ ... ~ Cim ~ Ci ~ Cj ~ Cj1 ~ Cj2 ~ ... ~ Cjn

Step S4C (see FIG. 3) reflects this as follows. When two character elements Ci and Cj satisfy the "close to each other" relationship, the corresponding elements of {Sn} are compared:

- if Si < Sj, every element with the same value as Sj is rewritten to the value of Si;
- if Si > Sj, every element with the same value as Si is rewritten to the value of Sj.

[0065] As already noted, the relationship "close to each other" is not necessarily an equivalence relation in the mathematical sense. The relationship "having the same label" defined from it, however, is an equivalence relation. The reason is that it is built by chaining the symmetric "close to each other" relation, which also makes it transitive.

[0066] The relationship "having the same label" defined above is mathematically an equivalence relation. Classifying character elements according to whether this equivalence relation holds between them is what mathematics calls equivalence class decomposition. A notable feature of equivalence class decomposition is that its result does not depend on the processing order. In the embodiment above, the same result is obtained regardless of the order in which pairs of character elements are examined for the "close to each other" relationship.

[0067] Next, in step S6, each character string extracted by the processing up to step S5 is divided as appropriate in the character string direction to extract individual characters. Control proceeds to step S7.

[0068] In step S7, each character obtained in step S6 is matched against character standard patterns prepared in advance (see FIG. 3). The character standard pattern most similar to the character being processed is taken as the recognition result. Control proceeds to step S8.

[0069] In step S8, the recognition results are post-processed: any necessary corrections and formatting are applied to the characters recognized in step S7. Control proceeds to step S9.

[0070] In step S9, the final recognition result obtained in step S8 is output in a form the user can read, for example as printed characters or on a display. When the processing through step S9 has been performed for all characters, the program ends.

[0071] FIGS. 8 to 12 are schematic diagrams showing how the method of the present invention extracts character strings when connected regions in the image are used as character elements. In the following description, the element of the integer array {Sn} corresponding to character element Ci is denoted Si.

[0072] Referring to FIG. 8, the values Si given to the character elements (connected regions) in the image are initialized to mutually different values; this is done in step S3. In FIG. 8, the values 0 to 11 are assigned to the 11 connected regions in total.

[0073] Referring to FIG. 9, character elements C0 and C3 satisfy the relationship of being close to each other. On the image, the "close to each other" relationship defined above can be thought of as holding when the distance between the rectangles to which two connected regions belong is smaller than a predetermined value.

Because C0 and C3 are close, the values S0 and S3 are compared according to the processing of step S4 in FIG. 1. Here S0 < S3, so condition (a) above applies, and every Si with the same value as S3 is rewritten to the value of S0. In this case no Si other than S3 itself has a value equal to S3, so only S3 is rewritten to the value of S0, which is 0. The result of this rewriting is shown in FIG. 9.

[0074] FIG. 10 shows an intermediate state of the subsequent processing. Before the state of FIG. 10 is reached, character elements C0, C1, C3, and C4 have been confirmed to be "close to each other," and Si has been rewritten accordingly. Next, as shown in FIG. 10, assume that character elements C4 and C8 are confirmed to be close to each other.

At this point S4 = 0 and S8 = 7, so S4 < S8. According to condition (a), every Si whose value equals S8 (= 7) is rewritten to the value of S4. As a result, as shown in FIG. 11, every character element in the character string "システム" ("system") in the image is assigned the integer 0.

[0075] Referring to FIG. 12, the "close to each other" relationship does not hold between any character element of the string "システム" ("system") and any character element of the string "メニュー" ("menu"). Therefore, performing the above processing for every pair of character elements in the image yields the final values of Si shown in FIG. 12.

All character elements in the image are then classified by the integer value associated with them, as described above. In the example of FIGS. 8 to 12, the character elements fall into two groups:

- those with the integer 0, which form the character string "システム" ("system");
- those with the integer 2, which form the character string "メニュー" ("menu").

[0076] In step S6 and the following steps, the character strings obtained as in FIG. 12 are divided as appropriate to extract individual characters, and each character is recognized by matching.

[0077] As described above, the character string extracting method of the present invention extracts character strings by equivalence class decomposition. The result of equivalence class decomposition does not depend on the processing order, so stable results are obtained.

In addition, image tilt has only a very small effect on the "close to each other" relationship between adjacent character elements. Image tilt therefore has little influence on the character string extraction described above. As a result, compared with the conventional method, appropriate character strings can be extracted over a wider range of input-image tilt.

[0078] Although the present invention has been described on the basis of the above embodiment, it is not necessarily limited to that embodiment. For example, the definition of the "close to each other" relationship is not limited to the one given above. It may be defined in any way that meets two conditions:

- it generally does not hold when two character elements belong to different character strings;
- it generally holds when two character elements belong to the same character string and are adjacent to each other.

[0079] Likewise, the method of rewriting the values of the integer array elements is not limited to cases (a) to (c) described above. For example, the symbols ">" and "<" in cases (a) to (c) may be interchanged.

[0080] Although the present invention has been described in detail on the basis of embodiments, it is of course not limited to those embodiments and can be implemented with various other modifications.

[0081]

[Effects of the Invention] As described above, the character string extracting method of claim 1 obtains the information specifying the frame that circumscribes each connected region in the image by examining the connections between connected line-segment regions in two adjacent scanning-line signals. Unlike the conventional method, it does not need to store, for every pixel in the image, which connected region that pixel belongs to, so the required storage capacity can be greatly reduced.

Further, the method checks whether an equivalence relation holds between the circumscribing-frame information obtained. The connected regions in the image are thereby classified into equivalence classes, and connected regions belonging to the same equivalence class are extracted as elements of one character string. This classification gives the same result regardless of processing order, so processing results are stable. As a result, the invention provides a character string extracting method that obtains stable processing results with a small storage capacity.

[0082] The character string extracting method of claim 2 classifies character elements into equivalence classes by checking whether a predetermined equivalence relation holds between the extracted character elements. Character elements belonging to the same equivalence class are then extracted as elements of one character string. Classification into equivalence classes based on whether the equivalence relation holds does not depend on the processing order, so stable processing results can be obtained.

[0083] In the character area detecting method of claim 3, the information specifying the frame that circumscribes each connected region in the image is obtained solely by examining the connections between connected line-segment regions in two adjacent scanning-line signals. The method does not need to store, for every pixel in the image, the connected region to which that pixel belongs. The storage capacity required to obtain the circumscribing-frame information can therefore be greatly reduced.

[Brief description of drawings]

FIG. 1 is a flowchart of a program for carrying out the character string extracting method according to the present invention.

FIG. 2 is a flowchart of a program implementing the process, shown in FIG. 1, of extracting character elements from an image.

FIG. 3 is a flowchart of a program implementing the process, shown in FIG. 1, of extracting character strings.

FIG. 4 is a schematic diagram showing the connection relationships between connected regions on two scanning lines of an image.

FIG. 5 is a schematic diagram showing connected regions on a line that are contained in different connected regions, together with the rectangles to which they belong.

FIG. 6 is a schematic diagram showing how the rectangles to which the two connected regions shown in FIG. 5 belong are merged into the rectangle R0 because of the connected region D on line i.

FIG. 7 is a schematic diagram showing connected regions extracted from an image.

FIG. 8 is a schematic diagram showing the state in which the integer values assigned to the connected regions in an image have been initialized.

FIG. 9 is a schematic diagram showing an intermediate stage of character string extraction when the "close to each other" relationship holds between character elements C0 and C3.

FIG. 10 is a schematic diagram of the image when character elements C4 and C8 satisfy the relation “close to each other”.

FIG. 11 is a schematic diagram showing the state after the integer values assigned to the character elements have been updated because the relation “close to each other” holds between character elements C4 and C8.

FIG. 12 is a schematic diagram showing the character strings extracted by the character string extracting method according to the present invention.

FIG. 13 is a block diagram of a character recognition device to which the character string extracting method according to the present invention is applied.

FIG. 14 is a more detailed schematic block diagram of the character element extraction unit 2.

FIG. 15 is a more detailed schematic block diagram of the character string extraction unit 3.

[Explanation of symbols]

1 Image input unit
2 Character element extraction unit
3 Character string extraction unit
4 Individual character extraction unit
5 Character matching unit
6 Recognition result post-processing unit
7 Final result output unit
8 Character standard patterns

Continuation of the front page: (72) Inventor: Minako Kuwata, c/o Sharp Corporation, 22-22 Nagaike-cho, Abeno-ku, Osaka; (72) Inventor: Kazuhiro Takehara, c/o Sharp Corporation, 22-22 Nagaike-cho, Abeno-ku, Osaka

Claims


1. The claim comprises the following steps:
- reading an image and converting it into a plurality of consecutive binary scan line signals;
- sequentially analyzing the binary scan line signals as they are successively supplied, obtaining circumscribing frame information that specifies the shape of the circumscribing frame of each connected region represented by the binary scan line signals preceding a first binary scan line signal, and holding that circumscribing frame information;
- detecting the connected line segment regions contained in a second binary scan line signal immediately following the first binary scan line signal, updating the circumscribing frame information of each connected region that has a connection relationship with those connected line segment regions, and detecting any connected region that is connected to none of the connected line segment regions and outputting its circumscribing frame information;
- upon detecting the end of the image, outputting all of the held circumscribing frame information;
- classifying the connected regions inscribed in the respective circumscribing frames into equivalence classes by checking whether a predetermined equivalence relation holds between any two pieces of circumscribing frame information;
- and extracting the connected regions belonging to the same equivalence class as the constituent elements of one character string.

2. The claim comprises the following steps:
- extracting character elements from an image;
- classifying the character elements into equivalence classes by checking whether a predetermined equivalence relation holds between each pair of the extracted character elements;
- and extracting the character elements belonging to the same equivalence class as the constituent elements of one character string.

3. A character area detecting method comprising the following steps:
- reading an image and converting it into a plurality of consecutive binary scan line signals;
- sequentially analyzing the binary scan line signals as they are successively supplied, obtaining circumscribing frame information that specifies the shape of the circumscribing frame of each connected region represented by the binary scan line signals preceding a first binary scan line signal, and holding that circumscribing frame information;
- detecting the connected line segment regions contained in a second binary scan line signal immediately following the first binary scan line signal, updating the circumscribing frame information of each connected region that has a connection relationship with those connected line segment regions, and detecting any connected region that is connected to none of the connected line segment regions and outputting its circumscribing frame information;
- and, upon detecting the end of the image, outputting all of the held circumscribing frame information.
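The classification step of claims 1 and 2 can be illustrated with the integer-label scheme suggested by FIGS. 8 to 11. Each element starts with its own label. Whenever two elements are “close to each other”, every element carrying one label is relabelled with the other. The final labels then form the equivalence classes generated by the closeness relation.

This is a minimal sketch, not the patent's implementation:

- The closeness test below is a hypothetical stand-in for horizontal text: the frames overlap vertically and their horizontal gap is at most `max_gap` pixels. The actual criterion is not given in this section.
- `Box`, `is_close` and `group_strings` are illustrative names.

```python
from collections import defaultdict
from typing import Dict, List, NamedTuple


class Box(NamedTuple):
    """Circumscribing frame of a character element (inclusive pixel coordinates)."""
    top: int
    left: int
    bottom: int
    right: int


def is_close(a: Box, b: Box, max_gap: int = 3) -> bool:
    """Hypothetical 'close to each other' test for horizontally written text."""
    v_overlap = min(a.bottom, b.bottom) >= max(a.top, b.top)
    h_gap = max(a.left, b.left) - min(a.right, b.right) - 1  # blank columns between frames
    return v_overlap and h_gap <= max_gap


def group_strings(elements: List[Box], max_gap: int = 3) -> List[List[int]]:
    """Return element indices grouped into equivalence classes (candidate strings)."""
    labels = list(range(len(elements)))  # each element starts in its own class (cf. FIG. 8)
    for i in range(len(elements)):
        for j in range(i + 1, len(elements)):  # check every pair of elements
            if labels[i] != labels[j] and is_close(elements[i], elements[j], max_gap):
                keep, drop = min(labels[i], labels[j]), max(labels[i], labels[j])
                # Relabel the whole class, so classes merge transitively (cf. FIG. 11).
                labels = [keep if lab == drop else lab for lab in labels]
    classes: Dict[int, List[int]] = defaultdict(list)
    for idx, lab in enumerate(labels):
        classes[lab].append(idx)
    return list(classes.values())
```

Because relabelling merges whole classes, elements that are not directly close can still end up in the same string. Consider frames spanning columns 0-7, 20-27 and 10-17 on the same rows. The first two are too far apart, but the third joins them, so `group_strings` returns `[[0, 1, 2]]`. The pairwise loop checks all pairs of elements, as the abstract describes. Its cost is therefore quadratic in the number of elements.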