JPH05266250A

JPH05266250A - Character string detector

Info

Publication number: JPH05266250A
Application number: JP4065087A
Authority: JP
Inventors: Masami Oguro; 雅己小黒; Osamu Nakamura; 修中村
Original assignee: Nippon Telegraph and Telephone Corp
Current assignee: Nippon Telegraph and Telephone Corp
Priority date: 1992-03-23
Filing date: 1992-03-23
Publication date: 1993-10-15

Abstract

PURPOSE: To provide a character string detector capable of detecting the required character strings at high speed, without superfluous recognition processing and without deleting objects that are not to be recognized. CONSTITUTION: The detector is provided with pixel selection means 1 for selecting either the white or the black picture elements of a binary picture; means 2 for scanning the image data in the transverse direction and determining the number, in the vertical direction, of the picture elements selected by means 1; means 3 for scanning the image data in the longitudinal direction and determining the number, in the horizontal direction, of the picture elements selected by means 1; coordinate detecting means 4 for detecting the four coordinate points at which the difference in picture element count is maximal; and verifying means 5 for comparing the coordinate values with threshold values and verifying whether the area is the size of the character string to be processed.

Description

Detailed Description of the Invention

[0001]

[Industrial Field of Application] The present invention relates to a character string detection device, and in particular to a character string detection device that locates character strings in a document so that, when the document is read, only specific character strings are recognized.

[0002]

[Prior Art] To detect a specific character string in binary image data, one conventional approach attaches a specific mark to, for example, the head of the character string and detects the string by recognizing that mark (hereinafter, the mark detection method).

[0003] Another approach classifies the character strings in the image according to the information they convey, recognizes and interprets them by character recognition, and extracts the character string to be processed (hereinafter, the character string understanding method). For example, when only the target character string consists of digits, every character in the image is recognized and the strings recognized as digits are extracted.

[0004]

[Problem to Be Solved by the Invention] In the conventional techniques, however, both the mark detection method and the character string understanding method must, in order to detect a specific character string in a binary image, segment the image into individual characters to extract character rectangles and then perform character recognition on every character rectangle. Segmentation and recognition are therefore also carried out on characters that never needed to be recognized, so the processing is redundant and the processing time increases.

[0005] Furthermore, when ruled lines, decorations, and the like are present in the image, they do not need to be recognized, so a process for removing them is required, which further increases the processing time.

[0006] The present invention has been made in view of the above points, and its object is to provide a character string detection device that can detect the required character string at high speed without superfluous recognition processing and without the work of erasing items that are not recognition targets.

[0007]

[Means for Solving the Problem] The present invention is a character string detection device that detects a character string to be processed from the image data of a binary image, input through an image input device, in which character strings to be processed and character strings not to be processed are rendered in advance so as to be distinguishable. The device comprises: pixel selection means for selecting either the white pixels or the black pixels of the binary image; means for scanning the image data in the horizontal direction and obtaining the number, in the vertical direction, of the pixels selected by the pixel selection means; means for scanning the image data in the vertical direction and obtaining the number, in the horizontal direction, of the pixels selected by the pixel selection means; coordinate detection means for detecting the four coordinate positions, forming a rectangle, at which the difference in pixel count is largest; and verification means for verifying whether the rectangle has the size of a character string to be processed by comparing the coordinate values of the rectangle obtained by the coordinate detection means with predetermined thresholds.

[0008] Further, in the present invention, when black pixels are selected by the pixel selection means, the coordinate detection means obtains the coordinates of the rectangle from the following four positions: a first arbitrary position at which the difference between the vertical black-pixel count at that position and the vertical black-pixel count immediately to its left is largest; a second arbitrary position at which the difference between the vertical black-pixel count at that position and the vertical black-pixel count immediately to its right is largest; a third arbitrary position at which the difference between the horizontal black-pixel count at that position and the horizontal black-pixel count immediately above it is largest; and a fourth arbitrary position at which the difference between the horizontal black-pixel count at that position and the horizontal black-pixel count immediately below it is largest.

[0009]

[Operation] In the present invention, the character strings to be processed, or alternatively the character strings not to be processed, are printed in advance as white-on-black (reverse) characters. The image data obtained by inputting the printed document with a scanner or the like is scanned in the horizontal and vertical directions to obtain the vertical count of either the black or the white pixels and the horizontal count of either the black or the white pixels. The rectangular area enclosed by the four detected positions is taken as the processing target area, and character segmentation and character recognition are performed only on that area. Because no segmentation or recognition is needed outside the processing target area, no processing time is wasted.

[0010]

[Embodiment] An embodiment of the present invention is described below with reference to the drawings. In this embodiment, the character string to be processed is assumed to be printed as white-on-black characters.

[0011] FIG. 1 is a block diagram showing the system configuration of an embodiment of the present invention. The character string detection device comprises: a pixel selection unit 1 that selects either the white or the black pixels as the pixels to be processed (black pixels in this embodiment); a horizontal histogram generation unit 2 that scans the image data in the horizontal direction; a vertical histogram generation unit 3 that scans the image data in the vertical direction; a maximum difference detection unit 4 that detects the positions at which the difference in the vertical and horizontal black-pixel counts is largest; and an area verification unit 5 that determines, from the rectangle obtained by the maximum difference detection unit 4, the rectangle to be processed.

[0012] FIG. 2 is a diagram for explaining the operation of an embodiment of the present invention. The operation of the character string detection device is described below with reference to FIG. 1 and FIG. 2.

[0013] First, before the character string detection device operates, the character string to be processed is printed in advance as white-on-black characters and the document is input with a scanner or the like. For image data input in this way, the pixel selection unit 1 selects the black pixels as the pixels to be processed.

[0014] The horizontal histogram generation unit 2 scans the binary data input by an image input device such as a scanner in the horizontal direction (the X direction in FIG. 2). As it scans, it counts, for each line, the black pixels on that line in the vertical direction. The resulting X-direction histogram is shown as a in FIG. 2.

[0015] Next, the vertical histogram generation unit 3 scans in the vertical direction (the Y direction in FIG. 2) and counts, for each line, the black pixels on that line in the horizontal direction. The resulting Y-direction histogram is shown as b in FIG. 2.
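The two histograms described in [0014] and [0015] can be sketched as follows. This is only an illustrative sketch, not the patent's implementation: the function name `projection_histograms`, the list-of-rows image representation, and the convention that 1 means black are all assumptions made for the example.

```python
def projection_histograms(image, selected=1):
    """Return (x_hist, y_hist) for a binary image given as a list of rows.

    x_hist[x] is the number of selected pixels in column x (counted vertically).
    y_hist[y] is the number of selected pixels in row y (counted horizontally).
    `selected` is the pixel value to count (assumed: 1 = black, 0 = white).
    """
    height = len(image)
    width = len(image[0]) if height else 0
    x_hist = [0] * width
    y_hist = [0] * height
    for y, row in enumerate(image):
        for x, pixel in enumerate(row):
            if pixel == selected:
                x_hist[x] += 1
                y_hist[y] += 1
    return x_hist, y_hist


# Toy 4x6 image used only for illustration.
image = [
    [0, 0, 0, 0, 0, 0],
    [0, 1, 1, 1, 1, 0],
    [0, 1, 0, 1, 1, 0],
    [0, 0, 0, 0, 0, 0],
]
x_hist, y_hist = projection_histograms(image)
print(x_hist)  # [0, 2, 1, 2, 2, 0]
print(y_hist)  # [0, 4, 3, 0]
```

Passing `selected=0` counts white pixels instead, which corresponds to the alternative pixel selection the description allows.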

[0016] In the X-direction histogram a, let Xblock(i) and Xblock(j) be the black-pixel counts at any two adjacent positions i and j (where i is on the left). The maximum difference detection unit 4 stores the position at which Xblock(j) - Xblock(i) is largest as the left-edge coordinate X1 (FIG. 2), and the position at which Xblock(i) - Xblock(j) is largest as the right-edge coordinate X2 (FIG. 2).

[0017] In the graph of the differences of the X-direction histogram shown as c in FIG. 2, e is the position at which Xblock(j) - Xblock(i) is largest, and f is the position at which Xblock(i) - Xblock(j) is largest.

[0018] In the same way, in the Y-direction histogram b, let Yblock(m) and Yblock(n) be the black-pixel counts at any two vertically adjacent positions m and n (where m is the upper one). The maximum difference detection unit 4 stores the position at which Yblock(n) - Yblock(m) is largest as the top-edge coordinate Y1 (FIG. 2), and the position at which Yblock(m) - Yblock(n) is largest as the bottom-edge coordinate Y2 (FIG. 2).

[0019] In the graph of the differences of the Y-direction histogram shown as d in FIG. 2, g is the position at which Yblock(n) - Yblock(m) is largest, and h is the position at which Yblock(m) - Yblock(n) is largest.

[0020] This yields the rectangular area p enclosed by the upper-left coordinates (X1, Y1) and the lower-right coordinates (X2, Y2).
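The maximum-difference search of [0016] to [0020] can be illustrated with a short sketch. The names `edge_positions` and `bounding_rectangle` are hypothetical, and one detail is an assumption: the text does not say which of the two adjacent bins is stored as the edge, so this sketch stores the bin on the high-count side of each jump.

```python
def edge_positions(hist):
    """Return (rise, fall) for a 1-D histogram.

    rise: index j at which hist[j] - hist[j - 1] is largest (left or top edge).
    fall: index i at which hist[i] - hist[i + 1] is largest (right or bottom edge).
    Assumption: the edge is reported as the bin on the high side of the jump.
    """
    if len(hist) < 2:
        raise ValueError("need at least two histogram bins")
    diffs = [hist[k + 1] - hist[k] for k in range(len(hist) - 1)]
    rise = max(range(len(diffs)), key=lambda k: diffs[k]) + 1
    fall = max(range(len(diffs)), key=lambda k: -diffs[k])
    return rise, fall


def bounding_rectangle(x_hist, y_hist):
    """Combine the X and Y edges into (upper_left, lower_right) corners."""
    x1, x2 = edge_positions(x_hist)
    y1, y2 = edge_positions(y_hist)
    return (x1, y1), (x2, y2)


# Illustrative histograms: a dark block spanning columns 2..4 and rows 1..3.
x_hist = [0, 1, 9, 8, 9, 1, 0]
y_hist = [0, 7, 6, 7, 0]
upper_left, lower_right = bounding_rectangle(x_hist, y_hist)
print(upper_left, lower_right)  # (2, 1) (4, 3)
```

Row indices in this sketch grow downward, as is usual for image arrays; the orientation of the Y axis in FIG. 2 may differ.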

[0021] Next, the area verification unit 5 obtains the width (X2 - X1) and the height (Y1 - Y2) of the rectangular area p and compares them with predetermined thresholds (Sx, Sy) for the width and height that are plausible for a character area. If (X2 - X1) > Sx and (Y1 - Y2) > Sy, that is, if the width and height exceed the thresholds, the area is judged to be the character string to be processed and the detection process ends.

[0022] If, on the other hand, the width and height are below the thresholds, (X2 - X1) < Sx, (Y1 - Y2) < Sy, processing returns to the maximum difference detection unit 4, which excludes the area obtained first and determines a new rectangular area from the maximum differences.
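The size check of [0021] and [0022] amounts to a pair of threshold comparisons. A minimal sketch follows; the function name `is_target_size` and the sample coordinates and thresholds are invented for illustration. Because the sign of the height expression depends on the direction of the Y axis, which the text does not state, the sketch takes the absolute value as an assumption.

```python
def is_target_size(upper_left, lower_right, sx, sy):
    """Return True when the rectangle is wider than sx and taller than sy.

    upper_left = (x1, y1), lower_right = (x2, y2).
    Assumption: height is taken as |y1 - y2| so the check works for either
    Y-axis orientation.
    """
    (x1, y1), (x2, y2) = upper_left, lower_right
    width = x2 - x1
    height = abs(y1 - y2)
    return width > sx and height > sy


# Hypothetical thresholds and candidate areas.
text_box = is_target_size((10, 5), (110, 35), sx=50, sy=20)
ruled_line = is_target_size((10, 40), (110, 42), sx=50, sy=20)
print(text_box, ruled_line)  # True False
```

A candidate that fails the check is the one that is sent back to the maximum difference detection step for another search.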

[0023] FIG. 3 shows a case in which the rectangular area is determined again in an embodiment of the present invention, namely a case in which a ruled line is present.

[0024] In the first pass, the processing range z contains the ruled line r, so the maximum differences of the Y-direction histogram yield the rectangular area t enclosed by the upper-left coordinates (Xa, Ya) and the lower-right coordinates (Xb, Yb). When the area verification unit 5 compares this area with the predetermined thresholds (Sx, Sy) for the width and height plausible for a character area, the width Xb - Xa is larger than the character size and therefore meets the threshold, but the height Yb - Ya is smaller than the character size. The area verification unit 5 therefore judges that the top-edge and bottom-edge coordinates are wrong. The positions of maximum difference are then searched for again with the range Ya to Yb excluded, which gives the top-edge coordinate Yc and the bottom-edge coordinate Yd. The upper-left coordinates thus become (Xa, Yc) and the lower-right coordinates (Xb, Yd), which gives the rectangular area p. If this rectangular area p satisfies (Xb - Xa) > Sx and (Yc - Yd) > Sy, that is, if its width and height exceed the thresholds, it is judged to be the character string to be processed and the detection process ends.
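The re-search described in [0024] can be sketched for a single axis as below. This is one possible reading, not the patent's procedure: the text says only that the range Ya to Yb is excluded, so skipping every difference that touches an excluded bin is an assumption, as are the names `find_band` and `detect_band`, the `max_tries` limit, and the sample histogram.

```python
def find_band(hist, excluded=()):
    """Return (start, end) bounded by the largest rise and the largest fall,
    ignoring differences that involve a bin listed in `excluded`."""
    best_rise = best_fall = None
    for k in range(1, len(hist)):
        if k in excluded or (k - 1) in excluded:
            continue
        d = hist[k] - hist[k - 1]
        if best_rise is None or d > best_rise[0]:
            best_rise = (d, k)        # rising edge: band starts at bin k
        if best_fall is None or -d > best_fall[0]:
            best_fall = (-d, k - 1)   # falling edge: band ends at bin k - 1
    if best_rise is None:
        return None
    return best_rise[1], best_fall[1]


def detect_band(hist, min_size, max_tries=5):
    """Repeat the search, excluding rejected bands, until one exceeds min_size."""
    excluded = set()
    for _ in range(max_tries):
        band = find_band(hist, excluded)
        if band is None:
            return None
        start, end = band
        if end - start > min_size:
            return band
        excluded.update(range(min(start, end), max(start, end) + 1))
    return None


# Row 2 is a thin ruled line; rows 5..12 hold the white-on-black text block.
y_hist = [0, 0, 60, 0, 0, 40, 38, 35, 36, 37, 39, 40, 40, 0, 0]
print(find_band(y_hist))                # (2, 2): the ruled line wins first
print(detect_band(y_hist, min_size=4))  # (5, 12): found after excluding row 2
```

In this example the ruled line produces the largest jumps, fails the size check, and is excluded, after which the text block is found on the second pass.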

[0025] As a result, the rectangular area p, which is the processing target area, becomes the target of recognition processing and the like, while the ruled line r, whose height is below the threshold, is judged to be smaller than the prescribed character size and is not judged to be a character string to be processed.

[0026] This embodiment assumes that the character string to be processed is printed as white-on-black characters, but the present invention is not limited to this example; conversely, the character strings not to be processed may be printed as white-on-black characters. In that case, white pixels are selected in advance at the white/black pixel selection step, and processing uses white pixels in place of the black pixels of the above embodiment. The same processing is then possible and the same effect is obtained.

[0027] In the conventional techniques, detecting a specific character string required segmentation and recognition even of characters that did not need to be recognized. With the character string detection device of the present invention, as described above, simply printing the character string to be processed as white-on-black characters makes it possible to detect that string from the black-pixel histograms, and the time needed for character string detection is greatly reduced.

[0028] When ruled lines, decorations, and the like are present in the image, they are rejected by the comparison with the character string area size determined from the character size, so no ruled-line removal needs to be performed.

[0029]

[Effects of the Invention] As described above, according to the present invention there is no need to segment or recognize characters that are not to be processed, and the processing time is reduced.

[0030] In addition, the process of removing unneeded items such as ruled lines becomes unnecessary, and the desired character string can be detected from the image data at high speed.

[Brief description of drawings]

FIG. 1 is a block diagram showing the system configuration of an embodiment of the present invention.

FIG. 2 is a diagram for explaining the operation of an embodiment of the present invention.

FIG. 3 is a diagram showing a case in which the rectangular area is determined again in an embodiment of the present invention, and in which a ruled line is present.

[Explanation of symbols]

1 Pixel selection unit
2 Horizontal histogram generation unit
3 Vertical histogram generation unit
4 Maximum difference detection unit
5 Area verification unit
a Counted X-direction histogram
b Counted Y-direction histogram
c Graph of the differences of the X-direction histogram
d Graph of the differences of the Y-direction histogram
e Position at which Xblock(j) - Xblock(i) is largest
f Position at which Xblock(i) - Xblock(j) is largest
i Arbitrary adjacent position in the X-direction histogram a
j Arbitrary adjacent position in the X-direction histogram a
m Arbitrary vertically adjacent position
n Arbitrary vertically adjacent position
p Rectangular area
r Ruled line
t Rectangular area

Claims

[Claims]

1. A character string detection device for detecting a character string to be processed from image data of a binary image, input through an image input device, in which character strings to be processed and character strings not to be processed are rendered in advance so as to be distinguishable, the device comprising: pixel selection means for selecting either the white pixels or the black pixels of the binary image; means for scanning the image data in the horizontal direction and obtaining the number, in the vertical direction, of the pixels selected by the pixel selection means; means for scanning the image data in the vertical direction and obtaining the number, in the horizontal direction, of the pixels selected by the pixel selection means; coordinate detection means for detecting four coordinate positions at which the difference in the number of pixels is largest; and verification means for verifying whether the rectangle has the size of a character string to be processed by comparing the coordinate values of the rectangle obtained by the coordinate detection means with a predetermined threshold.

2. The character string detection device according to claim 1, wherein, when black pixels are selected by the pixel selection means, the coordinate detection means obtains the coordinates of the rectangle from: a first arbitrary position at which the difference between the number of black pixels in the vertical direction at the first arbitrary position and the number of black pixels in the vertical direction immediately to its left is largest; a second arbitrary position at which the difference between the number of black pixels in the vertical direction at the second arbitrary position and the number of black pixels in the vertical direction immediately to its right is largest; a third arbitrary position at which the difference between the number of black pixels in the horizontal direction at the third arbitrary position and the number of black pixels in the horizontal direction immediately above it is largest; and a fourth arbitrary position at which the difference between the number of black pixels in the horizontal direction at the fourth arbitrary position and the number of black pixels in the horizontal direction immediately below it is largest.