JPH02263272A

JPH02263272A - Document picture processor

Info

Publication number: JPH02263272A
Application number: JP1080257A
Authority: JP
Inventors: Tsutomu Kuramochi; 倉持　勉
Original assignee: Fuji Xerox Co Ltd
Current assignee: Fujifilm Business Innovation Corp
Priority date: 1989-04-01
Filing date: 1989-04-01
Publication date: 1990-10-26
Anticipated expiration: 2014-05-10
Also published as: JP2887803B2

Abstract

PURPOSE: To integrate one or more regions in a document image, and thereby facilitate region segmentation, by excluding from the input document image the blank areas that can contain a preset rectangle and treating the remainder as connected components of black pixels. CONSTITUTION: The document image processor comprises an image input device 5 that reads a document as a binary image, an image memory 6 that temporarily stores the input image, and an image processor 15 that includes a run extracting means 12 for extracting runs longer than a prescribed value, an image logical operation means 13 for performing logical operations on images, and a contour tracking means 14 for tracing the contours of images. Blank areas that can contain a preset rectangle are detected in the document image, and the means 13 turns every pixel outside those blank areas black, which integrates the regions of the document image. As a result, the gaps between the columns of a multi-column block are ignored and the block is integrated into a single region, which facilitates region segmentation.

Description

DETAILED DESCRIPTION OF THE INVENTION

(Field of Industrial Application) The present invention relates to a document image processing device, and more particularly to a document image processing device that makes it easy to cut out regions by first integrating them.

(Prior Art) Conventional methods for cutting out regions in a document image include one that uses projections of the document: the vertical and horizontal projections are computed, and the presence or absence of black pixels in each row and each column of pixels is examined (see, for example, "Introduction to Character Recognition," edited by Shinichiro Hashimoto, published by the Telecommunications Association, pp. 59-60).

For example, in a document image 1 such as that shown in FIG. 2 (hatching represents characters and the like), the position of each region can be determined from its projections 2. In other words, each region can be cut out.
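
As a rough illustration of the projection method described above, the following Python sketch counts the black pixels in each row and each column and reads region boundaries off the empty stretches of the profile. The function names `projections` and `occupied_spans` and the list-of-lists image format (1 = black, 0 = white) are assumptions made for this example; they do not come from the cited reference.

```python
def projections(image):
    """Return (row_profile, col_profile): black-pixel counts per row and per column."""
    rows = [sum(row) for row in image]
    cols = [sum(col) for col in zip(*image)]
    return rows, cols


def occupied_spans(profile):
    """Return half-open [start, end) index spans where the profile is non-zero."""
    spans, start = [], None
    for i, value in enumerate(profile):
        if value and start is None:
            start = i                      # a region begins
        elif not value and start is not None:
            spans.append((start, i))       # a region ends at an empty line
            start = None
    if start is not None:
        spans.append((start, len(profile)))
    return spans


# Two blocks separated by one blank row (1 = black, 0 = white).
page = [
    [1, 1, 0, 1],
    [1, 0, 0, 1],
    [0, 0, 0, 0],
    [0, 1, 1, 0],
]
rows, cols = projections(page)
row_spans = occupied_spans(rows)   # vertical extent of each block
col_spans = occupied_spans(cols)   # horizontal extent of each block
```

Here the row profile separates the two blocks, while the column profile sees a single span. The method only works when the blank bands run across the whole page, which is the limitation the invention addresses.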

(Problem to be Solved by the Invention) However, in a document image 1' that has a frame 3 and columns 4 as shown in FIG. 3, the presence of the frame 3 and the columns 4 makes it difficult to determine the position of each region from its projections 2.

An object of the present invention is to provide a document image processing device that integrates the regions in a document image, whether or not columns and frames are present, so that the regions can be cut out easily.

(Means for Solving the Problems) The document image processing device of the present invention comprises: image input means (5 in FIG. 1) for inputting a document as a binary image; input image storage means (6) for storing the input image; processed image storage means (7 to 10) for storing new images obtained as a result of processing the image; blank area detection means (12) for detecting, in the document image, blank areas that can contain a preset rectangle; and image logical operation means (13) for integrating regions in the document image based on the detected blank areas.

A document image processing device according to another aspect of the present invention comprises: image input means (5 in FIG. 7) for inputting a document as a binary image; input image storage means (6) for storing the input document image; processed image storage means (7, 8, 91, 92, 10) for storing images obtained as a result of processing the input image; white run extraction means (12') for extracting white runs in the input document image that are longer than a preset value; and image logical operation means for integrating regions in the input document image by turning black all pixels except (a) the portions where a vertical white run longer than a first set value (ly1) coincides with a horizontal white run longer than a second set value (lx1), and (b) the portions where a vertical white run longer than a third set value (ly2) coincides with a horizontal white run longer than a fourth set value (lx2).

(Operation) In the document image processing device according to the first aspect of the present invention, the blank area detection means (12) detects blank areas in the document image that can contain a preset rectangle, and the image logical operation means turns every pixel outside those blank areas black, thereby integrating the regions in the document image. Because blank areas smaller than the preset rectangle are not treated as blank areas, the gaps between the columns within the columns 4 are ignored, as indicated by the broken line in FIG. 4(b), and the column regions are integrated into one. The frame 3, meanwhile, remains as a non-blank area and is detected as a character or line-drawing region or the like.

Integrating the regions in this way makes the process of cutting out regions easy to carry out.

In the document image processing device according to the second aspect of the present invention, the image logical operation means, which performs logical operations on images such as AND, OR, and black-and-white inversion, turns black all pixels except the portions where the vertical and horizontal white runs longer than the set values, extracted from the document image by the white run extraction means, coincide. This integrates the regions in the document image and makes the process of cutting out regions easy to carry out.

(Description of Embodiments) First Embodiment

FIG. 1 is a block diagram showing the configuration of a document image processing device according to a first embodiment to which the present invention is applied. The device comprises: an image input device 5 that reads a document as a binary image; an image memory 6 that temporarily stores the input image; image memories 7 to 10, each having the same memory size as the image memory 6; a control device 11 that controls the entire device; an image processing device 15 consisting of a run extraction means 12 that extracts runs longer than a preset value, an image logical operation means 13 that performs logical operations on images, and a contour tracking means 14 that traces the contours of an image; an input device 16 for entering commands and the like; a display 17 that displays the commands and the like entered from the input device 16 and the images stored in the image memories 6 to 10; a file device 18 that stores image data; and an image output device 19 that prints the images stored in the image memories 6 to 10.

Next, an example of the procedure by which the above device integrates the regions in an input document image and cuts them out is described in detail.

In cutting out regions from the document image 1' shown in FIG. 3, each region may be cut out completely separately, as indicated by the broken lines in FIG. 4(a), or a plurality of regions presumed to be closely related in terms of the document's layout structure may be cut out as a single region, as indicated by the broken lines in FIG. 4(b). In the present invention, the unit of region to be cut out can be changed simply by changing the length of the runs to be extracted. This embodiment is described taking as an example the case of FIG. 4(b), in which the regions laid out in columns are integrated into a single region.

FIGS. 5(a), 5(b), and 5(c) show the processing flow for integrating regions; the circled numbers 1 through 21 in the figures denote the main processing steps. The processing procedure of the first embodiment of the present invention is described below along this flow.

Processing step 1: A document is input as a binary image by the image input device 5, and the input document image is stored in the image memory 6.

Processing step 2: All pixels in the image memory 8 and the image memory 9 are made white.

Processing step 3: All pixels in the image memory 7 are made white.

Processing step 4: The input document image stored in the image memory 6 is scanned sequentially in the vertical direction.

Processing step 5: If a white run is found during the scan, its length is compared with a preset value ly1. If the white run is longer, the process proceeds to processing step 6; otherwise it proceeds to processing step 7. The value ly1 is determined empirically.

Processing step 6: The pixels of the image memory 7 at the same positions as the white run extracted in processing step 5 are made black.

Processing step 7: It is determined whether the scan of processing step 4 has finished. If it has, the process proceeds to processing step 8; otherwise it returns to processing step 4. The image stored in the image memory 7 when the scan is judged to have finished is as shown in FIG. 6(a).
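
The vertical scan described above, which marks in a separate image every white run longer than ly1, can be sketched as follows. This is a minimal illustration rather than the patented implementation: the helper names `long_runs_1d` and `long_runs`, the list-of-lists image format (1 = black, 0 = white), and the reading of "longer than" as a strict comparison are all assumptions made for the example.

```python
def long_runs_1d(line, target, min_len):
    """Mark positions of `line` that lie in a run of `target` longer than `min_len`."""
    out = [0] * len(line)
    start = 0
    for pos in range(len(line) + 1):
        # A run ends at the end of the line or at the first non-target pixel.
        if pos == len(line) or line[pos] != target:
            if pos - start > min_len:
                out[start:pos] = [1] * (pos - start)
            start = pos + 1
    return out


def long_runs(image, target, min_len, vertical):
    """Apply long_runs_1d to every column (vertical=True) or to every row."""
    if vertical:
        cols = [long_runs_1d(list(col), target, min_len) for col in zip(*image)]
        return [list(row) for row in zip(*cols)]
    return [long_runs_1d(row, target, min_len) for row in image]


# Input image (plays the role of image memory 6): 1 = black, 0 = white.
page = [
    [0, 1, 0],
    [0, 1, 0],
    [0, 0, 1],
    [0, 0, 0],
]
ly1 = 2
# Result image (plays the role of image memory 7): long vertical white runs become black.
memory7 = long_runs(page, target=0, min_len=ly1, vertical=True)
```

Only the leftmost column contains a white run longer than two pixels, so only that column is marked in `memory7`.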

Processing step 8: The image stored in the image memory 7 is scanned in the horizontal direction.

Processing step 9: If a black run is found during the scan, its length is compared with a preset value lx1. If the black run is longer, the process proceeds to processing step 10; otherwise it proceeds to processing step 11. The value lx1 is determined empirically.

Processing step 10: The pixels of the image memory 8 at the same positions as the black run extracted in processing step 9 are made black.

Processing step 11: It is determined whether the scan of processing step 8 has finished. If it has, the process proceeds to processing step 12; otherwise it returns to processing step 8. The image stored in the image memory 8 when the scan is judged to have finished is as shown in FIG. 6(b). The black areas of this image 21 indicate the blank areas in the input document image stored in the image memory 6 that can contain a rectangle of width lx1 and height ly1. In this embodiment, lx1 and ly1 are set to values that detect vertically long blank areas.
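
The two passes described so far (long vertical white runs first, then long horizontal black runs within that result) can be combined into one small function. The sketch below is illustrative only; `blank_regions`, `long_runs`, and `long_runs_1d` are hypothetical names, images are lists of lists with 1 = black and 0 = white, and "longer than" is taken as a strict comparison.

```python
def long_runs_1d(line, target, min_len):
    """Mark positions of `line` that lie in a run of `target` longer than `min_len`."""
    out = [0] * len(line)
    start = 0
    for pos in range(len(line) + 1):
        if pos == len(line) or line[pos] != target:
            if pos - start > min_len:
                out[start:pos] = [1] * (pos - start)
            start = pos + 1
    return out


def long_runs(image, target, min_len, vertical):
    """Apply long_runs_1d to every column (vertical=True) or to every row."""
    if vertical:
        cols = [long_runs_1d(list(col), target, min_len) for col in zip(*image)]
        return [list(row) for row in zip(*cols)]
    return [long_runs_1d(row, target, min_len) for row in image]


def blank_regions(image, lx, ly):
    """Black pixels of the result mark blank areas wider than lx and taller than ly."""
    tall_white = long_runs(image, target=0, min_len=ly, vertical=True)   # like memory 7
    return long_runs(tall_white, target=1, min_len=lx, vertical=False)   # like memory 8


# Two text columns with a 1-pixel gutter, then a 3-pixel-wide margin.
page = [[1, 1, 0, 1, 1, 0, 0, 0] for _ in range(4)]
blank = blank_regions(page, lx=2, ly=2)
```

The narrow gutter at column 2 passes the vertical test but fails the horizontal one, so it is not reported as a blank area. Only the wide margin survives, which is why the two text columns end up merged.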

Processing step 12: Same as processing step 3 (all pixels in the image memory 7 are made white).

Processing step 13: The input document image stored in the image memory 6 is scanned sequentially in the horizontal direction.

Processing step 14: If a white run is found during the scan, its length is compared with a preset value lx2. If the white run is longer, the process proceeds to processing step 15; otherwise it proceeds to processing step 16. The value lx2 is determined empirically.

Processing step 15: The pixels of the image memory 7 at the same positions as the white run extracted in processing step 14 are made black.

Processing step 16: It is determined whether the scan of processing step 13 has finished. If it has, the process proceeds to processing step 17; otherwise it returns to processing step 13. The image stored in the image memory 7 when the scan is judged to have finished is as shown in FIG. 6(c).

Processing step 17: The image stored in the image memory 7 is scanned in the vertical direction.

Processing step 18: If a black run is found during the scan, its length is compared with a preset value ly2. If the black run is longer, the process proceeds to processing step 19; otherwise it proceeds to processing step 20. The value ly2 is determined empirically.

Processing step 19: The pixels of the image memory 9 at the same positions as the black run extracted in processing step 18 are made black.

Processing step 20: It is determined whether the scan of processing step 17 has finished. If it has, the process proceeds to processing step 21; otherwise it returns to processing step 17. The image stored in the image memory 9 when the scan is judged to have finished is as shown in FIG. 6(d). The black areas of this image 22 indicate the blank areas in the input document image stored in the image memory 6 that can contain a rectangle of width lx2 and height ly2. In this embodiment, lx2 and ly2 are set to values that detect horizontally long blank areas.

Processing step 21: The logical OR of the image memory 8 and the image memory 9 is computed, the resulting image is inverted in black and white, and the result is stored in the image memory 10. The image stored in the image memory 10 is as shown in FIG. 6(e), and the integration of regions, which is the purpose of this processing, is achieved.
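
Putting the pieces of the first embodiment together, the whole region-integration procedure reduces to two blank-area detections, an OR, and an inversion. The following Python sketch is one possible reading of that flow under stated assumptions: all function names are hypothetical, images are lists of lists with 1 = black and 0 = white, and the run-length comparisons are strict.

```python
def long_runs_1d(line, target, min_len):
    """Mark positions of `line` that lie in a run of `target` longer than `min_len`."""
    out = [0] * len(line)
    start = 0
    for pos in range(len(line) + 1):
        if pos == len(line) or line[pos] != target:
            if pos - start > min_len:
                out[start:pos] = [1] * (pos - start)
            start = pos + 1
    return out


def long_runs(image, target, min_len, vertical):
    """Apply long_runs_1d to every column (vertical=True) or to every row."""
    if vertical:
        cols = [long_runs_1d(list(col), target, min_len) for col in zip(*image)]
        return [list(row) for row in zip(*cols)]
    return [long_runs_1d(row, target, min_len) for row in image]


def merge_regions(image, lx1, ly1, lx2, ly2):
    """Return an image in which everything outside the detected blank areas is black."""
    # Vertically long blank areas: vertical white runs, then horizontal black runs.
    tall = long_runs(long_runs(image, 0, ly1, vertical=True), 1, lx1, vertical=False)
    # Horizontally long blank areas: horizontal white runs, then vertical black runs.
    wide = long_runs(long_runs(image, 0, lx2, vertical=False), 1, ly2, vertical=True)
    # OR the two blank-area images and invert the result (like image memory 10).
    return [[1 - (a | b) for a, b in zip(r1, r2)] for r1, r2 in zip(tall, wide)]


# Two thin text columns with a 1-pixel gutter, a 4-pixel gap, and a separate block.
page = [[1, 0, 1, 0, 0, 0, 0, 1] for _ in range(5)]
merged = merge_regions(page, lx1=2, ly1=3, lx2=3, ly2=1)
```

In `merged`, columns 0 to 2 form one solid black region because the gutter at column 1 is too narrow to count as a blank area, while the block at column 7 stays separate.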

Further, to obtain the positions of the integrated regions, the image of FIG. 6(e) may, for example, be scanned in the vertical or horizontal direction, and the contour of each integrated region, which is a connected component of black pixels, may be traced starting from a pixel at which the image changes from white to black. Any known contour tracing method can be used, for example the method described in detail in "Image Database," co-authored by Masao Sakauchi and Yutaka Osawa, published by Shokodo, pp. 91-95. This completes the integration and cutting out of the regions in the document image.
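
For readers who only need the position of each integrated region, the sketch below substitutes a simpler technique for contour tracing: a breadth-first flood fill that labels each connected component of black pixels and records its bounding box. This is not the contour tracing method of the cited book. The function name `region_boxes`, the use of 4-connectivity, and the (top, left, bottom, right) box format are assumptions made for the example.

```python
from collections import deque


def region_boxes(mask):
    """Return bounding boxes (top, left, bottom, right) of 4-connected black regions."""
    height, width = len(mask), len(mask[0])
    seen = [[False] * width for _ in range(height)]
    boxes = []
    for y in range(height):
        for x in range(width):
            if mask[y][x] != 1 or seen[y][x]:
                continue
            # First unvisited black pixel in raster order: start a new region here.
            seen[y][x] = True
            queue = deque([(y, x)])
            top, left, bottom, right = y, x, y, x
            while queue:
                cy, cx = queue.popleft()
                top, bottom = min(top, cy), max(bottom, cy)
                left, right = min(left, cx), max(right, cx)
                for ny, nx in ((cy - 1, cx), (cy + 1, cx), (cy, cx - 1), (cy, cx + 1)):
                    if (0 <= ny < height and 0 <= nx < width
                            and mask[ny][nx] == 1 and not seen[ny][nx]):
                        seen[ny][nx] = True
                        queue.append((ny, nx))
            boxes.append((top, left, bottom, right))
    return boxes


# An integrated image with one merged region on the left and one block on the right.
merged = [[1, 1, 1, 0, 0, 0, 0, 1] for _ in range(5)]
boxes = region_boxes(merged)
```

Each box can then be used to crop the corresponding area out of the original input image.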

This embodiment has described the case in which a plurality of regions presumed to be closely related in terms of the document's layout structure are cut out as a single region. If the set values are instead chosen so that small blank areas, such as those between lines or between characters, are also detected, a character string or a sub-character pattern can be made into a connected component of black pixels, and it can be cut out in the same way, as described in the embodiment, by tracing the contour of that connected component.

As described above, according to the present invention, the regions in an input document image can be integrated by excluding the blank areas that can contain a preset rectangle and making the remainder into connected components of black pixels, so that the process of cutting out regions can be applied easily.

Second Embodiment

FIG. 7 is a block diagram showing the configuration of a document image processing device according to a second embodiment of the present invention. Parts identical to those of the first embodiment shown in FIG. 1 are given the same reference numerals, and corresponding parts are given reference numerals with a prime ('). The device comprises: an image input device 5 that reads a document as a binary image; an image memory 6 that temporarily stores the input image; image memories 7, 8, 91, 92, and 10, each having the same memory size as the image memory 6; a control device 11 that controls the entire device; an image processing device 15' consisting of a white run extraction means 12' that extracts white runs longer than a preset value, an image logical operation means 13' that performs logical operations on images, and a contour tracking means 14 that traces the contours of an image; an input device 16 for entering commands and the like; a display 17 that displays the commands and the like entered from the input device 16 and the images stored in the image memories 6, 7, 8, 91, 92, and 10; a file device 18 that holds image data; and an image output device 19 that prints the images stored in the image memories 6, 7, 8, 91, 92, and 10.

Next, an example of the procedure by which the above device integrates the regions in an input document image and cuts out the integrated regions is described in detail.

In cutting out regions from the document image shown in FIG. 3, each region may be cut out completely separately, as indicated by the broken lines in FIG. 4(a), or regions presumed to be closely related in terms of the document's layout structure may be integrated and cut out together, as indicated by the broken lines in FIG. 4(b). In the present invention, the unit of region to be cut out can be changed simply by changing the length of the white runs to be extracted. This embodiment is described taking the case of FIG. 4(b) as an example.

FIGS. 8(a) and 8(b) show the processing flow for integrating regions; the circled numbers 1 through 14 in the figures denote the main processing steps. The processing procedure of the present invention is described below along this flow.

Processing step 1: A document is input as a binary image by the image input device 5, and the input document image is stored in the image memory 6.

Processing step 2: All pixels in the image memory 7 and the image memory 8 are made white.

Processing step 3: The entire input document image stored in the image memory 6 is scanned sequentially in the vertical direction.

Processing step 4: If a white run is found during the scan, its length is compared with a preset value ly1. If the white run is longer, the process proceeds to processing step 5; otherwise it proceeds to processing step 6.

The value ly1 is determined empirically. (In this embodiment, lx1 and ly1 are set to values that extract vertically long blank areas.)

Processing step 5: The pixels of the image memory 7 at the same positions as the white run extracted in processing step 4 are made black.

Processing step 6: It is determined whether the scan of processing step 3 has finished. If it has, the process proceeds to processing step 7; otherwise it returns to processing step 3. The image stored in the image memory 7 when the scan is judged to have finished is as shown in FIG. 9(a).

Processing step 7: The entire input document image stored in the image memory 6 is scanned sequentially in the horizontal direction.

Processing step 8: If a white run is found during the scan, its length is compared with a preset value lx1. If the run is longer, the process proceeds to processing step 9; otherwise it proceeds to processing step 10. The value lx1 is determined empirically.

Processing step 9: The pixels of the image memory 8 at the same positions as the white run extracted in processing step 8 are made black.

Processing step 10: It is determined whether the scan of processing step 7 has finished. If it has, the process proceeds to processing step 11; otherwise it returns to processing step 7. The image stored in the image memory 8 when the scan is judged to have finished is as shown in FIG. 9(b).

Processing step 11: The logical AND of the image memory 7 and the image memory 8 is computed, and the result is stored in the image memory 91. The image stored in the image memory 91 is as shown in FIG. 9(c).

Processing step 12: The set value ly1 is changed to ly2 and the set value lx1 is changed to lx2, and processing steps 2 through 10 are performed again (the processing flow in FIG. 8 shows this in simplified form). The set values ly2 and lx2 are determined empirically. (In this embodiment, lx2 and ly2 are set to values that extract horizontally long blank areas.) When this processing step has finished, the image stored in the image memory 7 is as shown in FIG. 9(d), and the image stored in the image memory 8 is as shown in FIG. 9(e).

Processing step 13: The logical AND of the image memory 7 and the image memory 8 is computed, and the result is stored in the image memory 92. The image stored in the image memory 92 is as shown in FIG. 9(f).

Processing step 14: The logical OR of the image memory 91 and the image memory 92 is computed, the resulting image is inverted in black and white, and the result is stored in the image memory 10. The image stored in the image memory 10 is as shown in FIG. 9(g), and the integration of regions, which is the purpose of this processing, is achieved.
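
The second embodiment differs from the first in that both run extractions are applied to the input image itself and then combined with a logical AND. A minimal Python sketch of that flow follows. The names `coincide` and `merge_regions_and` and the helper functions are hypothetical, images are lists of lists with 1 = black and 0 = white, and "longer than" is read as a strict comparison.

```python
def long_runs_1d(line, target, min_len):
    """Mark positions of `line` that lie in a run of `target` longer than `min_len`."""
    out = [0] * len(line)
    start = 0
    for pos in range(len(line) + 1):
        if pos == len(line) or line[pos] != target:
            if pos - start > min_len:
                out[start:pos] = [1] * (pos - start)
            start = pos + 1
    return out


def long_runs(image, target, min_len, vertical):
    """Apply long_runs_1d to every column (vertical=True) or to every row."""
    if vertical:
        cols = [long_runs_1d(list(col), target, min_len) for col in zip(*image)]
        return [list(row) for row in zip(*cols)]
    return [long_runs_1d(row, target, min_len) for row in image]


def coincide(image, lx, ly):
    """AND of long vertical white runs (like memory 7) and long horizontal ones (like memory 8)."""
    vertical = long_runs(image, 0, ly, vertical=True)
    horizontal = long_runs(image, 0, lx, vertical=False)
    return [[a & b for a, b in zip(rv, rh)] for rv, rh in zip(vertical, horizontal)]


def merge_regions_and(image, lx1, ly1, lx2, ly2):
    """OR the two coincidence images (like memories 91 and 92), then invert."""
    first = coincide(image, lx1, ly1)
    second = coincide(image, lx2, ly2)
    return [[1 - (a | b) for a, b in zip(r1, r2)] for r1, r2 in zip(first, second)]


page = [[1, 0, 1, 0, 0, 0, 0, 1] for _ in range(5)]
merged = merge_regions_and(page, lx1=2, ly1=3, lx2=3, ly2=1)
```

On this small page the result matches what the run-on-run approach of the first embodiment would give: the 1-pixel gutter is filled and the 4-pixel gap remains white.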

Further, to obtain the positions of the integrated regions, the image of FIG. 9(g) may, for example, be scanned in the vertical or horizontal direction, and the contour of each integrated region, which is a connected component of black pixels, may be traced starting from a pixel at which the image changes from white to black.

This completes the integration and cutting out of the regions in the document image.

As described above, according to the second embodiment, one or more regions in a document image can be integrated by excluding the portions where a vertical white run longer than a set value coincides with a horizontal white run longer than a set value in the input document image, and making the remainder into connected components of black pixels. Because the regions can be integrated in this way, the process of cutting out regions can be carried out easily.

(Effects of the Invention) According to the present invention, one or more regions in a document image can be integrated either by excluding the blank areas in the input document image that can contain a preset rectangle and making the remainder into connected components of black pixels, or by excluding the portions where a vertical white run longer than a set value coincides with a horizontal white run longer than a set value and making the remainder into connected components of black pixels. Regions can therefore be determined appropriately even when columns, frames, and the like are present, and this integration of regions makes the processing required to cut out regions easier.

[Brief explanation of drawings]

FIG. 1 is a diagram showing the block configuration of the first embodiment of the present invention.

FIGS. 2 and 3 are diagrams showing document images and their projections.

FIGS. 4(a) and 4(b) are diagrams showing examples of regions to be cut out of a document image.

FIGS. 5(a) to 5(c) are diagrams showing an example of the processing flow of the first embodiment.

FIGS. 6(a) to 6(d) are diagrams showing examples of images generated during processing, and FIG. 6(e) is a diagram showing an example of the image obtained as a result of processing.

FIG. 7 is a block diagram showing the configuration of the second embodiment of the present invention.

FIG. 8 is a diagram showing an example of the processing flow of the second embodiment.

FIGS. 9(a) to 9(f) are diagrams showing examples of images generated during processing, and FIG. 9(g) is a diagram showing an example of the image obtained as a result of processing.

1, 1': document image; 2, 2': projection; 3: frame; 4: columns; 5: image input device; 6 to 10, 91, 92: image memory; 11: control device; 12: run extraction means; 12': white run extraction means; 13, 13': image logical operation means; 14: contour tracking means; 15, 15': image processing device; 16: input device; 17: display device; 18: file device; 19: image output device.

Patent applicant: Fuji Xerox Co., Ltd.

Claims

(1) A document image processing device comprising: image input means for inputting a document as a binary image; input image storage means for storing the input image; processed image storage means for storing a new image obtained as a result of processing the image; blank area detection means for detecting, in the document image, blank areas that can contain a preset rectangle; and image logical operation means for integrating regions in the document image based on the detected blank areas.

(2) A document image processing device comprising: image input means for inputting a document as a binary image; input image storage means for storing the input document image; processed image storage means for storing images obtained as a result of processing the input image; white run extraction means for extracting white runs in the input document image that are longer than a preset value; and image logical operation means for integrating regions in the input document image by turning black all pixels except the portions where a vertical white run longer than a first set value coincides with a horizontal white run longer than a second set value, and the portions where a vertical white run longer than a third set value coincides with a horizontal white run longer than a fourth set value.