JP2818448B2

JP2818448B2 - System and method for automatic document segmentation

Info

Publication number: JP2818448B2
Application number: JP1235477A
Authority: JP
Inventors: ヤコブ・アルベルト・ウエストデエイク
Original assignee: オセ‐ネーデルランド・ベー・ヴエー
Priority date: 1988-09-12
Filing date: 1989-09-11
Publication date: 1998-10-30
Anticipated expiration: 2013-10-30
Also published as: DE3881392T2; EP0358815A1; EP0358815B1; JPH02105978A; US5073953A; DE3881392D1

Description

[Detailed Description of the Invention] The present invention relates to a system and method for automatically segmenting a document scanned by an electronic document processing device.

Documents to be processed by an electronic document processing device, such as an electronic copier, an optical character recognition system or a data compression system, consist of different types of information that must be processed in different ways in order to obtain sufficient copy quality and sufficient compression and/or to enable image manipulation. For example, a document may consist of white/black information such as text or graphics, as well as continuous-tone photographs or rastered or dithered images, hereinafter referred to as "halftone". When the data obtained by scanning such a document undergo image processing, image storage and printing in an electronic copier, information representing text or graphics is usually thresholded to obtain a binary image, whereas periodic information (raster) is grey-value thresholded and continuous-tone information is dithered. Consequently, it is necessary to locate and identify the areas or segments of the document that contain the different types of information. This process is called "segmentation".

One example of a conventional automatic segmentation system capable of distinguishing text information from halftone image information is disclosed in EP-A2-0 202 425. In this system, the scanned image of the document is subdivided into a matrix of blocks, or sub-images, of 4 × 4 pixels. Each of these blocks is then classified with one of the labels TEXT or IMAGE. Because the labelling of relatively small blocks is subject to statistical fluctuations, the resulting label matrix often contains short runs of TEXT blocks in areas where IMAGE blocks predominate, and vice versa. In the final stage of the segmentation process, the label matrix is relaxed by removing such short runs. In other words, a context rule is applied which requires that the label of an isolated block, or of an isolated short run of blocks, be changed to the label that predominates in its surroundings.

In general, an automatic segmentation system must satisfy two conflicting requirements. On the one hand, it must be fast enough to permit high-speed processing of documents. On the other hand, it must be robust enough to handle documents in which the different types of information are difficult to distinguish, for example because the characters of printed text are printed in light colours or on a dark background, or because a photograph contains light areas. Improving the robustness of conventional segmentation systems requires checking a large number of criteria in the labelling stage and/or the relaxation stage, which increases the processing time.

It is an object of the present invention to provide a system and method for automatic document segmentation with improved robustness and speed.

This object is achieved by the system specified in claim 1 and the method specified in claim 17.

According to the invention, the number of different labels that can be selected in the initial labelling stage is greater than the number of different types of information that must ultimately be distinguished. Consequently, the type of information need not be identified with certainty in the initial labelling stage. For this reason, the initial labelling stage can be completed in a comparatively short time, even when a comparatively large sub-image size is chosen in order to reduce statistical fluctuations. The type of information represented by an initial label is finally determined in the relaxation stage on the basis of context rules. It has been found that context rules suitable for this purpose do not require much computation time, so that a net saving of time is obtained. Furthermore, because the initial labels provide a differentiated classification, some errors made in the initial labelling stage can be corrected in the relaxation stage, even when these errors occur in comparatively large runs of sub-images. This contributes to the improved robustness of the system.

Useful details and further improvements of the system according to the invention are indicated in the appended claims.

Preferred embodiments of the invention are described below with reference to the accompanying drawings.

As shown in FIG. 1, the document segmentation system comprises an initial labelling module 10 and a relaxation module 12. A signal representing the scanned image of the whole document is transmitted from a document scanner (not shown) to the initial labelling module 10. The scanned image is regarded as a matrix of sub-images. In a typical example, an A4 document is scanned at a resolution of 500 dpi and the sub-images have a size of 64 × 64 pixels. The grey level of each pixel is represented by an 8-bit word corresponding to one of 256 grey levels.
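To give a feel for the size of the label matrix in this typical example, the following sketch computes the number of sub-image rows and columns for an A4 page. The A4 dimensions (210 mm × 297 mm) are standard; the function name and the choice to count partial blocks at the page edges as full blocks are assumptions made for illustration only.

```python
import math

def label_matrix_shape(width_mm, height_mm, dpi, block_px):
    """Return (rows, cols) of the sub-image matrix for a scanned page."""
    width_px = round(width_mm / 25.4 * dpi)
    height_px = round(height_mm / 25.4 * dpi)
    # Assumption: a partial block at the right or bottom edge counts as one block.
    return math.ceil(height_px / block_px), math.ceil(width_px / block_px)

# A4 page, 500 dpi, 64 x 64 pixel sub-images, as in the typical example.
rows, cols = label_matrix_shape(210, 297, 500, 64)
print(rows, cols, rows * cols)
```

Under these assumptions the page yields a matrix of 92 × 65 labels, so the relaxation stage works on a few thousand elements rather than on millions of pixels.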

In the initial labelling module 10, each sub-image is analysed by a classifier consisting of several routines that extract characteristic features from the sub-image. On the basis of the extracted features, a specific initial label is assigned to the sub-image. The result is an initial label matrix that represents the whole document and whose elements are the labels of the individual sub-images.

The initial label matrix is then processed in the relaxation module 12. Several context rules are applied to change the initial labels depending on the labels assigned to neighbouring sub-images. The context rules are designed so that some of the initial labels are eliminated completely in the course of the relaxation process.

As a result, in this embodiment, a relaxed label matrix is obtained that consists of only two different labels, which form separate segments corresponding to the continuous-tone areas (e.g. photographs) and the white/black areas (e.g. text or graphics) of the scanned document.

The classifier used in the initial labelling module may be based on conventional methods of image analysis, such as histogram evaluation, spatial analysis, differential operators, spectral analysis, or a combination of these methods. The classifier may be a tree classifier, in which the check routine to be applied depends on the result of the preceding check, or alternatively a one-shot classifier may be used. As a preferred embodiment, FIG. 2 illustrates a tree classifier that evaluates the grey-level histogram of a sub-image. Each branching point of the tree diagram shown in FIG. 2 corresponds to a specific criterion against which the histogram data are checked. For example, the following criteria may be taken into account:

- The position of the highest peak of the histogram on the abscissa, i.e. the most frequently occurring grey level. This criterion gives a rough indication of the overall brightness of the sub-image.

- The number of peaks of the histogram. In particular, a histogram with two distinct peaks can be an indication of text or graphics.

- The difference in height between two peaks. In most cases of text or graphics information, there is a large difference in height between the highest peak and the second-highest peak.

- The difference in grey level between two peaks. In white/black images this difference will be large.

- The height of the minimum level between the two dominant peaks. In continuous-tone images this level will be high.

- The difference in height between the highest peak and the minimum level on one or both sides of it, as a kind of "signal-to-noise distance".

- The number of pixels below the minimum level of the valley between the two main peaks. This number will be large for halftone images.

- The width of the highest peak or of the two highest peaks. A narrow width can be an indication of text or graphics.
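Several of the criteria listed above can be read directly off a grey-level histogram. The following is a minimal sketch, not the feature extractor of the patent: the function name, the dictionary keys and the very simple local-maximum peak test are assumptions for illustration, and a practical implementation would probably smooth the histogram first.

```python
def histogram_features(hist):
    """Extract a few of the listed criteria from a grey-level histogram.

    `hist` is a list of pixel counts indexed by grey level.
    """
    last = len(hist) - 1
    peaks = []
    for level, count in enumerate(hist):
        left = hist[level - 1] if level > 0 else -1
        right = hist[level + 1] if level < last else -1
        # Simple test: higher than the left neighbour and not lower than the
        # right one. A flat shoulder on a rising slope is also reported.
        if count > 0 and count > left and count >= right:
            peaks.append(level)
    # Order the peaks from highest to lowest count.
    peaks.sort(key=lambda level: hist[level], reverse=True)

    features = {
        "peak_position": peaks[0] if peaks else None,  # most frequent grey level
        "peak_count": len(peaks),
    }
    if len(peaks) >= 2:
        first, second = peaks[0], peaks[1]
        lo, hi = sorted((first, second))
        features["height_difference"] = hist[first] - hist[second]
        features["level_difference"] = hi - lo
        # Lowest count between the two dominant peaks (the valley).
        features["valley_minimum"] = min(hist[lo:hi + 1])
    return features
```

For a histogram with a tall peak at level 220 and a smaller one at level 30, the sketch reports two peaks, a grey-level difference of 190 and a valley minimum of 0, which is the pattern the criteria associate with text or graphics.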

Where a criterion used in the classifier can yield a wide range of results, the results are thresholded to obtain a practical number of branches. The structure of the tree, the criteria used in it, and the thresholds applied to the results may be optimised by matching them to statistical results obtained from a number of standard documents. The wider the variety of standard documents, the more robust the classifier becomes, but the greater the complexity it requires.

In the example shown in FIG. 2, the tree classifier yields four different labels as possible classification results, designated BW, BIM, BG and U. In this example, the image features indicated by these labels can be described as follows:

BW: two dominant grey levels with high contrast; candidate for text or graphics (BW stands for "black/white").
BIM: two dominant grey levels, but not a strong candidate for text or graphics in view of the other criteria (BIM stands for "bimodal").
BG: typical background area; comparatively bright and of low contrast; may occur in text and graphics segments as well as in halftone segments.
U: area with a diffuse grey-level distribution (U stands for "undefined"); candidate for continuous-tone images.
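The way a tree classifier could arrive at one of these four labels can be sketched as a short chain of thresholded checks. The sketch below does not reproduce the tree of FIG. 2, whose branches are not spelled out in the text: the order of the checks, every threshold value and the assumption that larger grey values are brighter are all hypothetical.

```python
def initial_label(peak_position, peak_count, level_difference=0, valley_minimum=0,
                  bright_min=192, contrast_min=128, valley_max=10):
    """Assign BW, BIM, BG or U from a few histogram features (illustrative only)."""
    if peak_count == 1:
        # One dominant grey level: bright means background, otherwise undecided.
        return "BG" if peak_position >= bright_min else "U"
    if peak_count == 2:
        # Two dominant levels: high contrast and a deep valley suggest text/graphics.
        if level_difference >= contrast_min and valley_minimum <= valley_max:
            return "BW"
        return "BIM"
    # No peak or many peaks: treated here as a diffuse distribution.
    return "U"
```

Each later check is reached only when the earlier ones fail, which is what distinguishes a tree classifier from a one-shot classifier that evaluates all features at once.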

An example of an initial label matrix obtained with the classifier described above is shown in FIG. 4(A). It will be seen that this initial label matrix still contains a number of fluctuations that must be removed in the relaxation stage.

The context rules used in the relaxation stage are described below with reference to FIGS. 3(A) to 3(E).

In order to compare each matrix element with its neighbours, the matrix elements are combined into arrays A, A′ of 3 × 3 elements. The four context rules illustrated in FIGS. 3(A) to 3(D) are applied to the individual 3 × 3 arrays.

The so-called LOCAL context rule illustrated in FIG. 3(A) serves to eliminate isolated labels in a homogeneous environment. This rule is formulated as follows:

If a label X is surrounded by upper, lower, right and left neighbours that all have the label Y, then change X to Y.

In this rule, X and Y stand for any of the initial labels BW, BIM, BG and U.
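The LOCAL rule can be sketched in a few lines. The representation of the label matrix as a list of lists of strings, the function name, and two details are assumptions: the rule reads from the input matrix and writes to a copy, and elements on the border of the matrix, which lack one of the four neighbours, are left unchanged.

```python
def apply_local(matrix):
    """Return a new label matrix with the LOCAL rule applied to every interior element."""
    rows, cols = len(matrix), len(matrix[0])
    result = [row[:] for row in matrix]
    for r in range(1, rows - 1):
        for c in range(1, cols - 1):
            up, down = matrix[r - 1][c], matrix[r + 1][c]
            left, right = matrix[r][c - 1], matrix[r][c + 1]
            # X is replaced by Y only when all four neighbours agree on Y.
            if up == down == left == right and matrix[r][c] != up:
                result[r][c] = up
    return result
```

Applied to a 3 × 3 matrix of BW with a single BG in the centre, the sketch returns a matrix consisting of BW only.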

The context rules illustrated in FIGS. 3(B) and 3(C) may be called "weak expansion" rules and have the following structure:

If at least one element in a 3 × 3 array A′ has the label BW (the weakly expanding label) and the array contains no label from a predetermined group, then expand the label BW over the whole array.

The expansion rule illustrated in FIG. 3(B) converts a combination of BW and BG to BW and may be written briefly as BW/BG -> BW. In this rule, the "predetermined group" of labels that must not be contained in the array consists of the labels BIM and U. If either of these labels is contained in the array, the array is left unchanged by this context rule.

In FIG. 3(C), the "predetermined group" of forbidden labels consists of the labels BG and U. Accordingly, this context rule converts only arrays consisting of a combination of BW and BIM, and may be written as BW/BIM -> BW.

Other context rules with the same structure can be created by defining other groups of labels that must not be contained in the array. For example, an array consisting of a combination of BW, BG and BIM could be converted to BW as a whole in a single step.

These context rules may also be modified by requiring that the array contain at least two, three or more elements with the label BW.

The EXPAND context rule illustrated in FIG. 3(D) is defined as follows:

If an array A′ contains at least one element with the label U (the strongly expanding label), then the label U is expanded over the whole array.

In this rule, there is no restriction on the other labels that may occur in the initial array.
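The weak expansion rules and the EXPAND rule share one structure: an expanding label, a group of forbidden labels (empty for EXPAND) and a minimum count. A minimal sketch of that shared structure follows. The function names are hypothetical, the arrays are taken from a fixed, non-overlapping grid, and incomplete arrays at the right and bottom edges are treated as smaller arrays; the last point is an assumption.

```python
def apply_expansion(matrix, spreading, forbidden=(), min_count=1, size=3):
    """Expand `spreading` over every grid array that qualifies; return a new matrix."""
    rows, cols = len(matrix), len(matrix[0])
    result = [row[:] for row in matrix]
    for r0 in range(0, rows, size):
        for c0 in range(0, cols, size):
            cells = [(r, c)
                     for r in range(r0, min(r0 + size, rows))
                     for c in range(c0, min(c0 + size, cols))]
            labels = [matrix[r][c] for r, c in cells]
            enough = labels.count(spreading) >= min_count
            blocked = any(label in forbidden for label in labels)
            if enough and not blocked:
                for r, c in cells:
                    result[r][c] = spreading
    return result

# The rules of FIGS. 3(B) to 3(D) expressed with this helper.
def bw_bg_to_bw(matrix):
    return apply_expansion(matrix, "BW", forbidden=("BIM", "U"))

def bw_bim_to_bw(matrix):
    return apply_expansion(matrix, "BW", forbidden=("BG", "U"))

def expand(matrix):
    return apply_expansion(matrix, "U")
```

Raising `min_count` to 2 or 3 gives the stricter variants mentioned in the text, and changing `forbidden` gives further rules of the same structure.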

FIG. 3(E) illustrates a context rule called FILL, which is not confined to 3 × 3 arrays. This rule is defined as follows:

1) If labels U form vertical and horizontal runs 14, 16 that intersect, the whole rectangle 18 spanned by these runs is filled with the label U (the term "run" denotes an uninterrupted sequence of labels U in a row or column of the matrix).

2) Check combinations of horizontal and vertical runs in order to maximise the area filled with U.

3) If the height of the maximised area is less than four elements, or its width is less than four elements, then change all labels in this area to BW.

As an extension of the FILL context rule, the rectangle spanned by the runs of U may be filled with the label U only if the number of labels U it contains exceeds a predetermined ratio (minimum number of U/U) and/or if the shape of the rectangle satisfies specific conditions, for example being larger than a predetermined minimum or smaller than a predetermined maximum.
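The three basic FILL steps can be sketched as follows. This is one possible reading, not the implementation of the patent: step 2 is interpreted as repeating step 1 until nothing changes, after which every connected group of U labels is a full rectangle, and the optional extension described above is not implemented. The function and parameter names are hypothetical.

```python
def apply_fill(matrix, target="U", fallback="BW", min_height=4, min_width=4):
    """Fill rectangles spanned by crossing runs of `target`; drop small ones."""
    rows, cols = len(matrix), len(matrix[0])
    grid = [row[:] for row in matrix]

    def h_run(r, c):
        """Column extent of the unbroken horizontal run through (r, c)."""
        lo = hi = c
        while lo > 0 and grid[r][lo - 1] == target:
            lo -= 1
        while hi < cols - 1 and grid[r][hi + 1] == target:
            hi += 1
        return lo, hi

    def v_run(r, c):
        """Row extent of the unbroken vertical run through (r, c)."""
        lo = hi = r
        while lo > 0 and grid[lo - 1][c] == target:
            lo -= 1
        while hi < rows - 1 and grid[hi + 1][c] == target:
            hi += 1
        return lo, hi

    # Steps 1 and 2: fill the rectangle spanned by the two runs through each
    # target element, and repeat until the filled area no longer grows.
    changed = True
    while changed:
        changed = False
        for r in range(rows):
            for c in range(cols):
                if grid[r][c] != target:
                    continue
                c0, c1 = h_run(r, c)
                r0, r1 = v_run(r, c)
                for rr in range(r0, r1 + 1):
                    for cc in range(c0, c1 + 1):
                        if grid[rr][cc] != target:
                            grid[rr][cc] = target
                            changed = True

    # Step 3: each remaining group is a rectangle; convert the small ones.
    done = set()
    for r in range(rows):
        for c in range(cols):
            if grid[r][c] == target and (r, c) not in done:
                c0, c1 = h_run(r, c)
                r0, r1 = v_run(r, c)
                too_small = (r1 - r0 + 1) < min_height or (c1 - c0 + 1) < min_width
                for rr in range(r0, r1 + 1):
                    for cc in range(c0, c1 + 1):
                        done.add((rr, cc))
                        if too_small:
                            grid[rr][cc] = fallback
    return grid
```

A cross formed by a horizontal and a vertical run of five U labels is filled out to a 5 × 5 rectangle and kept, whereas a group that only fills a 2 × 2 rectangle is converted to BW.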

The relaxation module 12 applies the context rules illustrated in FIGS. 3(A) to 3(E) in the following order:

1) LOCAL
2) BW/BG -> BW
3) LOCAL
4) BW/BIM -> BW
5) LOCAL
6) BW/BG -> BW
7) LOCAL
8) EXPAND
9) LOCAL
10) FILL

In each of these steps, the context rule is applied to the whole matrix before the next step is carried out. In the case of the LOCAL context rule, the whole matrix is scanned with a 3 × 3 window in steps of one element, so that each element is once regarded as the central element of a 3 × 3 array A.

In steps 2), 4) and 6), the same procedure may be applied. Alternatively, the matrix may be divided into a fixed grid of 3 × 3 arrays A′.

In step 8), a fixed grid of 3 × 3 arrays A′ is used. Alternatively, a floating-array method may be adopted, but each array must then be required to contain at least two labels U, because otherwise the expanded area would become too large.

Step 1) starts from the initial label matrix generated in module 10. All other steps are carried out on the modified matrix resulting from the preceding step. It will be noted that the LOCAL rule is applied several times, interleaved with the other context rules. The rule BW/BG -> BW is applied in step 2) and again in step 6).

At the end of step 7), most of the labels BG and BIM have been eliminated, and the matrix shows areas that are homogeneously filled with the label BW, while other areas contain the label U in combination with other labels. In these areas the label U is expanded in steps 8) and 10), so that at the end of step 10) the whole matrix consists of rectangular areas homogeneously filled with either BW or U. However, the FILL rule requires that an area filled with U be converted to BW if it is too small. The label matrix obtained at the end of step 10) therefore consists only of the labels BW and U, which form large rectangular segments representing the white/black areas and the continuous-tone areas of the scanned document, respectively. This matrix is the desired relaxed label matrix. To distinguish it from the initial label matrix, the labels BW and U are renamed as the relaxed labels T (for "TEXT") and P (for "PHOTO"), respectively.
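The ten-step schedule and the final renaming can be captured in a small driver. In the sketch below the rule implementations are passed in as a dictionary of functions, each taking and returning a whole label matrix; the names `SCHEDULE`, `RELAXED_NAMES` and `relax` are hypothetical.

```python
SCHEDULE = ("LOCAL", "BW/BG->BW", "LOCAL", "BW/BIM->BW", "LOCAL",
            "BW/BG->BW", "LOCAL", "EXPAND", "LOCAL", "FILL")
RELAXED_NAMES = {"BW": "T", "U": "P"}

def relax(initial_matrix, rules, schedule=SCHEDULE):
    """Apply the context rules in order, then rename BW to T and U to P."""
    matrix = initial_matrix
    for name in schedule:
        # Each rule works on the whole matrix produced by the preceding step.
        matrix = rules[name](matrix)
    # Labels other than BW and U are not expected after step 10; if one is
    # still present it is passed through unchanged.
    return [[RELAXED_NAMES.get(label, label) for label in row] for row in matrix]
```

Keeping the schedule as data makes it easy to experiment with a different order or with additional rules without touching the rule implementations.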

FIG. 4(B) shows the relaxed label matrix obtained from the initial label matrix shown in FIG. 4(A). This figure represents an experimental result obtained by applying the segmentation process described above to a test document containing a text area with several text formats and two photographic areas. The actual boundaries of the photographic areas of the document are indicated by dashed lines 20.

It will be seen that the P segments in FIG. 4(B) match the actual boundaries of the photographic areas within the resolution of the sub-image matrix.

As shown in FIG. 4(A), the photographic areas contain comparatively large coherent areas filled with the labels BW, BIM and BG, which could also be interpreted as text areas. In the relaxation process, these ambiguities have been successfully removed by the context rules.

FIG. 5 shows an example of a hardware implementation of the automatic segmentation system.

The document is scanned in a scanner 22, and the digital values representing the grey levels of the individual pixels are stored in a bitmap. These values are then transmitted to a histogram unit 24, which generates a histogram for each sub-image of the document. The histogram data are evaluated in a classifier 26, which corresponds to the initial labelling module 10 of FIG. 1. The classifier 26 comprises a feature extractor 28, which checks the features of the histogram, and a tree classifier 30, which selects the features to be checked and finally assigns one of the initial labels to the sub-image under investigation.
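What the histogram unit 24 computes can be sketched in software as follows. The bitmap is assumed to be a list of rows of integer grey levels; the function name and the generator interface are choices made for this illustration, not part of the described hardware.

```python
def subimage_histograms(image, block=64, levels=256):
    """Yield ((block_row, block_col), histogram) for each sub-image of a bitmap."""
    rows, cols = len(image), len(image[0])
    for r0 in range(0, rows, block):
        for c0 in range(0, cols, block):
            hist = [0] * levels
            # Count how often each grey level occurs inside this sub-image.
            for row in image[r0:r0 + block]:
                for value in row[c0:c0 + block]:
                    hist[value] += 1
            yield (r0 // block, c0 // block), hist
```

Because each sub-image is handled independently, the loop body is the unit of work that several histogram units could carry out in parallel.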

The initial labels are then processed in a context processor 32, which corresponds to the relaxation module 12 of FIG. 1. The context processor comprises a processing module 34 for applying the context rules (steps 1) to 10)) in sequence, and a buffer 36 for storing the initial label matrix, the intermediate results and the relaxed label matrix.

In an improved hardware implementation, a plurality of histogram units 24 and classifiers 26 may be provided so that several sub-images can be processed in parallel in the initial labelling stage.

In the embodiment shown in FIGS. 1 to 4, the segmentation system distinguishes between only two different types of information, namely white/black information (label T) and continuous-tone information (label P). Photographic segments may contain continuous-tone information as well as periodic information such as rastered or dithered images. The inventive concept is, however, also applicable to segmentation systems that further distinguish between continuous-tone information and periodic information. This can be achieved, for example, by modifying the segmentation system as shown in FIG. 6 or FIG. 7.

In both figures, halftone indicators are used to detect halftone information. Halftone information can be detected by applying the following criteria to each sub-image:

- The distance between the first non-DC peak in the spectrum and the origin of the spectrum.

- The ratio between the DC peak value and the first non-DC peak value in the spectrum.
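The two halftone indicators can be illustrated with a naive two-dimensional discrete Fourier transform of a small square block. The sketch below rests on several assumptions: the "first non-DC peak" is taken to be the strongest non-DC component, distances are measured in frequency bins after folding frequencies above half the block size back towards the origin, and the direct DFT is only practical for small blocks. The function name is hypothetical.

```python
import cmath
import math

def halftone_indicators(block):
    """Return (distance, ratio) for a small square grey-level block."""
    n = len(block)
    magnitudes = {}
    for u in range(n):
        for v in range(n):
            total = 0j
            for y in range(n):
                for x in range(n):
                    angle = -2j * cmath.pi * (u * y + v * x) / n
                    total += block[y][x] * cmath.exp(angle)
            magnitudes[(u, v)] = abs(total)
    dc = magnitudes.pop((0, 0))
    # Assumption: the strongest remaining component is the first non-DC peak.
    (u, v), peak = max(magnitudes.items(), key=lambda item: item[1])
    # Bins above n/2 represent negative frequencies; fold them back.
    fu, fv = min(u, n - u), min(v, n - v)
    distance = math.hypot(fu, fv)
    ratio = dc / peak if peak else math.inf
    return distance, ratio
```

For an 8 × 8 block of vertical stripes with a period of two pixels, the strongest non-DC component lies four bins from the origin and is as large as the DC component, so the ratio is close to 1. A block without a pronounced periodic structure gives a much larger ratio.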

In FIG. 6, the initial labelling and relaxation processes are carried out in the same way as in FIG. 1, after which the photographic areas (label P) are analysed further in order to distinguish between continuous-tone information and periodic information. This is done by checking one of the raster indicators mentioned above and is carried out by a periodicity module 38. The system shown in FIG. 6 has the advantage that the time-consuming check for periodic information is confined to the segments that have been identified as photographic areas.

Alternatively, the check for periodic information may be carried out in the initial labelling stage, as shown in FIG. 7. In this case, the initial labels will include at least one label indicating a strong candidate for a rastered image, and the context rules in the relaxation module will include rules for expanding this label, so that the relaxed label matrix contains three different labels corresponding to continuous-tone information, periodic information and white/black information.

The context rules for finding areas containing halftone information can be similar to the context rules FILL and EXPAND (FIG. 3) described above, which are intended to find continuous-tone areas. In this case, a label indicating halftone information is used in place of the target label U.

In the example shown in FIG. 7, a boundary analysis module 40 is added in order to improve the match between the segments (P and T in FIG. 4(B)) and the actual boundaries 20 of the photographic areas of the document.

The boundary analysis may be carried out, for example, by shifting the grid of sub-images vertically and horizontally by fixed fractions of the sub-window size (i.e. 1/4, 1/2, 3/4) and repeating the initial labelling and relaxation procedures for the shifted grids. A comparison of the different results then provides more detailed information on the actual position of the boundaries of the photographic areas.
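The shifted grids used for this kind of boundary analysis can be enumerated as pixel offsets. The sketch below assumes the 64-pixel sub-image size of the typical example and, as a further assumption, combines vertical and horizontal shifts freely; the text only states that the grid is shifted vertically and horizontally. The function name is hypothetical.

```python
def shifted_grid_origins(block=64, fractions=(0.25, 0.5, 0.75)):
    """Return the (dy, dx) pixel offsets of the shifted sub-image grids."""
    steps = [0] + [round(block * fraction) for fraction in fractions]
    # The unshifted grid (0, 0) has already been processed and is left out.
    return [(dy, dx) for dy in steps for dx in steps if (dy, dx) != (0, 0)]
```

Each offset calls for one more run of initial labelling and relaxation, which is why the text goes on to describe ways of confining the analysis to the parts of the document where boundaries are expected.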

Optionally, the boundary analysis for vertical and horizontal boundaries may be confined to those parts of the document in which the vertical and horizontal boundaries, respectively, of the photographic areas are expected to lie.

In an alternative method, the boundary analysis may be carried out by further analysing specific target areas centred on the coordinates of label transitions in the relaxed label matrix. For example, a target area can be subdivided into sub-windows that provide a higher resolution than that used in the initial labelling stage, and each sub-window can then be classified as a boundary sub-window or a non-boundary sub-window.

The analysis of target areas may be confined to isolated locations on the transition lines in the relaxed label matrix. Once the position of the boundary has been determined exactly within these target areas, the exact position of the whole boundary may be found by extrapolation.

While specific embodiments of the invention have been described above, a person skilled in the art will be able to devise various modifications, all of which fall within the scope of the invention as defined in the claims.

For example, the context rules mentioned with reference to FIG. 3 can be carried out using arrays larger than the 3 × 3 sub-images described above.

[Brief description of the drawings]

FIG. 1 is a block diagram showing the general structure of the automatic segmentation system; FIG. 2 shows a tree classifier used in the initial labelling stage; FIGS. 3(A) to 3(E) illustrate the context rules used in the relaxation step; FIGS. 4(A) and 4(B) show an example of an initial label matrix and the relaxed label matrix obtained from it; FIG. 5 is a block diagram of an example of a hardware implementation of the system according to the invention; and FIGS. 6 and 7 are block diagrams of modified examples of the segmentation system.

10: initial labelling module; 12: relaxation module; 22: scanner; 24: histogram unit; 26: classifier; 28: feature extractor; 30: tree classifier; 32: context processor; 34: processing module; 36: buffer; 38: periodicity module.

Claims

(57) [Claims]

1. A system for automatically segmenting a scanned document in an electronic document processing device in order to distinguish document areas (T, P) containing different types of image information such as black-and-white images, continuous tone images and the like, the system comprising: means for subdividing a scanned image representing the whole document into a matrix of sub-images; labeling means for analyzing the information contained in each sub-image and assigning initial labels to the sub-images in order to obtain an initial label matrix; and relaxation means for relaxing the initial label matrix by changing the labels according to context rules, with the aim of obtaining a pattern of uniformly labeled segments representing the different document areas; wherein the labeling means is adapted to select the initial labels from a first set of labels (BW, BIM, BG, U), the relaxation means is adapted to convert the initial labels into relaxed labels (T, P), and the relaxed labels are selected from a second set of labels that contains fewer labels than the first set.

2. The system of claim 1, wherein the context rules executed in said relaxation means comprise rules for expanding some of said initial labels (BW, U) and eliminating other initial labels (BIM, BG), the expanded labels finally being identified with the relaxed labels (T, P).

3. The system according to claim 2, wherein the context rules comprise at least one rule having one of the following structures:
a) if at least n elements in a predetermined array (A′) of matrix elements in the initial label matrix have the label BW, and this array does not include a label from the group G, then convert all labels in this array to BW, where n is a predetermined number, BW is a predetermined initial label, and G is a predetermined subset of the initial label set;
b) if at least m elements in a predetermined array (A′) of matrix elements in the initial label matrix have the label U, then convert all labels in this array to U, where m is a predetermined number and U is a predetermined initial label;
c) c1) where labels U form intersecting vertical and horizontal runs, fill the entire rectangle whose length and width are given by these runs with the label U; c2) check all combinations of vertical and horizontal runs in order to maximize the area to be filled with the label U; c3) if the height of the maximized area is less than a minimum height, or if its width is less than a minimum width, convert all labels in this area to BW, where U and BW are predetermined initial labels and the minimum height and minimum width are predetermined numbers.
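Rule structures (a) and (b) of claim 3 can be sketched as a single counting rule over a small array of labels. The following is a minimal illustration, not the claimed implementation: the helper name `apply_array_rule`, the sliding placement of the array over the matrix, the thresholds n = 5 and m = 4, and the choice G = {U} are all assumptions made for the example.

```python
def apply_array_rule(labels, target, min_count, forbidden=frozenset(), size=3):
    """Slide a size x size array over the label matrix. Wherever at least
    min_count elements carry `target` and none carries a label from
    `forbidden`, relabel the whole array as `target`.

    Counts are taken on the input matrix; changes go to a copy.
    """
    rows, cols = len(labels), len(labels[0])
    out = [row[:] for row in labels]
    for r in range(rows - size + 1):
        for c in range(cols - size + 1):
            window = [labels[r + i][c + j]
                      for i in range(size) for j in range(size)]
            if (window.count(target) >= min_count
                    and not forbidden.intersection(window)):
                for i in range(size):
                    for j in range(size):
                        out[r + i][c + j] = target
    return out


# Rule (a): six of nine labels are BW and no label from G = {U} is present.
initial = [["BW", "BW", "BIM"],
           ["BW", "BIM", "BW"],
           ["BW", "BW", "BG"]]
rule_a = apply_array_rule(initial, "BW", min_count=5,
                          forbidden=frozenset({"U"}))

# Rule (b): four of nine labels are U; no forbidden group applies.
photo = [["U", "U", "BG"],
         ["U", "BIM", "U"],
         ["BG", "BW", "BIM"]]
rule_b = apply_array_rule(photo, "U", min_count=4)
```

Reading counts from the input matrix while writing to a copy keeps one pass independent of the order in which array positions are visited; updating in place would be an equally plausible reading of the claim.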

4. The system according to claim 3, wherein the context rules further comprise rules of the following structure: c4) if the rectangle referred to in c1 contains a number of labels U greater than a predetermined minimum number of U, fill the entire rectangle with the label U; c5) if the rectangle referred to in c1 satisfies the conditions width/height > min and width/height < max, where min and max are predetermined numbers, fill the entire rectangle with the label U.

5. The system according to claim 3, wherein the predetermined array (A′) to which the context rules (a) and (b) are applied has a size of 3 × 3 sub-images.

6. The system according to any one of claims 3 to 5, wherein said relaxation means performs a plurality of steps that modify said initial label matrix stepwise by applying each of said context rules at least once in a predetermined order.

7. The system of claim 6, wherein a context rule having said structure (a) is applied before a context rule having said structure (b), and a context rule having said structure (b) is applied before a context rule having said structure (c).

8. The system according to claim 6 or claim 7, wherein the context rules also comprise a local rule stipulating that, if a given matrix element is surrounded by elements having the same label X immediately above, below, to the right and to the left, the label of that element is converted to X, and wherein the local rule is applied immediately before each of the rules having the structure (a), (b) or (c).
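The local rule of claim 8 lends itself to a short sketch. The version below is illustrative only: the function name `apply_local_rule` is hypothetical, and leaving border elements unchanged (because they lack one of the four neighbours) is an assumption not stated in the claim.

```python
def apply_local_rule(labels):
    """If the four direct neighbours (above, below, left, right) of an
    element all carry the same label X, relabel the element as X.

    Assumption: border elements, which lack a neighbour, are left as-is.
    """
    rows, cols = len(labels), len(labels[0])
    out = [row[:] for row in labels]
    for r in range(1, rows - 1):
        for c in range(1, cols - 1):
            neighbours = {labels[r - 1][c], labels[r + 1][c],
                          labels[r][c - 1], labels[r][c + 1]}
            if len(neighbours) == 1:
                out[r][c] = neighbours.pop()
    return out


# An isolated BIM label enclosed by BW on all four sides is absorbed.
matrix = [["BG", "BW", "U"],
          ["BW", "BIM", "BW"],
          ["U", "BW", "BG"]]
cleaned = apply_local_rule(matrix)
```

Only the four direct neighbours are inspected, so the diagonal labels in the example play no part in the decision.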

9. The system according to any one of the preceding claims, wherein each sub-image has a size of at least 16 × 16 pixels, preferably 64 × 64 pixels, and the scanning resolution is greater than 1000 dpi.

10. The system of claim 9, wherein said labeling means comprises a histogram unit for generating a gray level histogram for each of the sub-images.
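The histogram unit of claim 10 can be pictured with the following sketch, which builds one gray-level histogram per sub-image. It assumes a scanned image held as a list of rows of integer gray values, a square sub-image size, and that partial sub-images at the right and bottom edges are simply dropped; the function name `subimage_histograms` and these conventions are illustrative assumptions.

```python
def subimage_histograms(image, size=64, levels=256):
    """Return {(sub_row, sub_col): histogram} for each full
    size x size sub-image of a gray-level image.

    Assumptions: pixel values lie in range(levels); partial
    sub-images at the image edges are ignored.
    """
    rows, cols = len(image), len(image[0])
    result = {}
    for top in range(0, rows - size + 1, size):
        for left in range(0, cols - size + 1, size):
            hist = [0] * levels
            for r in range(top, top + size):
                for c in range(left, left + size):
                    hist[image[r][c]] += 1
            result[(top // size, left // size)] = hist
    return result


# Tiny 4 x 4 image with four gray levels, split into 2 x 2 sub-images.
image = [[0, 0, 3, 3],
         [0, 1, 3, 3],
         [2, 2, 1, 1],
         [2, 2, 1, 0]]
hists = subimage_histograms(image, size=2, levels=4)
```

With the preferred sub-image size of claim 9, the same call would use `size=64`.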

11. The system according to claim 1, wherein said labeling means comprises a tree classifier that determines the label to be assigned to a given sub-image by examining characteristics of the input data according to a tree structure.

12. The system according to claim 1, wherein said labeling means comprises means for detecting rasterized or dithered information.

13. The system according to claim 1, comprising a context rule for finding halftone regions.

14. The system according to claim 1, further comprising means for detecting rasterized or dithered information only in photographic areas already segmented by said relaxation means.

15. The system according to claim 1, comprising a plurality of labeling means for processing data from a plurality of sub-images in parallel.

16. The system according to any one of claims 1 to 15, further comprising boundary analysis means, responsive to the output of said relaxation means, for determining the boundary positions of the segmented regions with higher resolution.

17. A method for automatically segmenting a scanned document in an electronic document processing device in order to distinguish document areas (T, P) containing different types of image information such as black-and-white images, continuous tone images and the like, wherein: a scanned image representing the entire document is subdivided into a matrix of sub-images; the information contained in each sub-image is analyzed in order to assign initial labels to the sub-images and thereby obtain an initial label matrix; and the initial label matrix is relaxed by changing the labels of individual matrix elements according to context rules in order to obtain a pattern of uniformly labeled segments representing the different document areas; wherein the initial labels are selected from a first set of labels (BW, BIM, BG, U) and wherein, in the relaxation step, the initial labels are converted into relaxed labels (T, P) selected from a second set containing fewer labels than the first set.