JP2615051B2

JP2615051B2 - Document image area division and identification method

Info

Publication number: JP2615051B2
Application number: JP62134420A
Authority: JP
Inventors: 治一五十嵐
Original assignee: Ricoh Co Ltd
Current assignee: Ricoh Co Ltd
Priority date: 1987-05-28
Filing date: 1987-05-28
Publication date: 1997-05-28
Anticipated expiration: 2012-05-28
Also published as: JPS63298487A

Description

Technical Field

The present invention relates to the segmentation of a document image into regions and the identification of those regions.

Prior Art

To "understand" a document image (that is, to acquire enough information to answer any structural or semantic question about it), the first step is to understand the structure of the document: how it is composed of body text, titles, figures, tables, and so on. In addition, the document image processing system proposed by the present applicant is required to define abstracted, so-called high-level image processing operations so that more advanced image processing operations can be expressed. From both viewpoints, region segmentation (identification) is one of the most important image processing operations.

Conventional region segmentation methods can be summarized as follows (systems that match the image against a format definition are excluded).

Maeda, Matsuura, and Nambu (Mitsubishi), "Structural analysis of handwritten documents," Institute of Electronics and Communication Engineers (IECE) General National Conference (1986), 1517, compose character lines and blocks using the density features of small, horizontally long rectangular regions.

Hino, Fukuda, and Tabata (Hitachi), "Office workstation based on multimedia processing (Part 2): a method for separating and extracting document structure," IPSJ National Convention (first half of 1986), 3K-2, and Iwaki, Kida, and Arakawa (NTT), "A functionally distributed document recognition system," PRU86-32, extract character lines from the adjacency of the bounding rectangles of black connected components.

Akiyama, Hagita, and Masuda (NTT), "Automatic reading of documents of unknown format," PRU86-33, describe a method that performs a rough segmentation using ruled lines and blank areas and then identifies the regions on the basis of features such as the periodicity of the marginal distribution and the line density.

These methods use the following three features to identify the category of a region:

(1) Marginal distribution feature: represents the positional spread and periodicity of black pixels.
(2) Line density feature: represents graphical complexity.
(3) Bounding rectangle feature: represents the size of individual elements.

However, the knowledge of how to use these features is built into the program as processing procedures, so these methods cannot flexibly accommodate changes or additions to the processing scheme.

In response, a system that expresses the processing procedures as production rules, an AI technique, has also been proposed: Iwaki, Kubota, Enjo, and Arakawa (NTT), "A study on the introduction of a production system into character/graphic separation processing," PRL83-63. In that system, each time a feature is computed, a set of rules fires and the results are written to a blackboard. Even in this system, however, the overall framework of the segmentation method and the region identification method is fixed.

To process multimedia data, it is desirable for the system to separate the media automatically through document image understanding. Dividing an input document image into regions and identifying their categories is important for understanding the structure of the document image. The present applicant has proposed a method that performs this region segmentation and identification accurately and at high speed. That method is characterized by grasping the global structure through the extraction of white regions from an image pyramid, by a flexible decision scheme that separates segmentation from identification, and by the introduction of certainty factors for the features together with a way of combining them. In that method, each feature has been given empirically determined certainty factors for each category, and these have been used for region identification. However, when several kinds of documents are input and the type of input document changes, correct identification may become impossible if the assignment of certainty factors is fixed. The assignment of certainty factors therefore needs to be changed flexibly as the type of input document changes.

Object

The present invention has been made in view of the circumstances described above. Its object is to provide a kind of learning effect: through dialogue with an operator, the functions that give the certainty factors are changed according to whether the identification results are correct.

Constitution

To achieve the above object, the present invention provides a method for segmenting and identifying regions of a document image in which an input image is divided into regions using an image pyramid, a plurality of features for region identification are extracted from each segmentation layer, and the certainty factors assigned to the features are combined to identify the category of each region. The method is characterized in that, on the basis of the identification results for the input image, the values of the function that assigns a certainty factor to each feature are changed so that more reliable identification results are obtained. The invention is described below on the basis of an embodiment.

The document image region segmentation and identification method to which the present invention is applied has the following two characteristics.

(1) Region segmentation based on global information obtained with an image pyramid.

(2) Complete separation of feature extraction from region identification.

In (2) in particular, certainty factors are introduced into the decision on the category of a region, and the final decision is made with a weighted average or with a calculation based on Dempster-Shafer theory. Detailed analysis measures, such as backtracking or changing the weights when no reliable decision can be made, can also be provided.

Fig. 1 illustrates the overall processing flow of the present invention, Fig. 2 illustrates the image pyramid, and Fig. 3 illustrates the conversion rule. In the figures, 1 is the original image; the method targets document images in which character lines, tables, figures, photographs, and the like are mixed. The image is input from a scanner (2) and preprocessed, for example by skew correction (3). The input image is represented as a collection of binary data, 0 (white) and 1 (black).

An image pyramid is created from this data (4). An image pyramid is a kind of scale transformation; as shown in Fig. 2, it corresponds to viewing the image from farther away (coarse-graining), that is, going from P₁ to P₂ and from P₂ to P₃. This makes the structure of the whole page easier to grasp. Fig. 3 illustrates the conversion rule used in the image pyramid: four adjacent pixels are examined (Fig. 3(a)) and, if a certain condition is satisfied, they are replaced with black (1); the four pixels are then treated as a single new pixel, black or white accordingly (Fig. 3(b)). In this example, the four pixels are converted to black if any one of them is black.

Using this image pyramid, rectangular white regions (blank areas) are extracted from the original image (5). It follows from the construction above that wherever a pixel in an upper layer is white, the corresponding region is entirely white (contains no black pixels) in every layer below it. In this example, the layers from the top layer down to a certain level are used, and white rectangular regions are extracted in order starting from the top layer. Once the white regions have been extracted, the image is divided into several regions (each containing black pixels), and the boundaries of these regions are extracted (6).

Depending on how far down the pyramid the layers are used to extract white regions, there are as many ways to segment the image as there are layers in the pyramid. Each such segmentation is called a segmentation layer below. A condition determines which segmentation layer is subjected to region identification (classification into categories) first. In this example, among the segmentation layers that contain at least a given number of regions whose size is at least a given value, the one that uses the lowest pyramid layer is analyzed first. Once the segmentation layer to be processed first has been decided, the original-image region corresponding to each region in that layer is cut out, and several features are computed for each cut-out region (7). In this example, the twelve features described in the feature list are computed.
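The pyramid conversion rule described above can be sketched in a few lines of Python. This is a minimal illustration, not the patent's implementation: it assumes that the "four adjacent pixels" form a non-overlapping 2×2 block, and the names `reduce_level` and `build_pyramid` and the sample `page` are made up for the example.

```python
from typing import List

Image = List[List[int]]  # 0 = white, 1 = black


def reduce_level(img: Image) -> Image:
    """Build the next coarser level: a 2x2 block becomes black if any pixel is black."""
    h, w = len(img), len(img[0])
    out = []
    for y in range(0, h, 2):
        row = []
        for x in range(0, w, 2):
            # Assumption: blocks at the right/bottom edge may be smaller than 2x2.
            block = [img[yy][xx]
                     for yy in range(y, min(y + 2, h))
                     for xx in range(x, min(x + 2, w))]
            row.append(1 if any(block) else 0)
        out.append(row)
    return out


def build_pyramid(img: Image, levels: int) -> List[Image]:
    """Return [original, level 1, ..., level `levels`], finest level first."""
    pyramid = [img]
    for _ in range(levels):
        pyramid.append(reduce_level(pyramid[-1]))
    return pyramid


# Hypothetical 4x8 binary page used only for illustration.
page = [
    [1, 0, 0, 0, 0, 0, 0, 0],
    [0, 0, 0, 0, 0, 0, 0, 0],
    [0, 0, 0, 0, 0, 0, 1, 1],
    [0, 0, 0, 0, 0, 0, 1, 1],
]
pyramid = build_pyramid(page, levels=2)
```

In the sample, the white pixel at row 0, column 1 of level 1 covers a 2×2 block of the original that contains no black pixels, which is the property the white-region extraction relies on.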

Features: the following twelve image features are used to identify regions (body text, title, table, figure, ruled line/outer frame, other).

1. Size distribution of the bounding rectangles of black connected components [size: length of the rectangle's diagonal]
・Maximum
・Mean
・Variation rate (%): (standard deviation / mean) × 100
・Size with the largest area share: the size value that has the largest share of area when the areas are summed for each rectangle size
・Character density (%): the area ratio when black connected components of a certain size (set to 1 to 70) are regarded as characters

2. Marginal distribution of black pixels
・Presence of periodicity due to the spacing between character lines
・Presence of columns: vertical or horizontal alignment of characters

3. Presence of straight lines
・Number of horizontal lines
・Number of vertical lines

4. Presence of a frame

5. Average line density: the number of intersections with strokes when perpendiculars are dropped from each point on the X and Y axes, normalized by the area S of the region's bounding rectangle (× 100 / S)

6. Elongatedness: A / W² [A: area of the polygonal region; W: twice the number of steps required for the polygon to disappear completely under the shrinking operation (here, erasing every black pixel that has a white pixel among its 8-connected neighbors), that is, the width]

For each of these features, a certainty factor is defined that expresses how likely it is, for a given value of the feature, that the region belongs to each category; the certainty factors are assigned in step 8. The assignment is determined empirically, for example as set out in the certainty settings.
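As an illustration of the size-distribution features in item 1 of the feature list, the following Python sketch computes the maximum, the mean, the variation rate, and the size with the largest area share from a list of bounding rectangles. It makes assumptions the text does not state: the standard deviation is the population form, sizes are binned by rounding the diagonal to the nearest integer, and the function name `size_statistics` and the sample rectangles are hypothetical.

```python
import math
from collections import defaultdict
from statistics import mean, pstdev


def size_statistics(rects):
    """rects: (width, height) of each black connected component's bounding rectangle."""
    sizes = [math.hypot(w, h) for w, h in rects]  # size = diagonal length
    avg = mean(sizes)
    # Variation rate (%) = (standard deviation / mean) x 100.
    variation_pct = pstdev(sizes) / avg * 100.0
    # Sum rectangle areas per (rounded) size and pick the size with the largest total.
    area_by_size = defaultdict(float)
    for (w, h), size in zip(rects, sizes):
        area_by_size[round(size)] += w * h
    dominant_size = max(area_by_size, key=area_by_size.get)
    return {
        "max": max(sizes),
        "mean": avg,
        "variation_pct": variation_pct,
        "dominant_size": dominant_size,
    }


# Two small components and one large one (made-up values).
stats = size_statistics([(3, 4), (3, 4), (6, 8)])
```

Here the single large rectangle covers more area (48) than the two small ones together (24), so its size is reported as the one with the largest area share even though small components are more numerous.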

Certainty settings: for the value $d_i$ of each feature, the function $g_i(d_i, \alpha)$ that gives the certainty factor (0 to 100) for each region category α was determined heuristically as shown in Tables 1 to 4.

An endpoint of an interval is taken to belong to whichever of the two intervals appears first in the table.
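A step function with this endpoint convention can be looked up with a binary search. The sketch below is only an illustration: it assumes the intervals are listed in ascending order, so that a shared endpoint belongs to the lower interval, and the boundary 10 and the function name `step_certainty` are hypothetical. The values 10, 95, 10, 0 and the interval (26, 41) come from the worked example given later in the description; which position each value occupies in Table 2 is an assumption.

```python
from bisect import bisect_left


def step_certainty(boundaries, values, d):
    """Return the certainty (0..100) for feature value d.

    boundaries: ascending interval endpoints b_1 < ... < b_(l-1)
    values:     l certainties, one per interval
    """
    if len(values) != len(boundaries) + 1:
        raise ValueError("need exactly one more value than boundaries")
    # bisect_left puts d == b_k into the lower interval, i.e. the one listed first.
    return values[bisect_left(boundaries, d)]


BOUNDARIES = [10, 26, 41]  # 10 is hypothetical; 26 and 41 appear in the text
VALUES = [10, 95, 10, 0]   # certainties of one category, one value per interval
```

With these tables, `step_certainty(BOUNDARIES, VALUES, 26)` falls in the second interval because the endpoint 26 is shared with, and assigned to, the interval listed first.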

・Marginal distribution of black pixels, ZERO_PERIOD(label_poly, direction): presence or absence of periodic zero (white) regions (Table 5)
・Presence of straight lines: number of horizontal and vertical lines (Table 6)
・Presence of a frame (Table 7)
・Presence of columns, COLUMN(label_poly, intensity) (Table 8): for horizontal writing, presence or absence of vertical character alignment (columns); for vertical writing, presence or absence of horizontal character alignment (columns). For example, for horizontal writing, "column present 1" is shown in Fig. 5(a), "column present 2" in Fig. 5(b), and "column present 3" in Fig. 5(c) or (d).
・Character density (Table 9)
・Average line density (Table 10)
・Elongatedness, REDUCE(label_poly, elongatedness) (Table 11)

As for categories, this example considers six (body text, title, table, figure, ruled line/frame, and other), but grayscale images such as photographs could also be considered. In symbols, for feature i, the certainty factor $g_i^{\alpha}(d_i)$ that the region belongs to category α is assigned in step 8 according to the value $d_i$ of the feature, where $0 \le g_i^{\alpha}(d_i) \le 100$, $i = 1, \dots, n$, and $\alpha = 1, \dots, m$.

The assignment of certainty factors described above is fixed to a single form, so when the input document changes to a different type, appropriate certainty factors need to be assigned to match. In the present invention, the system learns through dialogue with the operator whether each identification result is correct and changes the assignment of certainty factors, that is, the family of functions $g_i^{\alpha}(d_i)$, accordingly.

The specific method is as follows. The likelihood (certainty factor) that a region belongs to category α when feature i takes the value $d_i$ is expressed by a step function $g_i^{\alpha}(d_i)$. An example is shown in Table 2, where "size" is the length of the diagonal of the smallest rectangle circumscribing a black connected component.

Suppose the correct category is $\alpha_0$ and $d_i$ falls in interval $D_j$ (there are $l$ intervals in total). Because $g_i^{\alpha_0}(d_i)$ is constant on $D_j$, it is written $g_i^{\alpha_0}(D_j)$. If $g_i^{\alpha_0}(D_j)$ is the maximum of $\{g_i^{\alpha_0}(D_s)\}$ ($s = 1, \dots, l$), the assignment is judged valid and the family of functions $\{g_i^{\alpha}(d_i)\}$ is left unchanged. If instead $g_i^{\alpha_0}(D_k)$ with $k \ne j$ is the maximum, $g_i^{\alpha_0}(D_k)$ is decreased by $h$ (for example, $h = 5$ (%)) and $g_i^{\alpha_0}(D_j)$ is increased by $h$, subject to the constraint $0 \le g_i^{\alpha_0}(D_s) \le 100$ ($s = 1, \dots, l$). With these changes, the certainty distribution of the correct category, $\{g_i^{\alpha_0}(D_s)\}$ ($s = 1, \dots, l$), shifts toward taking its maximum on the interval $D_j$ that contains the observed value $d_i$ of feature i.

As a concrete example based on Table 2, let feature i be the mean size, let the correct category $\alpha_0$ be body text, and let the feature value be $d_i = 30$. Then $\{g_i^{\alpha_0}(D_s)\} = \{10, 95, 10, 0\}$. The interval $D_j$ containing $d_i$ (= 30) is (26, 41), but $g_i^{\alpha_0}(D_j) = 10$ is not the maximum of $\{g_i^{\alpha_0}(D_s)\}$, so

$g_i^{\alpha_0}(D_j) \to g_i^{\alpha_0}(D_j) + h = 10 + 5 = 15$ (%), with $h = 5$,

and the maximum value, by the same rule, is reduced by $h$ from 95 to 90 (%).

In the description above, the family of functions $\{g_i^{\alpha_0}(d_i)\}$ is left unchanged when $g_i^{\alpha_0}(d_i)$ is already the maximum of $\{g_i^{\alpha_0}(D_s)\}$ ($s = 1, \dots, l$). It is also effective, however, to increase $g_i^{\alpha_0}(d_i)$ according to how frequently this occurs and to decrease the other values $g_i^{\alpha_0}(D_s)$ ($s \ne j$).
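The update rule and the worked example can be sketched in Python as follows. This is a minimal sketch under stated assumptions: the function name `update_certainties` is hypothetical, the observed interval is assumed to sit at index 2 of the list, only the first maximum is reduced when several intervals tie, and no change is made when the observed interval ties with the maximum.

```python
def update_certainties(values, j, h=5):
    """Shift certainty toward the interval j that contained the observed feature value.

    values: certainties (0..100) of the correct category, one per interval
    j:      index of the interval containing the observed value d_i
    h:      step size in percent
    """
    updated = list(values)
    k = max(range(len(updated)), key=updated.__getitem__)  # interval holding the maximum
    if updated[j] == updated[k]:
        return updated  # assignment judged valid, nothing to change
    updated[k] = max(0, updated[k] - h)      # lower the current maximum
    updated[j] = min(100, updated[j] + h)    # raise the observed interval
    return updated


# Worked example from the text: {10, 95, 10, 0}, d_i = 30 lies in (26, 41).
after_one = update_certainties([10, 95, 10, 0], j=2)

# Repeating the same correction moves the maximum to the observed interval.
values = [10, 95, 10, 0]
for _ in range(20):
    values = update_certainties(values, j=2)
```

After one correction the values are 10, 90, 15, 0; repeated corrections stop changing the table once the observed interval holds the maximum.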

By combining the certainty factors obtained in this way over all n features, the likelihood that the region belongs to category α can be estimated (step 9). There are two methods of combination.

[Method 1] Calculation by weighted average

・Certainty (0 to 100) that the target region belongs to category α: a weighted average of the $g_i(d_i, \alpha)$ with weights $a_i(\alpha)$, where $0 \le a_i(\alpha) \le 1$ and $0 \le g_i(d_i, \alpha) \le 100$, and
  i: type of feature
  $d_i$: value of the feature
  n: number of features
  $g_i(d_i, \alpha)$: certainty of belonging to category α when feature i has the value $d_i$

Interaction terms such as $g_{i,j}(d_i, d_j, \alpha)$ will also need to be considered in the future.
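The exact expression for the weighted average is not reproduced in this text, so the following Python sketch assumes one common form, the weighted sum divided by the sum of the weights, which keeps the result in the range 0 to 100. The function name `combine_weighted`, the per-feature certainties, and the weights are all invented for illustration; for brevity the same weights are used for both categories, whereas the weights $a_i(\alpha)$ in the text may depend on the category.

```python
def combine_weighted(certainties, weights):
    """Weighted average of per-feature certainties (0..100) with weights in 0..1.

    Assumed form: sum(a_i * g_i) / sum(a_i).
    """
    if len(certainties) != len(weights):
        raise ValueError("one weight per feature is required")
    total = sum(weights)
    if total == 0:
        raise ValueError("at least one weight must be non-zero")
    return sum(a * g for a, g in zip(weights, certainties)) / total


weights = [1.0, 0.5, 0.5]                      # hypothetical a_i
per_category = {
    "body text": [95, 80, 10],                 # hypothetical g_i(d_i, body text)
    "figure": [10, 20, 90],                    # hypothetical g_i(d_i, figure)
}
combined = {cat: combine_weighted(g, weights) for cat, g in per_category.items()}
best = max(combined, key=combined.get)         # category with the largest certainty
```

Dividing by the sum of the weights is a design choice of this sketch; it makes the combined value directly comparable with the individual certainties regardless of how many features are used.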

The category with the largest combined certainty is the final result of region identification. As a measure of how definite that decision is, the following quantity, called "definiteness," is defined.

・Definiteness of region identification (0 to 100): in Fig. 6, H denotes the definiteness of region identification. H is obtained from a formula in which m is the number of region categories.

[Method 2] Calculation by Dempster and Shafer's rule of combination for basic probabilities

In this method, the final decision falls into one of the following three cases according to the values of the upper and lower probabilities.

[Classification of decision results]

i) The region consists of a single category, and that category can be identified.

ii) The region consists of several categories, and the candidate categories can be identified.

iii) The region could not be identified, but candidate categories with some degree of reliability can be presented.

These three cases are based on the following three decision rules, respectively.

[Decision rules]

Let P1(α) be the upper probability (%) of region category α and Cr(α) its lower probability, and set suitable thresholds γ₁ and γ₂, where 0 < γ₂ < 50 < γ₁ < 100 (%).

i) If there is a category α with P1(α) > γ₁ and Cr(α) > 50 (see Fig. 7(a)), the target region consists of the single category α.

ii) If no category has P1 above γ₁, all of the categories in the set {β} that satisfy P1(β) > γ₂ are present together.

iii) If P1(α) > γ₁ [α: the category that gives the maximum of P1] and Cr(α) ≤ 50 (see Fig. 7(c)), all of the categories in the set {β} that satisfy P1(β) > γ₂ are taken as strong candidate categories.

Each of these rules is explained below.

[Explanation of the decision rules]

i) If a category α satisfying Cr(α) > 50 exists, every other category β ≠ α has smaller lower and upper probabilities, so such a category α is uniquely determined, and both Cr and P1 take their (true) maximum at α. The conditions of this rule, Cr(α) > 50 in particular, mean that the evidence affirming that the target region belongs to category α is stronger than the evidence denying it.

The criterion P1(α) > γ₁ was introduced to exclude case ii). When it holds as well, the strength of the evidence affirming that the target region belongs to category α exceeds the strength of the evidence denying it by more than γ₁ − 50.

ii) The absence of any category whose P1 reaches a certain level (γ₁) means that the evidence supporting other categories is not small either. This is thought to occur because images belonging to several categories are mixed in the region and the features of each have been extracted.

iii) The state in which this precondition holds represents a lack of information; if the evidence supporting membership in category α increases sufficiently, the state is expected to move to case i). Because identifying a category from the P1 information alone is considered risky, a somewhat loose criterion [γ₂ = 10 (%)] is used. The other threshold is set to γ₁ = 70 (%).
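The decision rules can be tried out with a small Python sketch of Dempster's rule of combination. Several points here are assumptions rather than statements from the patent: how certainty factors are turned into basic probabilities is not described, so the mass values below are made up; P1 is taken to be the upper probability (plausibility) and Cr the lower probability (belief) of a single category; the condition of rule ii) is taken to be that the largest P1 does not exceed γ₁; and all function names are hypothetical.

```python
from itertools import product


def combine(m1, m2):
    """Dempster's rule: combine two basic probability assignments (masses sum to 1)."""
    raw, conflict = {}, 0.0
    for (a, x), (b, y) in product(m1.items(), m2.items()):
        common = a & b
        if common:
            raw[common] = raw.get(common, 0.0) + x * y
        else:
            conflict += x * y  # mass that would fall on the empty set
    if conflict >= 1.0:
        raise ValueError("the two bodies of evidence are totally conflicting")
    return {s: v / (1.0 - conflict) for s, v in raw.items()}


def lower(m, cat):
    """Lower probability Cr (%) of a single category: mass committed exactly to it."""
    return 100.0 * sum(v for s, v in m.items() if s == frozenset([cat]))


def upper(m, cat):
    """Upper probability P1 (%): all mass that does not rule the category out."""
    return 100.0 * sum(v for s, v in m.items() if cat in s)


def decide(m, categories, gamma1=70.0, gamma2=10.0):
    best = max(categories, key=lambda c: upper(m, c))
    candidates = [c for c in categories if upper(m, c) > gamma2]
    if upper(m, best) > gamma1 and lower(m, best) > 50.0:
        return "single", [best]        # rule i)
    if upper(m, best) <= gamma1:
        return "mixed", candidates     # rule ii), condition assumed
    return "candidates", candidates    # rule iii)


CATEGORIES = ["body text", "figure", "table"]
THETA = frozenset(CATEGORIES)          # frame of discernment (total ignorance)
m1 = {frozenset(["body text"]): 0.7, THETA: 0.3}
m2 = {frozenset(["body text"]): 0.6, frozenset(["figure"]): 0.1, THETA: 0.3}
combined = combine(m1, m2)
result = decide(combined, CATEGORIES)
```

With the thresholds from the text (γ₁ = 70, γ₂ = 10), the two made-up bodies of evidence both favor body text strongly enough that rule i) applies.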

The two methods above determine which category each region belongs to. For regions where no decision can be made (in method 1, when the certainty is below a given value; in method 2, case iii)) and for regions judged in method 2 to contain two or more categories, detailed analysis is applied, such as changing the weights or using the segmentation layer one level below (11). Then, in step 12, small regions are merged into adjacent large regions according to certain rules, for example: "a small character region enclosed by a figure region is regarded as characters within the figure and merged into the figure region."

Effect

As is clear from the description above, the present invention removes the need to change, for each input document, the form of the functions that give the certainty factors. Because the assignment is changed automatically toward a more plausible one, the method has advantages such as being able to handle many types of document images.

[Brief description of the drawings]

Fig. 1 is an overall flow diagram illustrating an embodiment of the present invention; Fig. 2 illustrates an example of an image pyramid; Fig. 3 illustrates an example of the conversion rule; Figs. 4(a) to 4(d) illustrate the operating principle of the present invention; Fig. 5 illustrates examples of the "column present" case for horizontal writing; Fig. 6 illustrates the definiteness of region identification; and Fig. 7 illustrates examples of the decision rules.

Claims

(57) [Claims]

A method for segmenting and identifying regions of a document image, in which an input image is divided into regions using an image pyramid, a plurality of features for region identification are extracted from each segmentation layer, and the certainty factors assigned to the features are combined to identify the category of each region, characterized in that the values of the function that assigns a certainty factor to each feature are changed on the basis of the identification results for the input image so that more reliable identification results are obtained.