JPH0337785B2

Info

Publication number: JPH0337785B2
Application number: JP57153992A
Authority: JP
Inventors: Osamu Asada
Original assignee: Oki Electric Industry Co Ltd
Current assignee: Oki Electric Industry Co Ltd
Priority date: 1982-09-06
Filing date: 1982-09-06
Publication date: 1991-06-06
Also published as: JPS5943468A

Description

[Detailed description of the invention]

(Technical Field) The present invention relates to a document image processing device for unknown input document images. The device extracts simple structural information such as characters, figures, and photographs, or titles and body text, in order to facilitate interactive processing with a human operator and to enable automatic extraction.

(Background Art) A conventional image processing device is shown in FIG. 1. 1-1 is an image input device that quantizes density values at 4 bits per pixel, and 1-2 is an image memory that stores the input image. The image stored in 1-2 is divided into elements of (N × M) pixels each (here N = M = 12), the average density within each element (1-3) is assigned to that element, and the resulting first abstract image, which carries only coarse density information, is stored in image memory 1 (1-4).

Next, the differences in average density between elements are computed from the image stored in (1-4) (1-5). Let $D_{i,j}$ be the average density of the current element and $D_{i,j-1}$, $D_{i-1,j}$, $D_{i,j+1}$, $D_{i+1,j}$ the average densities of its four neighbors (see FIG. 2). Either the value

$$\tfrac{1}{2}\,\lvert D_{i,j}-D_{i-1,j}\rvert+\tfrac{1}{2}\,\lvert D_{i,j}-D_{i,j-1}\rvert$$

normalized by a suitable value (1/4 of full scale, i.e. $4N^{2}$), or the value

$$\max_{l,m=-1,+1}\,\bigl(D_{i,j}-D_{i-l,\,j-m}\bigr)$$

normalized by a suitable value ($\tfrac{3}{2}N^{2}$), is taken as the initial value of the main contour strength, expressed as a probability $P^{(0)}_{ij}$, and stored in image memory 2 (1-6) as the second abstract image.
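The first of the two contour-strength definitions above can be sketched as follows. This is a minimal illustration, not the device's circuit: the function names are hypothetical, the image is assumed to be a list of rows of 4-bit densities, and each element is represented by the sum of its pixel densities (an assumption made so that $4N^{2}$ is a quarter of full scale). Border elements, which lack an upper or left neighbor, are simply left at 0, which the text does not specify.

```python
def block_sums(image, n):
    """Sum the pixel densities of each n x n block (one 'element')."""
    rows, cols = len(image) // n, len(image[0]) // n
    return [[sum(image[r * n + a][c * n + b]
                 for a in range(n) for b in range(n))
             for c in range(cols)]
            for r in range(rows)]


def initial_contour_strength(d, n):
    """Initial contour strength from the upper and left neighbors.

    Implements (|D_ij - D_i-1,j| + |D_ij - D_i,j-1|) / 2, normalized by
    4 * n**2 and clipped to 1.
    """
    scale = 4 * n * n
    p0 = [[0.0] * len(d[0]) for _ in d]
    # Assumption: elements in the first row or column stay at 0.
    for i in range(1, len(d)):
        for j in range(1, len(d[0])):
            g = (0.5 * abs(d[i][j] - d[i - 1][j])
                 + 0.5 * abs(d[i][j] - d[i][j - 1]))
            p0[i][j] = min(1.0, g / scale)
    return p0
```

For a 4 × 4 image whose right half has density 4 and whose left half is 0, `block_sums(image, 2)` gives two columns of elements (0 and 16), and the element to the right of the vertical edge receives an initial strength of 0.5.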
However, any value of 1 or more is set to 1.

Based on the second abstract image, the main contour elements are extracted using the relaxation method (1-7). From the value of each element (i, j) of the second abstract image, the initial image space is defined as

$$P^{(0)}=\{\,P^{(0)}_{i,j}\mid 1\le i\le n,\ 1\le j\le m;\ i,j\ \text{integers}\,\}$$

and the following iterative calculation is performed:

$$P^{(k+1)}_{ij}=\frac{P^{(k)}_{ij}\,q^{(k)}_{ij}}{P^{(k)}_{ij}\,q^{(k)}_{ij}+\bigl(1-P^{(k)}_{ij}\bigr)\bigl(1-q^{(k)}_{ij}\bigr)}$$

$$q^{(k)}_{ij}=P^{(k)}_{ij}-\bigl(1-\Delta^{(k)}_{ij}\bigr)$$

$$\Delta^{(k)}_{ij}=\max\Bigl\{\tfrac{1}{2}\bigl\lvert P^{(k)}_{i-1,j}+P^{(k)}_{i+1,j}-P^{(k)}_{i,j-1}-P^{(k)}_{i,j+1}\bigr\rvert,\ \tfrac{1}{2}\bigl\lvert P^{(k)}_{i-1,j-1}+P^{(k)}_{i+1,j+1}-P^{(k)}_{i-1,j+1}-P^{(k)}_{i+1,j-1}\bigr\rvert\Bigr\}$$

Here the quantity $\Delta^{(k)}_{ij}$ relates to the connectivity of each element of the second image, so the value $P^{(k)}_{ij}$ of each image element is updated from its own contour strength $P^{(k)}_{ij}$ and its connectivity $\Delta^{(k)}_{ij}$. The iteration is terminated when the number of elements whose value is 0.9 or more, or 0.1 or less, reaches 0.95 or more of the total number of elements. The result is then binarized and stored again in image memory 2 (1-6). The second image is thus an image composed of the main contour elements.

Next, region segmentation is performed based on the second image (1-8). The main contour elements are spatially merged under 8-connectivity and delimited by circumscribed rectangles, whose representative points are stored in the image table (1-9).

Each small region is then labeled with reference to the image table (1-9) (1-10). Labeling proceeds as follows. Two quantities are calculated from the parts of the first image and the second image that correspond to each small region:

(1) the difference in average density between the two sides of the main contour elements, averaged over the region (ΔD); and
(2) the proportion of the region occupied by main contour elements ($D_N$).
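The building blocks of the relaxation step described above (the connectivity measure, the ratio-form update, and the stopping rule) can be sketched as below. This is only an illustration under stated assumptions: the function names are hypothetical, the grid is a list of rows of floats, only interior elements are handled, and keeping the old value when the denominator is zero is a choice made here, not something the text specifies. The value `q_val` is passed in by the caller, who would compute it from the contour strength and the connectivity as in the relation given in the text.

```python
def connectivity(p, i, j):
    """Delta_ij for an interior element (i, j) of the grid p."""
    axial = abs(p[i - 1][j] + p[i + 1][j]
                - p[i][j - 1] - p[i][j + 1]) / 2
    diagonal = abs(p[i - 1][j - 1] + p[i + 1][j + 1]
                   - p[i - 1][j + 1] - p[i + 1][j - 1]) / 2
    return max(axial, diagonal)


def update(p_val, q_val):
    """One ratio-form update: p*q / (p*q + (1 - p)*(1 - q))."""
    denom = p_val * q_val + (1 - p_val) * (1 - q_val)
    # Assumption: leave the value unchanged if the denominator vanishes.
    return p_val if denom == 0 else p_val * q_val / denom


def converged(p, low=0.1, high=0.9, fraction=0.95):
    """True once enough elements are near 0 or near 1."""
    values = [v for row in p for v in row]
    settled = sum(1 for v in values if v <= low or v >= high)
    return settled >= fraction * len(values)
```

A caller would loop over the interior elements, derive `q_val` for each, apply `update`, and repeat until `converged` returns true, after which the grid is binarized.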
On the ΔD versus $D_N$ plane, a region is labeled "photo" when ΔD ≥ 0.7, "graph" when ΔD < 0.7 and $D_N$ < 0.4, and "character" when ΔD < 0.7 and $D_N$ > 0.4, which yields the image table (1-9) shown in FIG. 3.

The conventional technique therefore has the drawback that it neither extracts character strings from a region judged to be "character" nor produces an organized description of those character strings.

(Problems to be solved by the invention) The present invention eliminates these drawbacks and makes it possible to extract headings and body text by extracting character strings and organizing them. The invention resides in an image processing apparatus characterized in that an orthogonal transform is applied to the "character" regions of the abstracted first image to calculate a rough pitch, a mask corresponding to that pitch is prepared and character strings are extracted by a logical operation, and titles and body text are extracted based on the size and position of the extracted character strings and the number of lines with similar pitch.

(Structure and operation of the invention) FIG. 4 shows an embodiment of the present invention.
Blocks 4-1 to 4-11 are identical to 1-1 to 1-11 of the conventional image processing device (FIG. 1). Block 4-12 refers to the image table and applies an orthogonal transform to the regions of the image memory (4-2) labeled "character" to obtain the spatial frequency. Here the discrete Fourier transform is adopted as the orthogonal transform, and the two-dimensional transform is obtained by the following approximation. With $p_{ij}$ denoting the value of each element of the image memory (4-2), the spatial frequency spectra in the x and y directions are obtained by [Formula] (FIG. 5). Here $L_x$ and $L_y$ are the spatial sizes of the region in the x and y directions. The spectral peaks are found from $P_k$ and $P_l$. A peak is sought among the spectral components other than the DC component and must be at or above a threshold (10% of the magnitude of the DC component); when several peaks exist, the one at the lowest frequency is adopted, giving the spatial frequencies ($k_x$, $k_y$). If no such peak exists, $k_x$ or $k_y$ is set to 1.

Next, character strings are extracted (4-13) with reference to ($k_x$, $k_y$) and ($L_x$, $L_y$). First, it is determined whether the text is written vertically or horizontally: it is vertical when $L_y/k_y < L_x/k_x$ and horizontal when $L_y/k_y > L_x/k_x$. A mask of ($L_y/k_y$ pixels × $L_y/8k_y$ pixels) for vertical writing, or ($L_x/8k_x$ pixels × $L_x/k_x$ pixels) for horizontal writing, is then prepared, and a logical OR operation (with the threshold set to half of full scale) is applied to the image memory (4-2) to extract the character strings. The region of each character string is described by its circumscribed rectangle, and its two points (FIG. 6) are stored in the image table (4-9).

The character strings are numbered starting from the largest $j_{\max}$ for vertical writing and from the smallest $i_{\min}$ for horizontal writing, and are organized by line number. However, given the addresses $(i_{1\min}, j_{1\min})$, $(i_{1\max}, j_{1\max})$ and $(i_{2\min}, j_{2\min})$, $(i_{2\max}, j_{2\max})$ of any two character strings, the two are given the same line number when $i_{1\min} < (i_{2\min}\ \text{or}\ i_{2\max}) < i_{1\max}$ for horizontal writing, or when $j_{1\min} < (j_{2\min}\ \text{or}\ j_{2\max}) < j_{1\max}$ for vertical writing. Strings that share a line number are then ordered anew along the i or j direction according to the writing format. An image table such as that shown in FIG. 6 is obtained in this way.

Next, the character size ($L_c$) and the line pitch ($L_p$) are estimated for each character region as averages over its character strings. That is, [Formula]. Here $i_{o\max}$, $i_{o\min}$, $j_{o\max}$, $j_{o\min}$ are the representative addresses of each character string, and N is the number of character strings extracted in the character region. Also, $(I_{1\min}, J_{1\min})$ and $(I_{1\max}, J_{1\max})$ are the representative addresses of the character region, and $N_L$ is the number of lines contained in the character region.

Based on $L_c$, $L_p$, $N_L$ and the coordinate values, the body text, the headings, and the flow of the body text are identified. First, the character regions are classified by $L_c$ and $L_p$: regions whose $L_c$ and $L_p$ agree within ±10% are grouped together. After classification, the group with the largest number of lines is taken as the body text; regions with the same format are gathered into one group and the others are placed in a separate group. Regions of the same format are ranked according to that format. For horizontal writing, two character regions $(I_{1\min}, J_{1\min})$, $(I_{1\max}, J_{1\max})$ and $(I_{2\min}, J_{2\min})$, $(I_{2\max}, J_{2\max})$ are compared, and the former is ranked first if $J_{1\max} < J_{2\min}$ or $I_{1\max} < I_{2\min}$; otherwise the order is reversed. For vertical writing, similarly, the former is ranked first if $J_{1\min} > J_{2\max}$ or $I_{1\max} < I_{2\min}$; otherwise the order is reversed. As a result, the regions having the same format as those labeled "body" are arranged in order.

Character strings whose $L_c$ is equal to or greater than that of the regions labeled "body" are then searched for. If one is found, its rank is examined. If that character region is ranked between character regions labeled "body", or below them, it is labeled "subtitle".

If there are character regions ranked above those labeled "body", and the $L_c$ of the highest-ranked one is greater than 1.1 times the $L_c$ of the "body" or "subtitle", that region is labeled "title". Regions whose $L_c$ equals that value within ±10% are then searched for from the top and are also labeled "title". If a region that does not satisfy this is found, it and everything ranked below it are labeled "subtitle". In addition, regardless of format, any region whose $L_c$ is at least 1.5 times that of the body text is given the "title" label.

(Effects of the Invention) Because the present invention extracts character strings, ranks each character region, and attaches body, title, and subtitle labels based on the properties of the character strings, automatic filing and automatic editing of free-format documents become possible. Furthermore, combined with a means of interaction with a human operator, a flexible document image editing device can be realized.
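The pitch estimation and writing-direction test performed by blocks 4-12 and 4-13 can be sketched as below. Because the transform formula itself is not reproduced in this text, the sketch assumes that the approximation is a one-dimensional DFT of a density profile taken along each axis; that choice, the function names, and the local-peak test are all assumptions made for illustration. The 10% threshold, the lowest-frequency rule, the fallback value of 1, the direction test, and the mask sizes follow the description.

```python
import cmath


def dft_magnitudes(signal):
    """Magnitudes of DFT bins 0 .. n//2 of a real sequence."""
    n = len(signal)
    return [abs(sum(v * cmath.exp(-2j * cmath.pi * k * t / n)
                    for t, v in enumerate(signal)))
            for k in range(n // 2 + 1)]


def dominant_frequency(signal, ratio=0.1):
    """Lowest non-DC peak at or above ratio * DC magnitude, else 1."""
    mag = dft_magnitudes(signal)
    for k in range(1, len(mag)):
        # The DC bin is excluded, so bin 1 has no left neighbor.
        left = mag[k - 1] if k > 1 else 0.0
        right = mag[k + 1] if k + 1 < len(mag) else 0.0
        if mag[k] >= ratio * mag[0] and mag[k] >= left and mag[k] >= right:
            return k
    return 1


def orientation_and_mask(lx, ly, kx, ky):
    """Writing direction and mask size, in the order the text gives them."""
    if ly / ky < lx / kx:
        return "vertical", (ly / ky, ly / (8 * ky))
    if ly / ky > lx / kx:
        return "horizontal", (lx / (8 * kx), lx / kx)
    return None, None  # the equality case is not covered by the text
```

For example, a profile of length 32 that repeats every 8 samples yields a dominant frequency of 4, and a region with $L_x = 640$, $L_y = 480$, $k_x = 40$, $k_y = 12$ is classified as horizontal writing because $480/12 > 640/40$.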

[Brief explanation of the drawing]

FIG. 1 is a block diagram of a conventional image processing device; FIGS. 2 and 3 are diagrams explaining the operation of the device of FIG. 1; FIG. 4 is a block diagram of an image processing device according to the present invention; and FIGS. 5 and 6 are diagrams explaining the operation of the device of FIG. 4.

4-12: orthogonal transform circuit; 4-13: character string extraction circuit.

Claims

[Scope of Claims] 1. An image processing apparatus having: means for converting the density of each pixel of an original image into an electrical signal and storing it; means for storing, as a first image, the average density of each element composed of a plurality of pixels; and means for dividing the image into regions and classifying each divided region as a photograph region, a graphic region, or a character region according to two parameters, namely the proportion of main contour elements ($D_N$) and the average density change value on both sides of the main contour elements (ΔD); the apparatus being characterized by comprising: means for obtaining a spatial frequency (k) by applying, for each character region, an orthogonal transform to the first image composed of average densities; means for providing a mask of a size corresponding to the spatial frequency; means for extracting character strings by a logical operation on the mask and the image; and means for determining vertical writing or horizontal writing based on the spatial frequency and assigning a line-number label to each character string.