JPH0337785B2 - - Google Patents

Info

Publication number
JPH0337785B2
JPH0337785B2 JP57153992A JP15399282A
Authority
JP
Japan
Prior art keywords
image
character
spatial frequency
processing apparatus
image processing
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Expired - Lifetime
Application number
JP57153992A
Other languages
Japanese (ja)
Other versions
JPS5943468A (en)
Inventor
Osamu Asada
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Oki Electric Industry Co Ltd
Original Assignee
Oki Electric Industry Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Oki Electric Industry Co Ltd filed Critical Oki Electric Industry Co Ltd
Priority to JP57153992A priority Critical patent/JPS5943468A/en
Publication of JPS5943468A publication Critical patent/JPS5943468A/en
Publication of JPH0337785B2 publication Critical patent/JPH0337785B2/ja
Granted legal-status Critical Current

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T1/00General purpose image data processing
    • G06T1/0007Image acquisition

Description

[Detailed Description of the Invention]

(Technical Field) The present invention relates to a document image processing apparatus for unknown input document images. It extracts simple structural information such as characters, figures, photographs, titles, and body text, so that interactive processing with a human operator proceeds smoothly and automatic extraction becomes possible.

(Background Art) A conventional image processing apparatus is shown in FIG. 1. Block 1-1 is an image input device that quantizes density values at 4 bits per pixel, and 1-2 is an image memory that stores the input image. The image stored in 1-2 is divided into elements of (N × M) pixels (here N = M = 12), each element is assigned its average density value (1-3), and the resulting first abstract image, consisting only of coarse density information, is stored in image memory 1 (1-4). Next, based on the image stored in (1-4), the differences in average density between elements are computed (1-5). Let the average density of the current element be D_{i,j} and the average densities of its four neighbours be D_{i,j-1}, D_{i-1,j}, D_{i,j+1}, and D_{i+1,j} (see FIG. 2). Either the value

1/2 |D_{i,j} − D_{i-1,j}| + 1/2 |D_{i,j} − D_{i,j-1}|,

normalized by a suitable constant (one quarter of full scale, i.e. 4N²), or the value

max_{l,m = −1,+1} (D_{i,j} − D_{i−l,j−m}),

normalized by a suitable constant (3/2 N²), is taken as the initial principal contour strength P^{(0)}_{i,j} and stored in image memory 2 (1-6) as the second abstract image; any value exceeding 1 is set to 1.
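Taken together, the element-averaging step (1-3) and the contour-strength initialization above can be sketched in Python. This is a sketch under assumptions, not the patented implementation: NumPy is assumed, densities are taken as 4-bit averages in [0, 15], and only the 4-neighbour variant of the formula is shown, normalized by a quarter of full scale as the text prescribes.

```python
import numpy as np

def first_abstract_image(img, n=12):
    """Average density of each (n x n) element -> first abstract image (1-3, 1-4)."""
    h, w = img.shape
    h, w = h - h % n, w - w % n                      # crop to whole elements
    blocks = img[:h, :w].reshape(h // n, n, w // n, n)
    return blocks.mean(axis=(1, 3))

def initial_contour_strength(d, full_scale=15.0):
    """P(0): 4-neighbour average-density difference, normalized and clipped to 1."""
    p = np.zeros_like(d, dtype=float)
    p[1:, 1:] = (0.5 * np.abs(d[1:, 1:] - d[:-1, 1:])    # |D(i,j) - D(i-1,j)|
                 + 0.5 * np.abs(d[1:, 1:] - d[1:, :-1])) # |D(i,j) - D(i,j-1)|
    return np.minimum(p / (full_scale / 4.0), 1.0)       # quarter of full scale
```

The normalization constant here is an illustrative choice; the patent states it as 4N² because it appears to work with per-element density sums rather than the averages used in this sketch.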
However, when it takes a value of 1 or more, it is treated as 1. Based on the second abstract image, main contour elements are extracted using the relaxation method (1-7). From the values of each (i, j) element of the second abstract image, the initial image space is defined as P (0) = {P (0) i,j | 1≦i≦n, 1≦m; i, j integer} Then, perform the iterative calculation using the following formula. That is, P (k+1) ij =P ij (k) q ij (k) /P i,j (k) q ij (k) + (1−P ij
(k) (1−q ij (k) )q (k) ij =P (k) ij −(1−Δ (k) ij ) Δ (k) ij =max{1/2|P (k) i -1,j +P (k) i+1,j −P (k) i,
j-1
−P (k) i,j+1 |, 1/2|P (k) i-1,j-1 +P (k) i+1,j+1
−P (k) i-1,j+1 −P (k) i+1,j-1 |}. Here, the physical quantity Δ (k) ij is related to the connectivity of each element of the second image, and the value P (k) ij of each image element is its own contour strength P (k) ij and connectivity Δ (k) It will be updated by ij . This repeated operation is terminated when the number of elements with values greater than or equal to 0.9 or less than or equal to 0.1 reaches 0.95 or more of the total number of elements. Next, it is binarized and stored again in the image memory 2 (1-6). Second
The image is an image composed of main outline elements. Next, area segmentation is performed based on the second image (1
-8). The main contour elements are spatially integrated by 8 connections,
Divide by circumscribed rectangle. The representative points are stored in the image table (1-9). Each small area is labeled (1-10) with reference to the image table (1-9). Labeling is done as follows. Two physical quantities are calculated based on the first image and second image corresponding to each small area. That is, (1) find the difference in average density on both sides of the main contour element as a boundary, and calculate the average value of change values within the region (ΔD); and (2) the ratio of the number of main contour elements occupying the region (D N ).
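The relaxation iteration (1-7) described above, and its termination test, can be sketched as follows. NumPy is assumed; the text leaves the range of q implicit, so q is clipped to [0, 1] here to keep the Bayesian-style update well defined, and border elements are simply left unchanged.

```python
import numpy as np

def relax_step(p):
    """One update P(k) -> P(k+1) of the principal-contour-strength image."""
    p = p.astype(float)
    q = p.copy()
    # connectivity Δ: max of the axial and diagonal neighbour combinations
    d1 = 0.5 * np.abs(p[:-2, 1:-1] + p[2:, 1:-1] - p[1:-1, :-2] - p[1:-1, 2:])
    d2 = 0.5 * np.abs(p[:-2, :-2] + p[2:, 2:] - p[:-2, 2:] - p[2:, :-2])
    delta = np.maximum(d1, d2)
    q[1:-1, 1:-1] = np.clip(p[1:-1, 1:-1] - (1.0 - delta), 0.0, 1.0)
    num = p * q
    den = num + (1.0 - p) * (1.0 - q)
    return np.where(den > 0, num / np.maximum(den, 1e-12), p)

def converged(p, lo=0.1, hi=0.9, frac=0.95):
    """Terminate once 95% of elements are at or above 0.9, or at or below 0.1."""
    return np.count_nonzero((p <= lo) | (p >= hi)) >= frac * p.size
```

In use, `relax_step` would be applied repeatedly until `converged` holds, after which the result is binarized.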
In the ΔD–D_N plane, a region is labelled "photograph" when ΔD ≥ 0.7, "graph" when ΔD < 0.7 and D_N < 0.4, and "characters" when ΔD < 0.7 and D_N > 0.4, yielding the image table (1-9) shown in FIG. 3.

The conventional technique therefore had the drawback that no character strings, and no ordered description of those strings, were obtained from the regions judged to contain characters.

(Problems to be Solved by the Invention) The present invention removes these drawbacks and, by extracting character strings and ordering them, makes it possible to extract headings and body text. The present invention
applies an orthogonal transform to the "character" regions of the abstracted first image, computes a rough character pitch, prepares a mask of corresponding size, and extracts character strings by a logical operation; it further extracts the title and body text from the size and position of the extracted strings and from the number of lines with similar pitch. These features characterize the image processing apparatus of the invention.

(Structure and Operation of the Invention) FIG. 4 shows an embodiment of the present invention. Blocks 4-1 to 4-11 are identical to blocks 1-1 to 1-11 of the conventional apparatus (FIG. 1). Block 4-12 refers to the image table and applies an orthogonal transform to the "character"-labelled regions of the image memory (4-2) to obtain their spatial frequencies. Here the discrete Fourier transform is adopted as the orthogonal transform, and the two-dimensional transform is approximated as follows. Letting p_{i,j} be the value of each element of the image memory (4-2), the spatial-frequency spectra P_k and P_l in the x and y directions are

obtained by [Formula] (see FIG. 5), where L_x and L_y are the spatial extents of the region in the x and y directions. The spectral peaks are found from P_k and P_l. A peak is a spectral component, excluding the DC component, that exceeds a threshold (10% of the magnitude of the DC component); when several peaks exist, the peak on the lowest-frequency side is adopted, giving the spatial frequencies (k_x, k_y). If no such peak exists, k_x or k_y is set to 1.

Next, character-string extraction (4-13) is performed with reference to (k_x, k_y) and (L_x, L_y). First, vertical or horizontal writing is decided: the text is taken as vertical when L_y/k_y < L_x/k_x and horizontal when L_y/k_y > L_x/k_x. A mask of (L_y/k_y pixels × L_y/8k_y pixels) is then prepared for vertical writing, or (L_x/8k_x pixels × L_x/k_x pixels) for horizontal writing, and an OR operation (with a threshold of half full scale) is applied to the image memory (4-2) to extract the character strings. Each string's region is described by its circumscribed rectangle, and the two corner points (FIG. 6) are stored in the image table (4-9). The strings are numbered from the largest j_max for vertical writing and from the smallest i_min for horizontal writing, and are organized by line number. Given the addresses (i_1min, j_1min)(i_1max, j_1max) and (i_2min, j_2min)(i_2max, j_2max) of any two strings, the same line number is assigned when i_1min < (i_2min or i_2max) < i_1max for horizontal writing, or j_1min < (j_2min or j_2max) < j_1max for vertical writing; strings given the same line number are then ordered according to the layout.
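The pitch estimation and mask construction of blocks 4-12 and 4-13 can be sketched as below. Because the transform formula itself is elided in the text ([Formula]), this sketch substitutes a 1-D DFT of the region's row and column projection profiles — an assumption, not the patent's exact transform; the 10%-of-DC peak threshold, the lowest-frequency-peak rule, the k = 1 fallback, and the direction and mask-size rules do follow the text.

```python
import numpy as np

def dominant_frequency(profile):
    """Lowest-frequency spectral peak above 10% of the DC magnitude, else 1."""
    spec = np.abs(np.fft.rfft(profile))
    thresh = 0.1 * spec[0]
    for k in range(1, len(spec) - 1):
        if spec[k] >= thresh and spec[k] >= spec[k - 1] and spec[k] >= spec[k + 1]:
            return k                      # first (lowest-frequency) peak wins
    return 1                              # fallback when no peak qualifies

def text_line_mask(region):
    """Writing direction and mask size from the x/y spatial frequencies."""
    ly, lx = region.shape
    kx = dominant_frequency(region.sum(axis=0))   # x-direction profile
    ky = dominant_frequency(region.sum(axis=1))   # y-direction profile
    if ly / ky < lx / kx:                         # vertical writing
        return 'vertical', (ly // ky, max(ly // (8 * ky), 1))
    return 'horizontal', (max(lx // (8 * kx), 1), lx // kx)
```

Applying the returned mask with an OR operation (thresholded at half full scale) over the region would then merge characters into line-shaped blobs, as described above.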
This ordering is carried out in the i or j direction. In this way, an image table such as that shown in FIG. 6 is obtained.

Next, the character size (L_c) and line pitch (L_p) of each character region are estimated as averages over its strings: [Formula], where (i_omin, j_omin) and (i_omax, j_omax) are the representative addresses of each character string, N is the number of strings extracted in the region, (I_1min, J_1min)(I_1max, J_1max) are the representative addresses of the character region, and N_L is the number of lines it contains.

From L_c, L_p, N_L, and the coordinate values, the body text, the headings, and the reading order of the body are identified. First, the regions are classified by L_c and L_p, grouping together those whose L_c and L_p agree to within ±10%. After classification, the group with the largest number of lines is taken as the body; regions of the same format are gathered into one group and the rest are placed in other groups. Regions of the same format are then ordered according to the layout. For horizontal writing, two character regions (I_1min, J_1min)(I_1max, J_1max) and (I_2min, J_2min)(I_2max, J_2max) are compared: if J_1max < J_2min or I_1max < I_2min, the former precedes; otherwise the order is reversed. For vertical writing, likewise, the former precedes when J_1min > J_2max or I_1max < I_2min; otherwise the order is reversed. As a result, the regions having the same format as those labelled "body" are arranged in reading order.

Next, character strings with an L_c larger than that of the "body" regions are sought. If one exists, its rank is examined: if the region lies between, or below, regions labelled "body", it is labelled "subtitle". If character regions exist above the regions labelled "body", and the L_c of the topmost one exceeds 1.1 times the L_c of the "body" or "subtitle", it is labelled "title"; regions whose L_c is equal to that value within ±10% are also sought from the top and labelled "title". Once a region that does not match is found, everything below it is labelled "subtitle". Irrespective of layout, any region whose L_c is at least 1.5 times that of the body is labelled "title".

(Effects of the Invention) Since the present invention extracts character strings, orders the character regions, and labels the body text, titles, and subtitles from the properties of the strings, automatic filing and automatic editing of free-format documents become possible. Moreover, through interaction with a human operator, a flexible document image editing apparatus can be realized.
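The labelling rules in the preceding paragraphs reduce to threshold comparisons on L_c. A much-simplified sketch follows; the field names are invented for illustration, and the position-based title/subtitle ordering is omitted.

```python
def label_regions(regions):
    """Label character regions by character size relative to the body text.

    regions: list of dicts with 'lc' (character size) and 'n_lines'.
    Simplified rules: the region with the most lines is the body; regions
    with L_c within ±10% of the body share its label; L_c >= 1.5x body is a
    title regardless of layout; L_c > 1.1x body is treated as a subtitle.
    """
    body = max(regions, key=lambda r: r['n_lines'])
    for r in regions:
        if abs(r['lc'] - body['lc']) <= 0.1 * body['lc']:
            r['label'] = 'body'
        elif r['lc'] >= 1.5 * body['lc']:
            r['label'] = 'title'
        elif r['lc'] > 1.1 * body['lc']:
            r['label'] = 'subtitle'
        else:
            r['label'] = 'other'
    return regions
```

The full method additionally requires a "title" to sit above the body region in reading order; that spatial test is dropped here for brevity.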

[Brief Description of the Drawings]

FIG. 1 is a block diagram of a conventional image processing apparatus; FIGS. 2 and 3 are diagrams explaining the operation of the apparatus of FIG. 1; FIG. 4 is a block diagram of an image processing apparatus according to the present invention; and FIGS. 5 and 6 are diagrams explaining the operation of the apparatus of FIG. 4. 4-12: orthogonal transform circuit; 4-13: character-string extraction circuit.

Claims (1)

[Scope of Claims]

1. An image processing apparatus comprising: means for converting the density of each pixel of an original image into an electrical signal and storing it; means for storing, as a first image, the average density of elements each composed of a plurality of pixels; and means for segmenting the image into regions and classifying each region as a photograph region, a graphic region, or a character region according to two parameters, namely the proportion of the region occupied by principal contour elements (D_N) and the average, over the region, of the change in mean density across the principal contour elements (ΔD); characterized by further comprising: means for applying an orthogonal transform to the first image, composed of average densities, for each character region to obtain the spatial frequency (k) of the image; means for providing a mask of a size corresponding to the spatial frequency; means for extracting character strings by a logical operation between the mask and the image; and means for judging vertical or horizontal writing from the spatial frequency and providing a label for the line number of each character string.
JP57153992A 1982-09-06 1982-09-06 Picture processor Granted JPS5943468A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
JP57153992A JPS5943468A (en) 1982-09-06 1982-09-06 Picture processor


Publications (2)

Publication Number Publication Date
JPS5943468A (en) 1984-03-10
JPH0337785B2 (en) 1991-06-06

Family

ID=15574544

Family Applications (1)

Application Number Title Priority Date Filing Date
JP57153992A Granted JPS5943468A (en) 1982-09-06 1982-09-06 Picture processor

Country Status (1)

Country Link
JP (1) JPS5943468A (en)

Families Citing this family (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JPH0535868A (en) * 1991-07-31 1993-02-12 Toppan Printing Co Ltd Image cutting device
US7848567B2 (en) * 2004-09-23 2010-12-07 Fuji Xerox Co., Ltd. Determining regions of interest in synthetic images

