JPS62224871A

JPS62224871A - Document picture processing system

Info

Publication number: JPS62224871A
Application number: JP61065650A
Authority: JP
Inventors: Yasuaki Nakano; 中野　康明; Hiromichi Fujisawa; 浩道藤澤; Toshihiro Hananoi; 花野井　歳弘; Kiyomichi Kurino; 栗野　清道
Original assignee: Hitachi Ltd
Current assignee: Hitachi Ltd
Priority date: 1986-03-26
Filing date: 1986-03-26
Publication date: 1987-10-02

Abstract

PURPOSE: To automatically extract the identification information of a document from an input document by removing noise from the input picture, applying a specific preprocessing step, and then extracting areas. CONSTITUTION: The picture of the document 3 is converted by a photoelectric converter 4 and stored in a memory 51. The picture in the memory 51 is normalized and stored in a memory 52. Noise is removed from the picture in the memory 52, and the result is stored in a memory 53. The picture in the memory 53 is subjected to run-length filtering and stored in a memory 54. The picture in the memory 54 is compressed and stored in a memory 55. Contour extraction is applied to the picture in the memory 55, and the result is stored in a memory 57. Because the run-length filtering simplifies the input picture, the processing can be speeded up. In this way, the identification information of the document can be automatically extracted from the input document.

Description

[Detailed Description of the Invention] [Industrial Application Field] The present invention relates to a document image processing method, and in particular to a document image processing method suited to automatically extracting heading (index) information from a document image that is to be stored in an image file.

[Conventional technology]

In a conventional document image filing device, storing a document image required the identification information of the input document image to be supplied from a keyboard or the like. Taking a published patent gazette as an example, the identification information of a document image is the patent publication number or the page number within a bound volume. Entering identification information from a keyboard is not only tedious; it also lowers the processing speed when a large number of documents are input continuously through an automatic paper feeder. One could key in the identification information for the first page only and thereafter generate it automatically by incrementing an internal counter, but this causes problems such as the numbering falling out of step when a page of the document is missing.

It is therefore desirable to extract from the document image a specific area containing the document identification information (hereinafter called the heading area), and either to register that part of the image separately as a heading image or to recognize the characters in the heading image and register them as coded information.

As prior art, Japanese Laid-Open Patent Publication No. 60-17566 (published January 29, 1985) proposes a method of recognizing the characters in a designated area of a document to obtain search keywords. Japanese Laid-Open Patent Publication No. 60-17571 (published January 29, 1985) proposes a method of extracting a specific closed curve from an input image and recognizing the characters inside it. Paper 810-2, "Document image understanding using a format definition language," in the Proceedings of the 1985 National Convention of the Institute of Electronics and Communication Engineers of Japan (issued March 5, 1985), proposes a technique that extracts character components from a document image by contour extraction and matches them against a predefined document format definition to identify each partial area of the document. Finally, the paper "Document Analysis System" in the IBM Journal of Research and Development, Vol. 26, No. 6, pp. 647-656 (issued November 1982), proposes a technique that extracts runs of consecutive white pixels within the horizontal and vertical scanning lines of an input image, compares the length of each run with a predetermined threshold, and replaces the run with black pixels when it is shorter than the threshold. This is done separately for the horizontal and vertical directions, the pixel-wise logical AND of the two resulting images is formed, and features are extracted from the result to segment the original image into regions.

[Problems to be solved by the present invention]

The first of the conventional methods is unsuited to automation because the area on the document is specified by hand. The second is limited to cases where the identification information of the document lies inside a closed curve, and so is not general. The third is powerful, but because it extracts contours from the document image, it increases the processing time and the amount of memory required inside the processor. The fourth is effective for segmenting documents in which text and photographs are mixed, but it has several problems: because it processes the document image in both the horizontal and vertical directions, it increases the internal memory requirement and processing time of the processor; it is prone to malfunction caused by small noise in the image; and because it is difficult to describe what the image features of a heading area are, it cannot be applied directly to the automatic extraction of heading areas for filing.

An object of the present invention is to realize, with a simple configuration, a method of identifying the partial area that contains the identification information of a document.

[Means to solve the problem]

For the purpose described under [Conventional technology] above, namely the automatic extraction of heading areas, the third method is considered basically suitable. Its problems are solved by applying a relatively simple preprocessing step to the input image so that the image is simplified before area extraction is performed. A preprocessing step suited to this purpose is to remove noise from the input image, then extract runs of white pixels and replace each run with a run of black pixels when its length is smaller than a threshold.

[Effect]

It has been confirmed experimentally that applying the above preprocessing reduces the number of contours in a typical document image to roughly 1/20, so the processing becomes markedly simpler and faster. In addition, because the image is simpler after preprocessing, data reduction by thinning out (subsampling) the image, which could not be applied to the original image, becomes applicable, and a further speed-up can be achieved. For example, thinning to 1/4 in each of the vertical and horizontal directions reduces the total number of pixels to 1/16.

[Example]

The present invention is described in detail below with reference to the drawings.

FIG. 1 shows the flow of processing in one embodiment of the document image processing method of the present invention as a PAD (Problem Analysis Diagram). FIG. 2 is a block diagram showing the configuration of a device that carries out the processing of FIG. 1. Each part of the device is connected to a bus 1, and the overall operation is controlled by a control unit 2.

Step 101 of FIG. 1 initializes the various variables used during processing.

Step 102 of FIG. 1 is the input of the document image. It corresponds to scanning the information on the document 3 shown in FIG. 2 (the document image) with the photoelectric conversion device 4, digitizing it, and storing it in the memory 51 via the bus 1.

The memory 51, together with the memories 52 to 59 described later, forms part of the memory 5. In the following description each pixel is binarized to one bit, with the value "0" representing a white pixel and "1" a black pixel, although other representations may be used.

Step 103 of FIG. 1 normalizes the document image by position correction, skew correction, and the like. In FIG. 2, the image obtained by normalizing the image in the memory 51 is stored in the memory 52. When the positioning accuracy of the input image is sufficiently good, step 103 may be omitted.

Step 104 of FIG. 1 performs noise removal. This processing removes small isolated clusters of black pixels in a white background, and a known technique may be used: for example, scanning the image with a 3x3 mask and replacing with "0" any "1" pixel that is surrounded by "0" pixels. In FIG. 2, the image obtained by removing noise from the image in the memory 52 is stored in the memory 53.
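The 3x3 mask operation described for step 104 can be sketched as follows. This is only an illustration under stated assumptions, not the implementation used in the patent: the function name `remove_isolated_pixels`, the list-of-lists image representation, and the treatment of positions outside the image as white are all choices made for this example.

```python
WHITE, BLACK = 0, 1  # pixel values as defined in the text


def remove_isolated_pixels(image):
    """Return a copy of a binary image with isolated black pixels cleared.

    A black pixel is cleared when all eight neighbours under the 3x3 mask
    are white. Positions outside the image count as white (an assumption).
    """
    height = len(image)
    width = len(image[0]) if height else 0
    result = [row[:] for row in image]
    for y in range(height):
        for x in range(width):
            if image[y][x] != BLACK:
                continue
            isolated = True
            for dy in (-1, 0, 1):
                for dx in (-1, 0, 1):
                    if dy == 0 and dx == 0:
                        continue  # skip the centre pixel itself
                    ny, nx = y + dy, x + dx
                    inside = 0 <= ny < height and 0 <= nx < width
                    if inside and image[ny][nx] == BLACK:
                        isolated = False
            if isolated:
                result[y][x] = WHITE
    return result
```

The sketch reads from the input and writes to a copy, so clearing one pixel never affects the decision for its neighbours.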

Step 105 of FIG. 1 is the run-length filtering. In FIG. 2, the image obtained by applying run-length filtering to the image in the memory 53 is stored in the memory 54. The details of the run-length filtering are explained later with reference to the PAD of FIG. 3.

Step 106 of FIG. 1 thins out the image. In FIG. 2, the run-length-filtered image in the memory 54 is compressed to 1/4 in each of the vertical and horizontal directions by a known technique, and the compressed image is stored in the memory 55. The compression of step 106 may be omitted when processing speed is not a major concern.
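The text leaves the thinning technique open ("a known technique"). One simple possibility is plain subsampling, sketched below. The function name `decimate` and the choice to keep every fourth pixel (rather than, say, combining each 4x4 block) are assumptions made for illustration only.

```python
def decimate(image, factor=4):
    """Keep every `factor`-th pixel in each direction (plain subsampling)."""
    return [row[::factor] for row in image[::factor]]


# An 8x8 image whose left half is black (1) and right half is white (0).
image = [[1 if x < 4 else 0 for x in range(8)] for _ in range(8)]
small = decimate(image)

pixels_before = sum(len(row) for row in image)
pixels_after = sum(len(row) for row in small)
```

With a factor of 4 in both directions, `pixels_after` is 1/16 of `pixels_before`, which matches the reduction stated in the text.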

Step 107 of FIG. 1 performs contour extraction and extracts the sequence of coordinate points on each contour. In FIG. 2, the number and type of each contour extracted from the compressed image in the memory 55, together with the coordinate point sequence of each contour, are obtained in the memory 56.

One such coordinate point sequence is obtained for each contour. The contours are assumed to be classified as outer or inner contours. An inner contour is the boundary of a black area that surrounds a white area; when it is traversed with white on the right and black on the left, the traversal is clockwise. An outer contour is the boundary of a white area that surrounds a black area; traversed in the same way, it is counterclockwise. Outer and inner contours can be distinguished at the same time as contour tracing by a known technique.

Step 108 of FIG. 1 is loop control that repeats steps 109 to 113 for each contour, where N is the number of contours and i is the contour number. In FIG. 2, the contour data in the memory 56 are processed and the results are stored in the memory 57. Step 109 is a determination unit that judges whether the i-th contour is an outer or an inner contour; steps 110 to 113 are performed when it is an outer contour.

Step 110 calculates the width W(i) and height H(i) of the i-th contour. Specifically, the maximum and minimum X and Y coordinates, Xmax, Xmin, Ymax, and Ymin, are found from the coordinate point sequence of the i-th contour, and the width and height are given by

    W(i) = Xmax - Xmin
    H(i) = Ymax - Ymin

Step 111 is a determination unit that judges whether the i-th contour satisfies the conditions for a character line, for example

    W1 < W(i) < W2
    H1 < H(i) < H2
    K1 < H(i) / W(i) < K2

Here W1, W2, H1, H2, K1, and K2 are parameters for extracting long, narrow areas; the expressions above specify that, for the i-th contour to be a character line, its size and aspect ratio must fall within certain ranges. Step 112 increments the number K of extracted character lines by one. Step 113 registers the representative position coordinates of the i-th contour (for example, Xmin) in the K-th entry of the character line coordinate registration area. K is set to 0 in the initialization step 101.
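Steps 110 and 111 can be sketched as below. This is a hedged illustration: the function names `bounding_box` and `is_character_line`, the representation of a contour as a list of (x, y) tuples, the guard against zero width, and the parameter values in the usage lines are all assumptions, since the text gives no concrete values for W1, W2, H1, H2, K1, or K2.

```python
def bounding_box(points):
    """Return (x_min, x_max, y_min, y_max) for a contour's (x, y) points."""
    xs = [x for x, _ in points]
    ys = [y for _, y in points]
    return min(xs), max(xs), min(ys), max(ys)


def is_character_line(points, w1, w2, h1, h2, k1, k2):
    """Check the size and aspect-ratio conditions of step 111."""
    x_min, x_max, y_min, y_max = bounding_box(points)
    width = x_max - x_min    # W(i) = Xmax - Xmin
    height = y_max - y_min   # H(i) = Ymax - Ymin
    if width <= 0:
        return False         # degenerate contour; avoids division by zero
    return (w1 < width < w2) and (h1 < height < h2) and (k1 < height / width < k2)


# Hypothetical thresholds chosen only to make the example concrete.
params = dict(w1=20, w2=500, h1=4, h2=40, k1=0.01, k2=0.5)
long_thin = [(10, 5), (110, 5), (110, 15), (10, 15)]   # 100 wide, 10 high
small_square = [(0, 0), (10, 0), (10, 10), (0, 10)]    # 10 wide, 10 high
line_ok = is_character_line(long_thin, **params)
square_ok = is_character_line(small_square, **params)
```

With these thresholds the long, thin contour is accepted as a character line and the small square is rejected because its width is below W1.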

Steps 114 to 125 of FIG. 1 match the extracted character line areas against format data for the target documents. Each item of format data consists of M statements: M1 statements concerning the position and size of the heading areas to be extracted, and M2 statements concerning their positions relative to one another. (M, M1, and M2 vary with the format data number j, but the subscript j is omitted below for simplicity.)

Step 114 of FIG. 1 is loop control that repeats steps 115 to 120 for five items of format data, where j is the format data number.

Step 115 of FIG. 1 fetches the j-th format data. Specifically, in FIG. 2 this is done by writing the format data from the control unit 2 into the memory 58.

Step 116 of FIG. 1 is loop control that repeats steps 117 to 119 for the M1 statements, where m is the heading area number. Step 117 is loop control that repeats steps 118 to 119 for the K line areas, where k is the line area number. Step 118 is a determination unit that judges whether the k-th line area satisfies the conditions for the m-th heading area. It does so by comparing the representative position coordinates registered in the memory 57 with the conditions contained in the m-th statement, for example

    L1 < Xmin < L2
    D1 < Ymin < D2

If the comparison succeeds, step 119 registers k as the candidate C(m) for the m-th heading area.
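The nested loops 116 to 119 can be sketched as follows. The function name `find_candidates`, the tuple layouts, and the decision to stop at the first matching line area for each statement are assumptions made for this illustration; the text only says that k is registered as C(m) when the comparison succeeds.

```python
def find_candidates(line_areas, statements):
    """Return C, where C[m] is the index k of a line area matching statement m.

    line_areas: list of (x_min, y_min) representative coordinates.
    statements: list of (l1, l2, d1, d2) bounds, one per heading area.
    C[m] is None when no line area satisfies statement m.
    """
    candidates = []
    for l1, l2, d1, d2 in statements:            # loop 116 over m
        match = None
        for k, (x_min, y_min) in enumerate(line_areas):   # loop 117 over k
            if l1 < x_min < l2 and d1 < y_min < d2:       # step 118
                match = k                                 # step 119
                break
        candidates.append(match)
    return candidates


# Hypothetical coordinates and bounds, for illustration only.
line_areas = [(50, 20), (400, 22), (60, 300)]
statements = [(0, 100, 0, 50), (350, 450, 0, 50)]
found = find_candidates(line_areas, statements)
all_found = all(c is not None for c in found)   # the check made in step 120
```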

When the loop of steps 116 to 119 ends, step 120 determines whether a candidate C(m) has been found for every heading area; if so, the processing from step 121 onward is performed.

Step 121 is loop control that repeats step 122 for the M2 statements, where n is the statement number. Each of these statements describes the relative positional relationship between two heading areas I(a) and I(b), for example

    R1 < Xmin(b) - Xmax(a)

where a and b are specific area numbers. Step 122 therefore only needs to check whether the representative coordinates of the extracted candidate areas C(a) and C(b) satisfy this condition.
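The relative-position check of steps 121 to 123 can be sketched as below, using only the single example relation given in the text, R1 < Xmin(b) - Xmax(a). The function name `relations_hold` and the data layout are assumptions made for illustration.

```python
def relations_hold(boxes, candidates, relations):
    """Return True when every relative-position statement is satisfied.

    boxes: list of (x_min, x_max) per line area k.
    candidates: list C where C[m] is the line area chosen for heading area m.
    relations: list of (a, b, r1), each meaning r1 < x_min(b) - x_max(a).
    """
    for a, b, r1 in relations:                   # loop 121 over statements
        x_max_a = boxes[candidates[a]][1]
        x_min_b = boxes[candidates[b]][0]
        if not (r1 < x_min_b - x_max_a):         # step 122
            return False
    return True                                  # step 123: all satisfied


# Hypothetical values: heading area 1 starts 250 pixels right of where
# heading area 0 ends.
boxes = [(50, 150), (400, 480)]
candidates = [0, 1]
gap_ok = relations_hold(boxes, candidates, [(0, 1, 100)])
gap_too_small = relations_hold(boxes, candidates, [(0, 1, 300)])
```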

When the loop of steps 121 to 122 ends, step 123 checks whether all the conditions on relative relationships have been satisfied. If they have, step 124 identifies j as the format number of the input document and registers j, the candidate areas C(m) (m = 1, ..., M1), and their representative coordinates in the memory 57, and step 125 exits the loop 114. In this embodiment, when the determination in step 123 is negative, nothing is done and the loop of 109 proceeds to its next iteration, but a retry may be performed instead.

That is, the processing from loop 116 onward is executed several times, and from the second time on, the candidate area determined previously (k = C(m)) is excluded when heading area m is judged in step 118. To make this possible, the correspondence between m and k is stored in a form such as a matrix or a list, and this correspondence is cleared at the first trial.

If no matching format number is found after loop 114 has processed all formats j, step 126 displays an error and processing of this input document is stopped.

Step 127 of FIG. 1 is loop control that repeats steps 128 to 129 for the M1 heading areas, where m is the heading area number. Step 128 retrieves the representative coordinates of the area from the number of the candidate area C(m) registered in the memory 59. Step 129 extracts the partial area designated by these representative coordinates from the original image stored in the memory 51 of FIG. 2 and recognizes the characters in it. Character recognition is executed by the character recognition unit 6 of FIG. 2, and the character recognition result is stored in the memory 59 of FIG. 2. Because the normalization of step 103 and the thinning of step 106 have been applied, the changes they introduce are corrected when the partial area designated by the representative coordinates is extracted.

As with ordinary character recognition, characters that fail to be recognized can of course be corrected using the console of the control unit 2 or the like.

Step 130 of FIG. 1 outputs the input document image and the identification information (that is, the character string produced by character recognition) to the file 7 of FIG. 2.

Next, the run-length filtering is described in detail.

FIG. 3 is a PAD showing the flow of the run-length filtering, and FIGS. 4 and 5 illustrate its principle. The size of the input image is taken to be X x Y pixels.

Step 301 of FIG. 3 is initialization, such as clearing the work area. Step 302 of FIG. 3 is loop control that repeats steps 303 to 310 for each scanning line of the input image, where y is the scanning line number. Step 303 is the initialization for the y-th scanning line, in which the white run length L is set to 0. Step 304 is loop control that repeats steps 305 to 310 for the pixels in the y-th scanning line from left to right, where x is the pixel number.

Step 305 determines whether the pixel P(x, y) is white or black. If P(x, y) is white, step 306 increments the white run length L by one. If it is black, step 307 compares L with 0. L = 0 means that the preceding pixel was also black, and nothing is done. If L ≠ 0, step 308 compares L with the threshold L0. L > L0 means that white pixels continued for long enough, and again nothing is done. L < L0 means that the white run is short, and steps 309 to 310 change this white run to black. Step 309 is loop control that repeats step 310 for the L white pixels whose X coordinates run from w1 to w2 (that is, the pixels immediately preceding the current pixel), where w is the pixel number. Step 310 changes the pixel P(w, y) from white to black.
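The flow of FIG. 3 can be sketched as follows. This is an illustration rather than the patent's implementation: the function name `run_length_filter` and the list-of-lists image are assumptions, as are two details the text does not spell out, namely that L is reset to 0 after each black pixel and that a run of exactly L0 pixels is left unchanged.

```python
WHITE, BLACK = 0, 1  # pixel values as defined in the text


def run_length_filter(image, threshold):
    """Fill short horizontal white runs that are terminated by a black pixel."""
    result = [row[:] for row in image]
    for y, row in enumerate(image):              # loop 302 over scanning lines
        run = 0                                  # step 303: L = 0
        for x, pixel in enumerate(row):          # loop 304, left to right
            if pixel == WHITE:                   # step 305
                run += 1                         # step 306
            else:
                if 0 < run < threshold:          # steps 307 and 308
                    for w in range(x - run, x):  # loop 309 over the run
                        result[y][w] = BLACK     # step 310
                run = 0                          # assumed reset of L
    return result


row = [1, 0, 0, 1, 0, 0, 0, 0, 0, 0, 1, 0]
filtered = run_length_filter([row], threshold=5)[0]
```

In this example the two-pixel gap is filled, while the six-pixel gap and the trailing white pixel are kept. Because a run is only examined when a black pixel ends it, a run reaching the right edge is never filled, whereas a short run at the left edge is; the text notes that runs at both ends may deliberately be left unconverted.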

With the run-length filtering of FIG. 3 described above, in the image illustrated in FIG. 4(A), for example, the white pixel groups 41, 42, and 43 are converted into black pixel groups as shown in FIG. 4(B). In FIG. 4, hatched pixels are black and blank pixels are white, and the parameter L0 is set to 5. FIG. 4 assumes that there are sufficiently long white areas on both the left and right sides, so the white runs at the left and right ends (although short in this example) are not converted to black. Alternatively, white runs at the left and right ends may deliberately be left unconverted.

The result of applying the run-length filtering of FIG. 3 is shown schematically in FIG. 5. With a suitably chosen parameter L0, the image of FIG. 5(A), for example, becomes the image of FIG. 5(B): the gaps between adjacent characters are bridged and each character line is merged into a single area. The noise removal of step 104 in FIG. 1 is important to this processing. Without it, two character lines may be joined by noise and character line extraction may fail.

One embodiment of the present invention has been described above, but various modifications can of course be made without affecting the essence of the invention. For example, instead of recognizing the characters in the extracted heading area, the image of the heading area may be stored in the file together with a pointer to the original image; at display time the heading area images are shown first, and the original image linked to the heading area selected by the user is then displayed.

The embodiment assumes horizontally written documents, so the scanning direction for finding runs is horizontal; for vertically written documents the scan is simply made vertical. For white-on-black characters, runs of black pixels are used in place of runs of white pixels.

The embodiment processes the entire document image, but the range to be processed may be specified separately and heading areas extracted only within that range.

In the embodiment, heading areas are extracted at the same time as the photoelectric conversion of the document. Alternatively, the document may first be photoelectrically converted and stored in the file, and the document image read back from the file later for heading area extraction.

Furthermore, when format identification or heading area extraction fails, the intermediate results of the processing can be displayed on the console to ask the user for help.

[Effect of the invention]

As described above, the present invention makes it possible to extract the identification information of a document automatically from the input document, so that the manual keyboard work required conventionally is eliminated or greatly reduced.

[Brief description of the drawings] FIG. 1 is a diagram showing the flow of processing in the document processing method of the present invention. FIG. 2 is a block diagram showing the configuration of a device that carries out the processing of FIG. 1. FIG. 3 is a diagram showing the flow of the run-length filtering in FIG. 1. FIGS. 4 and 5 are diagrams explaining the processing of FIG. 3.

1: bus, 2: control unit, 3: document, 4: photoelectric conversion device, 5: memory, 6: character recognition unit, 7: file.

Claims

[Claims] 1. A document image processing method comprising: means for inputting an image converted into digital form by photoelectric conversion, sampling, and quantization; means for generating a second image by removing minute noise from the input image; means for generating a third image by extracting, from the second image, groups of consecutive white or black pixels within horizontal or vertical scanning lines, comparing the length of each group with a predetermined threshold, and, when the length is smaller than the threshold, replacing the group with pixels of the opposite color, this processing being applied to all or part of the input image; and means for extracting a characteristic region of the document image from the third image, wherein the characteristic region is extracted by extracting connected regions from the image and detecting that the size, shape, and absolute position of the connected regions, and the relative positional relationship between the regions, fall within predetermined ranges.