
JPH0214392A - Document area analyzing device

Info

Publication number: JPH0214392A
Application number: JP63162493A
Authority: JP
Inventors: Yasuo Hongo; 本郷　保夫
Original assignee: Fuji Electric Co Ltd
Current assignee: Fuji Electric Co Ltd
Priority date: 1988-07-01
Filing date: 1988-07-01
Publication date: 1990-01-18

Abstract

PURPOSE: To automatically and quickly determine which parts of a document are character areas, which are graphic areas, and so on. The document area analyzing device is made up of compressed data creating and storing means and projection discriminating means.

CONSTITUTION: Compressed picture data 11 is first stored in a compressed picture memory 36. It is read out as compressed picture data 41 each time a projection operation is performed with a changed projection area and projection direction. An area extracting circuit 14 designates the projection area and the projection direction by a control signal 15. When this control signal 15 is input to a projection area designating circuit 37, the circuit 37 gives area and direction data 38, indicating the projection area and the projection direction, to a projection address generating circuit 39. Circuit 39 generates projection addresses 40, and the compressed picture data 41 of the projection area is read successively from the compressed picture memory 36. A counter circuit 42 counts the black picture elements to obtain projection data 13, which it outputs. The counter circuit 42 is initialized at the start of each projection line.

Description

[Detailed Description of the Invention]

[Industrial Field of Application]

The present invention relates to a document area analysis device. The device takes a document image, obtained by scanning and reading a document that contains various kinds of area such as character areas, graphic areas, table areas, and photographic areas. It automatically analyzes the image and determines which parts are character areas, which are graphic areas, and so on. This makes it possible to select only the character areas and read their characters automatically.

[Conventional technology]

Conventionally, no document area analysis device existed that automatically analyzes a document image to determine which parts are character areas and which are graphic areas. Only two kinds of reading device were available:
- devices that read documents printed with character areas alone;
- devices for documents mixing graphic, table, and photographic areas with character areas, in which the operator specifies the character areas manually and the characters are read accordingly.

[Problem to be solved by the invention]

Therefore, when reading a document in which graphic areas and the like coexist with character areas, the character areas must be specified manually, which is troublesome.

An object of the present invention is to provide a document area analysis device that removes the need to specify character areas manually, even when reading a document in which graphic areas and the like coexist with character areas. The device instead specifies them automatically and at high speed. That is, it automatically and quickly determines, from the read image of a document, which parts are character areas, which are graphic areas, and so on.

[Means to solve the problem]

To achieve the above object, the present invention constructs the document area analysis device from compressed data creation and storage means and projection discrimination means.

[Operation]

The compressed data creation and storage means takes the read pixel data obtained by scanning and reading a document. It divides this data into regions of a dots vertically and b dots horizontally in the document; these regions are called unit areas, and a and b each represent an integer of 2 or more. Each unit area is binarized according to whether the number of specific dots (logical 1 or 0) in it lies within a certain threshold range. The result is compressed data of the read pixel data, which is stored as a compressed image.
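The following is a minimal sketch of this compression in Python. It assumes the image is given as rows of 0/1 values, with 1 as the "specific dot" (black). It also assumes the threshold range is inclusive at both ends, since the text only says "within a certain threshold range". The names `compress`, `lo` and `hi` are illustrative and do not come from the source. Unit areas that would extend past the image edge are dropped, matching the Gauss-symbol bounds given later for FIG. 8.

```python
from typing import List

Image = List[List[int]]  # rows of dots; 1 = specific (black) dot, 0 = white

def compress(image: Image, a: int, b: int, lo: int, hi: int) -> Image:
    """Binarize each unit area of a dots (vertical) by b dots (horizontal).

    A unit area becomes 1 if its black-dot count lies in [lo, hi] (inclusive,
    an assumption), otherwise 0. Partial unit areas at the right and bottom
    edges are dropped.
    """
    rows, cols = len(image), len(image[0])
    compressed: Image = []
    for top in range(0, rows - a + 1, a):
        out_row = []
        for left in range(0, cols - b + 1, b):
            count = sum(image[y][x]
                        for y in range(top, top + a)
                        for x in range(left, left + b))
            out_row.append(1 if lo <= count <= hi else 0)
        compressed.append(out_row)
    return compressed

image = [
    [1, 0, 1, 1],
    [0, 0, 1, 1],
    [0, 0, 0, 1],
    [0, 0, 0, 0],
]
print(compress(image, 2, 2, lo=1, hi=3))  # [[1, 0], [0, 1]]
```

With `lo = 1` and `hi = 3`, the fully black top-right unit area (count 4) maps to 0. This shows how an upper limit, not just a lower one, changes which regions survive compression.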

The projection discrimination means projects the compressed image read from the compressed data creation and storage means, controlling the projection area and projection direction as it does so. From the resulting projection data, it determines the type of each area in the document represented by the compressed image, such as a character area, a graphic area, or a photographic area.

The compressed image is created before projecting because area types can then be determined more quickly. Projecting the uncompressed original image would make discrimination too slow for practical use.

The projection operation is explained here with reference to FIG. 6.

In FIG. 6, P denotes a pattern of 6 dots horizontally by 6 dots vertically. The hatched dots are black and the unhatched dots are white.

Suppose pattern P is projected in the X direction from the right side. Viewed from the X direction, the numbers of black dots in the horizontal rows are 0, 1, 4, 2, 0, 0, as listed in column A1. Displaying these as levels according to the number of black dots gives pattern B1.

Obtaining data such as that in column A1 or pattern B1 is called a projection operation.

Likewise, projecting in the Y direction yields, for each vertical column, projection data such as that in column A2 and pattern B2.
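The two projection directions can be sketched in a few lines of Python. The dot layout of FIG. 6 is not reproduced in the text, so the pattern `P` below is hypothetical. It was chosen only so that its row counts match column A1 (0, 1, 4, 2, 0, 0), and its Y projection is not the figure's column A2. Function names are illustrative.

```python
from typing import List

Pattern = List[List[int]]  # 1 = black (hatched) dot, 0 = white dot

def project_x(pattern: Pattern) -> List[int]:
    """Projection in the X direction: black-dot count of each horizontal row."""
    return [sum(row) for row in pattern]

def project_y(pattern: Pattern) -> List[int]:
    """Projection in the Y direction: black-dot count of each vertical column."""
    return [sum(column) for column in zip(*pattern)]

def as_levels(profile: List[int]) -> List[str]:
    """Text stand-in for patterns B1/B2: one bar whose length equals the count."""
    return ["#" * n for n in profile]

# Hypothetical 6 x 6 pattern whose row counts match column A1.
P = [
    [0, 0, 0, 0, 0, 0],
    [0, 0, 1, 0, 0, 0],
    [0, 1, 1, 1, 1, 0],
    [0, 0, 1, 1, 0, 0],
    [0, 0, 0, 0, 0, 0],
    [0, 0, 0, 0, 0, 0],
]
print(project_x(P))  # [0, 1, 4, 2, 0, 0]
print(project_y(P))  # [0, 1, 3, 2, 1, 0]
```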

[Example]

The present invention is described in detail below with reference to the drawings.

FIG. 7 is an explanatory diagram showing an example of a document to be analyzed. In the figure:
- 1 is the document;
- 2 is a figure/table area within it;
- 3a, 3b, and 3c are character areas;
- 4 is a photographic area.

Figure/table area 2 and character area 3a form one column. Character area 3b, photographic area 4, and character area 3c form another, so document 1 is an example of a two-column layout.

Each kind of area, such as a character area or a figure/table area, has a characteristic distribution of black pixels. These characteristics can therefore be used to determine, for example, whether an area is a character area or a photographic area.

Suppose document 1 is $H_0$ mm high and $W_0$ mm wide, and its image is $N_0$ dots high and $M_0$ dots wide. These give a conversion factor between dots and millimetres. The factor is set by the resolution of the image scanner used as the document reading device. For a 200 dpi scanner, for example, it is about 8 dots/mm.
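As a quick check of the figure quoted above, one inch is 25.4 mm, so a 200 dpi scanner gives 200 / 25.4 ≈ 7.87 dots/mm, which rounds to 8. The sketch below uses illustrative function names. It computes the factor and the resulting $N_0$ and $M_0$ for an A4 page (297 mm × 210 mm), which is used purely as an example size; the source does not specify a page size.

```python
MM_PER_INCH = 25.4  # exact, by definition of the inch

def dots_per_mm(dpi: float) -> float:
    """Dot/mm conversion factor for a scanner of the given resolution."""
    return dpi / MM_PER_INCH

def image_size_dots(height_mm: float, width_mm: float, dpi: float) -> tuple:
    """(N0, M0): image height and width in dots for a page of the given size."""
    k = dots_per_mm(dpi)
    return round(height_mm * k), round(width_mm * k)

print(round(dots_per_mm(200), 2))      # 7.87, i.e. roughly 8 dots/mm
print(image_size_dots(297, 210, 200))  # A4 page: (2339, 1654)
```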

Document image 1 generally contains a large amount of data, so projecting it as is would make the projection operation slow. The document image is therefore compressed before projection.

FIG. 8 is an explanatory diagram of this compression method. In the figure, document image 1 is shown with an image size of $N_0$ dots in height and $M_0$ dots in width. Normally the unit is one dot. Compression instead makes a K×K-dot region, a square of K dots vertically and K dots horizontally, the unit area, and each unit area becomes 1 bit of data. The figure shows the case K = 8 dots.
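A rough count makes the saving concrete. This sketch assumes that each projection reads every entry of the image once, and it ignores the one-time cost of the compression itself. Here $[\,\cdot\,]$ is the Gauss (floor) symbol.

```latex
\underbrace{M_0 N_0}_{\text{dots read per projection, uncompressed}}
\quad\text{vs.}\quad
\underbrace{\left[\frac{M_0}{K}\right]\left[\frac{N_0}{K}\right] \le \frac{M_0 N_0}{K^2}}_{\text{unit areas read per projection, compressed}}
```

With K = 8, each projection touches at most 1/64 as many entries. This matters because the area extraction circuit repeats the projection several times with different areas and directions.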

A compressed unit area 5 produced by this compression is the region whose X coordinate satisfies $8I - 7 \le X \le 8I$ and whose Y coordinate satisfies $8J - 7 \le Y \le 8J$, where $1 \le I \le [M_0/K]$ and $1 \le J \le [N_0/K]$. Here $[\,\cdot\,]$ is the Gauss symbol: $[x] = y$ means that $y$ is the largest integer not exceeding $x$.
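The index arithmetic above can be sketched as follows. The generalization from K = 8 to any K (bounds $KI - (K-1) \le X \le KI$) is an extrapolation of the stated formulas, not something the text states. The function names and the 1654 × 2339 example size (an A4 page at 200 dpi) are illustrative.

```python
def unit_area_bounds(i: int, j: int, k: int = 8):
    """Dot ranges covered by unit area (I, J), 1-based and inclusive.
    For K = 8 this is 8I-7..8I horizontally and 8J-7..8J vertically."""
    return (k * i - (k - 1), k * i), (k * j - (k - 1), k * j)

def unit_area_of(x: int, y: int, k: int = 8):
    """Inverse map: (I, J) of the unit area containing 1-based dot (X, Y), i.e. ceil(X/K), ceil(Y/K)."""
    return (x + k - 1) // k, (y + k - 1) // k

def unit_area_grid(m0: int, n0: int, k: int = 8):
    """Number of unit areas across and down: ([M0/K], [N0/K]), [.] being the Gauss (floor) symbol."""
    return m0 // k, n0 // k

print(unit_area_bounds(3, 2))      # ((17, 24), (9, 16))
print(unit_area_of(17, 16))        # (3, 2)
print(unit_area_grid(1654, 2339))  # (206, 292)
```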

FIG. 1 is a block diagram showing one embodiment of the present invention. In the figure:
- 6 is a document;
- 7 is an image scanner (optical character reader);
- 9 is a document image memory;
- 10 is an image compression circuit;
- 12 is a projection calculation circuit;
- 14 is an area extraction circuit;
- 17 is a character area memory;
- 20 is a character reading circuit.

The operation is as follows. Document 6 is read by image scanner 7, which outputs digital image signal 8, and this signal is stored in document image memory 9. The same digital image signal 8 is also input to image compression circuit 10. Circuit 10 compresses it into data in which each K×K-dot region is one unit area (1 bit of data), and outputs compressed image data 11.

Compressed image data 11 is then input to projection calculation circuit 12, which outputs projection data 13. Using this data, area extraction circuit 14 extracts character areas and other areas. Character area data 16 in particular is sent to character area memory 17 and recorded there.

To extract character areas, area extraction circuit 14 sends control signal 15, which controls the projection area, to projection calculation circuit 12 several times. Each time, it has circuit 12 execute a projection operation accordingly. It then extracts each area, such as the character areas, from the data thus obtained.

The character area data read from memory 17 is sent to document image memory 9 as character area address 18, and character image data 19 is read from memory 9. Character reading circuit 20 then reads this data and outputs the result. In this way, characters are read from only the character areas extracted from document 6.
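The text does not say how character area address 18 is formed. Area extraction works on the compressed image, while character reading works on document image memory 9. One plausible reading, which is an assumption, is that a rectangle found in unit-area coordinates is scaled by K back to dot coordinates before the uncompressed image is read. A minimal sketch with hypothetical names:

```python
from typing import List, Tuple

Rect = Tuple[int, int, int, int]  # (x0, y0, x1, y1) in dots, 0-based, half-open

def area_to_dot_rect(i0: int, j0: int, i1: int, j1: int, k: int) -> Rect:
    """Map unit areas I = i0..i1, J = j0..j1 (1-based, inclusive, as in FIG. 8)
    to the dot rectangle they cover in the original document image."""
    return (i0 - 1) * k, (j0 - 1) * k, i1 * k, j1 * k

def crop(image: List[List[int]], rect: Rect) -> List[List[int]]:
    """Cut the dot rectangle out of the uncompressed image (stand-in for reading memory 9)."""
    x0, y0, x1, y1 = rect
    return [row[x0:x1] for row in image[y0:y1]]

document = [[0] * 16 for _ in range(8)]   # hypothetical 16 x 8-dot document image
rect = area_to_dot_rect(2, 1, 3, 1, k=4)  # unit areas I = 2..3, J = 1
print(rect)                               # (4, 0, 12, 4)
```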

FIG. 2 is a block diagram showing details of image compression circuit 10 in FIG. 1. In the figure:
- 21 is a K-bit size register;
- 23 is a count calculation circuit that counts the black dots (pixels) in one line of a unit area, adds the count accumulated so far, and outputs the sum;
- 27 is a one-line compressed image memory;
- 25 is an X-direction address generation circuit;
- 28 is a Y-direction address generation circuit;
- 30 is a first-line initialization circuit;
- 33 is an upper/lower limit determination circuit;
- 35 is a gate circuit.

The operation is as follows. Digital image signal 8 is input to K-bit size register 21, converted into parallel data in units of K bits, and input to count calculation circuit 23 as parallel K-bit data. Count calculation circuit 23 treats the K-bit data as one line of a unit area and counts the black dots (pixels) in it. The count accumulated up to the previous line has already been written into one-line compressed image memory 27. Circuit 23 reads that count out, adds it to the new count, and writes the sum back to memory 27 as black pixel count 24 for the unit area so far, storing one count per unit area.

The X-direction address I of each unit area in one-line compressed image memory 27 is the index I described with reference to FIG. 8. It is generated by X-direction address generation circuit 25 as address signal 26 and supplied to memory 27. Y-direction address generation circuit 28 generates and outputs line number signal 29, which gives the line number within the unit area. When this is the first line, first-line initialization circuit 30 outputs initial value signal 31, which initializes (clears to zero) count circuit 23. Count circuit 23 then counts the black pixels in the K-bit data as described above and sends the result to memory 27 for storage.

On the second line, count circuit 23 counts the black pixels in that line's K-bit data. It then reads the count stored in memory 27 for the first line and adds the two. The sum is the black pixel count of the first line's K-bit data plus that of the second line's. Circuit 23 writes it back into memory 27, for each unit area, as the count up to the second line.

Continuing in the same way, by the K-th line (the last line of a unit area) one-line compressed image memory 27 holds the total number of black pixels in that unit area of K lines. This total is fed, unit area by unit area, into upper/lower limit determination circuit 33 as black pixel count signal 32. Circuit 33 binarizes each unit area according to whether the count lies within the predetermined range (BL, BH), producing binarized signal 34. This signal is output as compressed image data 11 through gate circuit 35, which is open at this point.
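The line-by-line accumulation described above can be emulated in software. The sketch below is an illustrative model, not the circuit itself. It keeps only one row of per-unit-area counters, standing in for one-line compressed image memory 27. The counters are cleared on the first line of each band of K lines, and one compressed row is emitted after the K-th line. The bounds are treated as inclusive, which is an assumption, since the text gives only the range (BL, BH). All names are hypothetical.

```python
from typing import Iterable, Iterator, List

def compress_stream(lines: Iterable[List[int]], k: int, bl: int, bh: int) -> Iterator[List[int]]:
    """Consume the image one scan line at a time; yield a compressed row after every K-th line."""
    memory: List[int] = []               # one counter per unit area (role of memory 27)
    for line_no, line in enumerate(lines):
        y_in_area = line_no % k          # line number within the unit area (signal 29)
        n_areas = len(line) // k         # X addresses I; a partial area at the right edge is dropped
        if y_in_area == 0:
            memory = [0] * n_areas       # first-line initialization: start from zero
        for i in range(n_areas):
            chunk = line[i * k:(i + 1) * k]  # K-bit parallel data for unit area I
            memory[i] += sum(chunk)          # add this line's count to the running total
        if y_in_area == k - 1:           # last line of the band: totals are complete
            yield [1 if bl <= c <= bh else 0 for c in memory]  # upper/lower limit test

image = [
    [1, 0, 1, 1],
    [0, 0, 1, 1],
    [0, 0, 0, 1],
    [0, 0, 0, 0],
]
print(list(compress_stream(image, k=2, bl=1, bh=3)))  # [[1, 0], [0, 1]]
```

The design point this illustrates is memory. Only one row of counts, about M0/K values, is kept at any time, rather than a full band of K scan lines.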

FIG. 3 is a block diagram showing details of projection calculation circuit 12 in FIG. 1. In the figure:
- 36 is a compressed image memory;
- 37 is a projection area designation circuit;
- 39 is a projection address generation circuit;
- 42 is a counter circuit.

Compressed image data 11 is first stored in compressed image memory 36. It is read out as compressed image data 41 each time a projection operation is performed with a different projection area and projection direction. Area extraction circuit 14 specifies the projection area and projection direction (the X or Y direction shown in FIG. 6) by control signal 15. On receiving control signal 15, projection area designation circuit 37 supplies area/direction data 38, representing the projection area and direction, to projection address generation circuit 39. Circuit 39 generates projection addresses 40, and the compressed image data 41 of the projection area is then read out sequentially from compressed image memory 36.

Counter circuit 42 counts the black pixels in the manner described with reference to FIG. 6 and outputs the result as projection data 13. Counter circuit 42 is initialized at the start of each projection line.
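The address generation and counting just described can be sketched as follows. The region format (a half-open rectangle in unit-area coordinates) and the direction strings are assumptions chosen for illustration. The generator stands in for circuits 37 and 39 and the inner loop for counter circuit 42; the actual hardware may order addresses differently.

```python
from typing import Iterator, List, Tuple

Region = Tuple[int, int, int, int]  # (x0, y0, x1, y1) in unit areas, 0-based, half-open

def projection_addresses(region: Region, direction: str) -> Iterator[List[Tuple[int, int]]]:
    """Yield, for each projection line, the (x, y) addresses of the compressed image to read."""
    x0, y0, x1, y1 = region
    if direction == "X":        # one projection line per row of unit areas
        for y in range(y0, y1):
            yield [(x, y) for x in range(x0, x1)]
    elif direction == "Y":      # one projection line per column of unit areas
        for x in range(x0, x1):
            yield [(x, y) for y in range(y0, y1)]
    else:
        raise ValueError("direction must be 'X' or 'Y'")

def project_region(compressed: List[List[int]], region: Region, direction: str) -> List[int]:
    """Count 1-bits along each projection line inside the region."""
    profile = []
    for line in projection_addresses(region, direction):
        counter = 0             # counter reset at the start of each projection line
        for x, y in line:
            counter += compressed[y][x]
        profile.append(counter)
    return profile

compressed = [
    [1, 0, 1, 1],
    [0, 0, 1, 1],
    [0, 0, 0, 1],
    [0, 0, 0, 0],
]
print(project_region(compressed, (2, 0, 4, 3), "X"))  # [2, 2, 1]
print(project_region(compressed, (2, 0, 4, 3), "Y"))  # [2, 3]
```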

FIG. 4 shows an example of X-direction projection pattern 44 and Y-direction projection pattern 45 of document image 1. Reference numerals 46a and 46b denote threshold values for area extraction.

The projection values of the compressed image data differ according to whether the area is a character area, a graphic area, a table area, a photographic area, and so on. Area discrimination exploits this. In the example of FIG. 4, the first result of analyzing projection patterns 44 and 45 is that document image 1 has a two-column layout.
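One simple way to read a layout such as the two columns off a projection profile is to find the runs where the profile exceeds a threshold such as 46a or 46b. The gaps between runs are then the gutters. The sketch below assumes a profile with one value per column of unit areas, which is the Y-direction projection in the sense of FIG. 6, and a strict "greater than" comparison. Both choices, the sample profile, and the function name are hypothetical.

```python
from typing import List, Optional, Tuple

def runs_above(profile: List[int], threshold: int) -> List[Tuple[int, int]]:
    """Maximal runs (start, end), end exclusive, where the profile exceeds the threshold."""
    runs: List[Tuple[int, int]] = []
    start: Optional[int] = None
    for i, value in enumerate(profile):
        if value > threshold and start is None:
            start = i                    # a run begins
        elif value <= threshold and start is not None:
            runs.append((start, i))      # the run ends at the first value at or below threshold
            start = None
    if start is not None:                # a run that reaches the end of the profile
        runs.append((start, len(profile)))
    return runs

# Hypothetical Y-direction profile: two dense bands separated by a low-count gutter.
profile = [0, 5, 7, 6, 0, 0, 4, 8, 5, 0]
print(runs_above(profile, threshold=2))  # [(1, 4), (6, 9)] -> two columns
```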

Control signal 15 is then given so as to limit the projection area to the left column of document image 1, and projection is performed in the X direction. This produces projection patterns 47a and 47b, shown in FIG. 5. These are binarized with a suitable threshold 46c:
- pattern 47b, corresponding to character area 3, splits into text lines as a series of small peaks;
- pattern 47a, corresponding to figure/table area 2, forms a single block, which shows that it is not a character area.
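The contrast between many small peaks and one block can be turned into a simple labelling rule. This is a hypothetical heuristic for illustration; the source does not give a decision rule. Each run above the threshold is labelled `"line"` if it is no taller than an assumed maximum text-line height, and `"block"` otherwise. The sample profile imitates a block (like 47a) followed by three text lines (like 47b).

```python
from typing import List, Tuple

def runs_above(profile: List[int], threshold: int) -> List[Tuple[int, int]]:
    """Maximal runs (start, end), end exclusive, where the profile exceeds the threshold."""
    runs, start = [], None
    for i, value in enumerate(profile + [threshold]):  # sentinel closes a trailing run
        if value > threshold and start is None:
            start = i
        elif value <= threshold and start is not None:
            runs.append((start, i))
            start = None
    return runs

def label_runs(profile: List[int], threshold: int, max_line_height: int) -> List[Tuple[int, int, str]]:
    """Label each run 'line' (short, text-like) or 'block' (tall, figure/table-like)."""
    return [(s, e, "line" if e - s <= max_line_height else "block")
            for s, e in runs_above(profile, threshold)]

# Hypothetical X-direction profile of a left column: one tall block, then three short peaks.
left_column = [0, 9, 9, 9, 9, 9, 9, 9, 0, 0, 3, 3, 0, 3, 3, 0, 3, 3, 0]
print(label_runs(left_column, threshold=1, max_line_height=3))
```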

Similarly, projecting the right column allows the character areas and the photographic area to be extracted.

As described above, projection operations are performed on compressed image data read from the compressed image memory, so they take less time than when compressed data is not used. Compression also binarizes each predetermined unit area according to its number of black pixels. This binarization preserves the characteristics that distinguish text, graphics, tables, photographs, and so on, so each area can be discriminated easily.

[Effect of the Invention]

As described above, the present invention stores not only the document image but also a compressed image computed from it. Projection operations can therefore be repeated at high speed until the areas are correctly discriminated. In addition, when the compressed image is computed, the upper and lower limits of the black pixel count used to binarize a unit area can be set freely. A compressed image suited to extracting a given kind of area, for example character areas, can therefore be obtained.

Furthermore, the projection operation is performed with both the projection area and the projection direction specified, so various kinds of area can be discriminated.

[Brief explanation of the drawing]

FIG. 1 is a block diagram showing an embodiment of the present invention.
FIG. 2 is a block diagram showing details of image compression circuit 10 in FIG. 1.
FIG. 3 is a block diagram showing details of projection calculation circuit 12 in FIG. 1.
FIGS. 4 and 5 are explanatory diagrams each showing an example of a projection pattern of a document image.
FIG. 6 is an explanatory diagram of the projection operation.
FIG. 7 is an explanatory diagram showing an example of a document to be analyzed.
FIG. 8 is an explanatory diagram of how a document image is compressed.

Explanation of reference numerals:
6: document
7: image scanner (optical character reader)
9: document image memory
10: image compression circuit
12: projection calculation circuit
14: area extraction circuit
17: character area memory
20: character reading circuit

Agents: Akio Namiki (並木 昭夫), patent attorney; Kiyoshi Matsuzaki (松崎 清), patent attorney.

Claims

[Claims]

1) A document area analysis device comprising:

compressed data creation and storage means which:
- divides read pixel data, obtained by scanning and reading a document, into regions each consisting of a dots vertically and b dots horizontally in the document (referred to as unit areas; a and b each represent an integer of 2 or more);
- binarizes each unit area according to whether the number of specific dots of logical 1 or 0 in the unit area lies within a certain threshold range, thereby creating compressed data of the read pixel data;
- stores the compressed data as a compressed image; and

projection discrimination means which, when projecting the compressed image read from the compressed data creation and storage means:
- projects the compressed image while controlling the projection area and projection direction;
- determines from the obtained projection data the type of each area in the document represented by the compressed image, such as a character area, a graphic area, or a photographic area.