JP2000306103A

JP2000306103A - Method and device for information processing

Info

Publication number: JP2000306103A
Application number: JP11118625A
Authority: JP
Inventors: Makoto Takaoka; 真琴高岡
Original assignee: Canon Inc
Current assignee: Canon Inc
Priority date: 1999-04-26
Filing date: 1999-04-26
Publication date: 2000-11-02

Abstract

PROBLEM TO BE SOLVED: To reduce the data volume of a document image and to hold the document data in a form that makes the data easy to reuse. SOLUTION: A multi-valued image original 101 is optically read to obtain original image data. Character areas 106 to 108 and image areas 102 to 105 are extracted from the obtained original image data, and the position information of each area is acquired. The image data in the character areas is subjected to character recognition, and the recognition results are stored in a text layer 116. The image data in the image areas is subjected to compression processing, and the results are stored in a picture layer 112. Data obtained by compressing the whole of the original image data is stored in a full-image layer 111.

Description

DETAILED DESCRIPTION OF THE INVENTION

[0001]

[Technical Field of the Invention] The present invention relates to an information processing apparatus and method for processing image data obtained by optically reading a document original, and more particularly to an information processing apparatus and method suited to processing that presupposes reuse of the read image data.

[0002]

[Prior Art] In recent years, when a paper document is to be stored or reused on a computer, it has been common to read the document with a scanner and hold it as image data. To convert such image data into character codes so that word-processing software and the like can use it, character recognition is performed on the image data. The character information is converted into codes, while images contained in the document are cut out of the image data and held so that they can be pasted into a document. Storing a document original on a computer therefore requires layout analysis of the document image before character recognition, and the accuracy of that analysis has become increasingly important.

[0003] Furthermore, as the technology has advanced, linguistic analysis has been added to character recognition and typesetting know-how has been incorporated into layout analysis, so that both processes now achieve considerably high accuracy.

[0004] Under these circumstances, it has become possible to save a document original using the results of layout analysis and character recognition. Saving a document image in this form makes the text of the document searchable and increases the likelihood that the document image can be reused. The storage format is often the file format of a particular application: for example, Microsoft's RTF (Rich Text Format), JustSystem's Ichitaro (trademark) format, or, more recently, the HTML format widely used on the Internet. Documents may also be saved in the SGML format, although that format is somewhat more complex.

[0005] On the other hand, in contrast to character-centered document originals, there are documents that are better called image originals. As the name suggests, these consist mainly of images. For such originals it is desirable to keep the image as close to the original as possible, to keep the data volume small, and to make the image easy to handle. For many document originals it is acceptable for the image portions to differ somewhat from the original as long as the content can be confirmed, whereas for image originals the reproducibility of the pictorial image is what matters.

[0006] For this reason, image originals are generally saved by compressing the entire page. In recent years an image format called FlashPix (an image format proposed by Kodak) has come into general use for storing images. FlashPix divides an image into blocks of 64 × 64 pixels and encodes each block; the encoding method can be selected block by block. This encoding is performed at successively changed image resolutions, and the result for each resolution is retained. Another feature is that information on how the image is managed can be stored in detail. The format reduces the data volume and provides an image handling method that meets both the need for a high-resolution image for printer output and the need for a low-resolution image for on-screen display.

[0007] The FlashPix format is suited mainly to photographic images, such as a page consisting entirely of a natural image. Encoding methods used for the individual blocks include JPEG encoding, JBIG encoding, and MMR encoding. Image originals are also stored in this FlashPix format.

[0008]

[Problems to Be Solved by the Invention] A document image that consists mainly of characters may nevertheless contain pictorial regions, and in that case the reproducibility of those regions should also be given weight. Conversely, an image original that consists mainly of pictures may contain character regions, and there is a demand to convert those portions into character codes.

[0009] As described above, a document image storage format that achieves both image reproducibility and character coding has come to be desired. However, a format that satisfies both, and a method of creating it, face the following problems.
(1) The image of an entire page is generally compressed and held as a whole, so image degradation becomes conspicuous when the compression ratio is raised.
(2) No processing method or format has been established for analyzing the layout of a page and holding it in divided form.
(3) It is unclear whether the generated document image analysis data is in a form that can be corrected easily when the analysis result is unsatisfactory.
(4) It is unclear how data is to be exchanged with the formats of other application software.
(5) It is unclear how the document image data is to be searched.

[0010] The present invention has been made in view of the above problems, and its object is to reduce the data volume of a document image and to hold the document data in a form that makes the data easy to reuse.

[0011]

[Means for Solving the Problems] To achieve the above object, an information processing apparatus according to the present invention has, for example, the following configuration. That is, it comprises: reading means for optically reading a document original to obtain original image data; analysis means for extracting character areas and image areas from the original image data and acquiring position information for each; recognition means for performing character recognition on the image data of the character areas obtained by the analysis means; compression means for performing compression processing on the image data of the image areas; and storage means for storing the recognition result data obtained by the recognition means, the compressed data obtained by the compression means, and the position information obtained by the analysis means.

[0012]

[Embodiments of the Invention] A preferred embodiment of the present invention is described below with reference to the accompanying drawings.

[0013] This embodiment describes an image processing apparatus that analyzes a document image and saves it as data in a reusable format (that is, converts it into a micro document).

[0014] This embodiment describes an image processing apparatus that satisfies the following items.
(1) When a document original is input from a scanner, every image format can be accepted, including binarized images, monochrome grayscale images, RGB 256-color palette images, and RGB full-color images.
(2) An image in any of these formats is converted into image data on which layout analysis can be performed.
(3) Layout analysis is performed on the image from (2), and the regions in the document image are analyzed by attribute, such as Text regions and Picture regions.
(4) In the layout analysis, the reading order of Text regions is recognized and the priority of Picture regions is analyzed.
(5) Character recognition is performed on the Text regions.
(6) The images of the Picture regions are cut out of the input original image.
(7) Each cut-out image is compressed with a specified encoding method; for example, a FlashPix image is created. In the case of FlashPix, the necessary information is written into the information description section on the basis of the layout analysis and character recognition results.
(8) The whole original image is compression-encoded, with a target compression ratio of, for example, about 1/100.
(9) The whole-page compressed data, the cut-out image data, the layout analysis data, and the character recognition data are saved together as a single image analysis format.
(10) For a search by image display, the whole-page compressed data or the cut-out image data is displayed and searched.
(11) For a search by character code, the character recognition results are used. In that case a fuzzy search that includes candidate characters is also performed.
(12) A document image re-created from this analysis format is built from character codes, line vectors, and pasted images. If the re-created document image is unsatisfactory, it is corrected using the whole-page compressed image data.
(13) General file formats such as RTF, Ichitaro, HTML, and XML are produced by conversion from this analysis image format.
(14) Conversion from a general file format to this analysis format is performed by the inverse conversion.

[0015] To realize the above functions, this embodiment holds a document image as parts divided according to content. Parts that can be coded or vectorized are coded or vectorized. Regions of natural images and pictures, which must retain high-definition information, are compression-encoded so that a high-definition image can be reproduced. Saving a document image in this form has several advantages: the large memory footprint of the original image is reduced, the information in the regions (parts) that matter is not degraded, and character codes are obtained automatically.

[0016] FIG. 18 is a block diagram showing the configuration of the image storage apparatus according to this embodiment. In FIG. 18, reference numeral 11 denotes a CPU, which executes the various processes of the image storage apparatus. Reference numeral 12 denotes a ROM, which stores the boot program executed by the CPU 11 and various parameters. Reference numeral 13 denotes a RAM, which provides the area into which the control program executed by the CPU 11 is loaded and a work area used by the CPU 11 when executing the various processes.

[0017] Reference numeral 14 denotes a display, which presents various information under the control of the CPU 11. Reference numeral 15 denotes an input unit, which includes, for example, a keyboard and a pointing device and accepts operation inputs to the image storage apparatus. Reference numeral 16 denotes an external storage device, for example a hard disk, which stores the control program that implements the image storage processing described later and the image data generated by that processing. Reference numeral 17 denotes a scanner, which reads the original to be stored and provides original image data. Reference numeral 18 denotes a printer, which records a visible image on the basis of input image data. Reference numeral 19 denotes a system bus connecting the above components.

[0018] First, the micro document, which is the document data storage format of this embodiment, is described. FIG. 1 illustrates the data structure of a micro document according to this embodiment.

[0019] In FIG. 1 (1), reference numeral 101 denotes a multi-valued image original. Although it is drawn as a line drawing for convenience of illustration, the multi-valued image original 101 is an RGB color image or an image that stores density values in some other format. In the multi-valued image original 101, 102 to 105 are photographic images and 106 to 108 are character images. The image data of the multi-valued image original 101 is assumed to be held at a high resolution of 300 DPI or more so that the state of the original is reproduced as faithfully as possible. In this case, with 8 bits for each of R, G, and B, an A4 original occupies as much as 30 MB or more of memory.
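As a rough check of the memory figure above, the uncompressed size of a scanned page can be estimated from the paper dimensions and the scan resolution. The sketch below is illustrative only: it assumes ISO A4 dimensions (210 mm × 297 mm) and 3 bytes per pixel, and the function name is hypothetical rather than part of the embodiment.

```python
MM_PER_INCH = 25.4


def raw_page_bytes(width_mm: float, height_mm: float, dpi: int,
                   bytes_per_pixel: int = 3) -> int:
    """Return the uncompressed size in bytes of a page scanned at `dpi`."""
    width_px = round(width_mm / MM_PER_INCH * dpi)
    height_px = round(height_mm / MM_PER_INCH * dpi)
    return width_px * height_px * bytes_per_pixel


if __name__ == "__main__":
    # Assumed A4 size; 24-bit RGB means 3 bytes per pixel.
    for dpi in (300, 400):
        size = raw_page_bytes(210, 297, dpi)
        print(f"A4 at {dpi} DPI, 24-bit RGB: {size / 1e6:.1f} MB")
```

At exactly 300 DPI the estimate is about 26 MB, and at 400 DPI about 46 MB, so a figure on the order of 30 MB is what one would expect for resolutions of 300 DPI or more.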

[0020] In this embodiment, such a document image is analyzed and separated into parts according to what is written on the page. Each part is then stored in a layer as shown in FIG. 1 (2). The layer-structured data containing all of these parts is the micro document of this embodiment. In the following, each part of a document original is called an object in the document.

[0021] FIG. 1 (2) conceptually shows the layers in which the objects are managed. The document image shown in FIG. 1 (1) is analyzed, and objects are stored in the layers according to the content of the image.

[0022] The Full Image Layer 111, at the top of the hierarchy, stores the result of compressing the entire document image, for example by JPEG compression. When the document image is to be handled simply as image data, it is enough to decompress this data. To raise the compression ratio, the Full Image Layer 111 compresses the image to at least about 1/100 as a guide, even if image degradation becomes somewhat large (that is, at some cost in image quality). Because the Full Image Layer 111 holds the whole page as an image, it is useful, even if somewhat degraded, for correcting the results in the other layers that hold the document analysis results, and also as a thumbnail that shows the whole page as a small image. The image data of the Full Image Layer 111 is assumed to be good enough that, when decompressed and shown on a display, it yields a perfectly acceptable document image. On the other hand, it is accepted as unavoidable that jagged edges and similar artifacts appear when the data of the Full Image Layer 111 is used for printing.

[0023] The Picture Layer 112 stores the image data of regions given the Picture attribute by the document analysis described later. Here, Picture refers to a region, such as a photograph, that has many density variations and is judged to be a natural image. In FIG. 1 (1), the portions indicated by 102 to 105 are regions having the Picture attribute.

[0024] FIG. 2 shows image data obtained by cutting out Picture region 103 of the multi-valued image original 101. FIG. 3 shows image data obtained by cutting out Picture region 104 of the multi-valued image original 101. A compression method capable of reproducing a high-definition image is applied to images cut out as Picture regions. That is, the cut-out image is compressed so that it can be reproduced cleanly both on a display and in print. For example, when outputting to a 400 DPI printer, the image regions (Picture regions) can be reproduced cleanly by retrieving the data stored in the Picture Layer 112 and transferring it to the printer.

[0025] The JPEG compression mentioned above may be used to compress the Picture regions, but in this embodiment they are stored in the FlashPix data format. FlashPix is briefly described below.

[0026] FIG. 4 shows the tile division of an image in FlashPix. FlashPix encodes each region obtained by dividing the image into blocks of 64 × 64 dots. In addition, as shown in FIG. 5, images at stepwise different resolutions are stored together in advance. Consequently, when, for example, a 300 DPI image is used for the printer and a 75 DPI image for display, the low-resolution image is obtained directly, without generating it from the high-resolution image. Holding images at several resolutions increases the data volume, but because the additional images are subsampled low-resolution images, the increase is small.
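The tiling and multi-resolution storage described above can be sketched numerically. The following is a minimal illustration, not the FlashPix file format itself: it assumes that each level halves the resolution of the previous one and that levels are generated until the image fits in a single tile. The function names are hypothetical.

```python
import math
from typing import List, Tuple

TILE = 64  # tile edge in pixels, as stated in the text


def tile_grid(width: int, height: int, tile: int = TILE) -> Tuple[int, int]:
    """Number of tile columns and rows needed to cover the image."""
    return math.ceil(width / tile), math.ceil(height / tile)


def pyramid_levels(width: int, height: int,
                   min_edge: int = TILE) -> List[Tuple[int, int]]:
    """Image sizes from full resolution downward, halving at each step (assumption)."""
    levels = [(width, height)]
    while max(width, height) > min_edge:
        width, height = math.ceil(width / 2), math.ceil(height / 2)
        levels.append((width, height))
    return levels


def pyramid_overhead(levels: List[Tuple[int, int]]) -> float:
    """Extra pixels stored for the reduced levels, relative to the full image."""
    base = levels[0][0] * levels[0][1]
    return sum(w * h for w, h in levels) / base - 1.0


if __name__ == "__main__":
    levels = pyramid_levels(1200, 900)
    print("tiles at full resolution:", tile_grid(1200, 900))
    print("levels:", levels)
    print(f"overhead before compression: {pyramid_overhead(levels):.1%}")
```

Under these assumptions a 1200 × 900 pixel region needs 19 × 15 tiles at full resolution, the third level has one quarter of the linear resolution (the 300 DPI to 75 DPI case in the text), and the reduced levels add roughly one third to the pixel count before compression.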

[0027] As the above description shows, FIG. 2 is an example in which the image data of Picture region 103 is stored at stepwise resolutions in accordance with the FlashPix format. Likewise, FIG. 3 is an example in which the image data of Picture region 104 is stored at stepwise resolutions in accordance with the FlashPix format. The image data of each part obtained in this way is stored in the Picture Layer 112.

[0028] Next, the Line Art Layer 113 in FIG. 1, unlike the Picture Layer 112 described above, stores images of regions judged to be line drawings. Unlike a natural image, a line drawing region is an image created with computer software; it has relatively little density variation and contains many blank areas.

[0029] If a line drawing region can be converted into line vectors, it can be stored in that form. In this embodiment, data obtained by vector conversion is stored in the Line Vector Layer 114. With current technology, however, line drawing vector conversion is not always complete, and line drawings are generally handled as image data. CAD drawings and simple line drawings are easy to vectorize, but it is more reliable to convert them in post-processing from a state in which they have first been saved as image data. In this embodiment, therefore, line drawings are also stored as hierarchical multi-resolution images in the FlashPix format, as shown in FIGS. 2 and 3.

[0030] The Table Layer 115 stores the results of table extraction and analysis. A dedicated layer is provided for tables for the following reason. As the existence of spreadsheet software as a field suggests, the numerical values in a table carry special meaning within a document. A separate layer is therefore provided so that, when only the tables in a document are needed, only this layer (the Table Layer) has to be consulted. FIG. 6 illustrates an example of table structure. Even a simple 3 × 3 table such as the one in FIG. 6 is divided into components: the outer frame, Table (a); the individual inner frames, Cell (b); the character string of each line, Element (c); and the range enclosed by ruled lines, Region (d). Real tables can be more complex and give rise to more complex structures, but the description here is limited to this level.

[0031] The Text Layer 116 stores the results of character recognition performed on the image data of the regions in the document image that were analyzed as character regions (regions 106 to 108). It stores not only character code information but also the candidate characters obtained during character recognition, information such as decoration, size, and font, the number of lines, and whether the text is written vertically or horizontally. Extracting and holding only the first candidate of each recognition result yields raw plain text, but that text is very likely to contain misrecognized characters.
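Keeping the candidate characters makes a tolerant search possible even when the first candidate is wrong. The sketch below illustrates one simple way to do this, assuming that the recognition result is held as a list of candidate lists, one per character position with the best candidate first. This representation, the sample data, and the function names are assumptions for illustration; the source does not specify the search algorithm.

```python
from typing import List, Sequence

# Hypothetical recognition result for the printed word "Canon": one list of
# candidate characters per position, best candidate first.
SAMPLE: List[List[str]] = [["C"], ["a"], ["n", "h"], ["0", "o", "O"], ["n"]]


def first_candidates(recognized: Sequence[Sequence[str]]) -> str:
    """Plain text built from the first candidate at every position."""
    return "".join(candidates[0] for candidates in recognized)


def fuzzy_find(recognized: Sequence[Sequence[str]], query: str) -> List[int]:
    """Start positions where every query character is among the candidates."""
    if not query:
        return []
    hits = []
    for start in range(len(recognized) - len(query) + 1):
        if all(ch in recognized[start + i] for i, ch in enumerate(query)):
            hits.append(start)
    return hits


if __name__ == "__main__":
    print(first_candidates(SAMPLE))     # text a first-candidate-only search would see
    print(fuzzy_find(SAMPLE, "Canon"))  # match positions when candidates are consulted
```

In this sample the first-candidate text contains a misrecognized digit, so an exact search for the word fails, while the candidate-aware search still finds it.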

[0032] The Layout Layer 117 stores the layout information obtained by layout analysis of the document image. The Layout Layer 117 holds, for each attribute, the rectangle information of each region (including the size and position of the rectangle) and other additional information.
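The layer structure described in the preceding paragraphs can be pictured as a simple container with one field per layer. The following sketch is only one possible in-memory representation; the class names, field names, and payload types are assumptions, and the source does not define a concrete file layout.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class Rect:
    """Position and size of a region on the page, in pixels."""
    x: int
    y: int
    width: int
    height: int


@dataclass
class LayoutEntry:
    """One record of the layout layer: which object lies where."""
    object_id: int
    attribute: str  # e.g. "TEXT" or "PICTURE"
    rect: Rect


@dataclass
class MicroDocument:
    full_image: bytes = b""                                      # whole-page image, layer 111
    pictures: Dict[int, bytes] = field(default_factory=dict)     # Picture Layer 112
    line_art: Dict[int, bytes] = field(default_factory=dict)     # Line Art Layer 113
    line_vectors: Dict[int, list] = field(default_factory=dict)  # line vector layer 114
    tables: Dict[int, list] = field(default_factory=dict)        # Table Layer 115
    text: Dict[int, str] = field(default_factory=dict)           # Text Layer 116
    layout: List[LayoutEntry] = field(default_factory=list)      # Layout Layer 117

    def add_picture(self, object_id: int, rect: Rect, compressed: bytes) -> None:
        """Store compressed picture data and record its position."""
        self.pictures[object_id] = compressed
        self.layout.append(LayoutEntry(object_id, "PICTURE", rect))

    def add_text(self, object_id: int, rect: Rect, recognized: str) -> None:
        """Store recognized text and record its position."""
        self.text[object_id] = recognized
        self.layout.append(LayoutEntry(object_id, "TEXT", rect))

    def regions_with(self, attribute: str) -> List[Rect]:
        """Rectangles of all objects that carry the given attribute."""
        return [e.rect for e in self.layout if e.attribute == attribute]
```

Keeping the position records in a separate layout list mirrors the separation in the text between the content layers and the Layout Layer, so that a page can be re-created by placing each object at its recorded rectangle.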

[0033] Next, the way in which the micro document information described above is created is explained. The following description takes as an example the processing of the multi-valued image original 101 shown in FIG. 1 (1). The original image may be a color image or a monochrome multi-valued image. In either case, the full-page image obtained by reading the original, as shown in FIG. 1 (1), is the form that carries the most information and also has the largest image data.

[0034] FIG. 7 is a flowchart illustrating the procedure for generating a micro document in this embodiment. In FIG. 7, steps S52 to S55 are the procedure for creating image data suitable for layout analysis.

[0035] First, when a multi-valued image is input in step S51, lightness information is created in step S52. If the multi-valued image is an RGB color image whose colors conform to a CIE color space, lightness information can be created with a well-known arithmetic expression. If the image is simply what the scanner characteristics produced, using the green channel as the lightness information causes little problem. Because the green image data lies in the middle of the color wavelengths, characters, lines, rectangles, and the like fall within this color in many cases. Strictly speaking, lightness information is produced for each color, but the description is omitted here.
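The two options for step S52 can be sketched as follows. The text does not name the arithmetic expression, so the weighted sum below uses the ITU-R BT.601 luma coefficients purely as an assumed example of a well-known formula; the green-channel variant follows the shortcut mentioned for raw scanner output. All function names are hypothetical.

```python
from typing import Callable, List, Tuple

Pixel = Tuple[int, int, int]  # (R, G, B), each 0..255


def luma_weighted(pixel: Pixel) -> int:
    """Weighted sum of R, G, B (BT.601 coefficients, chosen as an assumption)."""
    r, g, b = pixel
    return round(0.299 * r + 0.587 * g + 0.114 * b)


def luma_green_only(pixel: Pixel) -> int:
    """Shortcut from the text: take the green channel as the lightness."""
    return pixel[1]


def to_grayscale(image: List[List[Pixel]],
                 luma: Callable[[Pixel], int] = luma_weighted) -> List[List[int]]:
    """Convert an RGB image (rows of pixels) into a grayscale image."""
    return [[luma(pixel) for pixel in row] for row in image]


if __name__ == "__main__":
    page = [[(255, 255, 255), (20, 30, 25)],
            [(200, 180, 40), (0, 0, 0)]]
    print(to_grayscale(page))
    print(to_grayscale(page, luma_green_only))
```

The result of either function is a grayscale image, which is the input expected by the background removal step.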

[0036] The lightness image obtained in step S52 is equivalent to a monochrome grayscale image. For a monochrome grayscale input, therefore, processing may start from step S53.

[0037] In step S53, background removal is performed. For a grayscale image whose background has some density, the subsequent layout analysis cannot be carried out unless the background is separated from the lines and other marks drawn on it. The simplest approach is simple binarization, which separates the two with a fixed threshold. However, adaptive binarization, which varies the threshold for each region of the image, is the most suitable. Because this method changes the threshold from one part of the image to another, it is well suited to extracting characters region by region.
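The difference between the two binarization approaches can be shown on a small example. The sketch below uses a local-mean threshold as one simple form of adaptive binarization; the source does not say which adaptive method is used, so the window radius, the offset, and the function names are assumptions.

```python
from typing import List

Gray = List[List[int]]  # rows of grayscale values, 0 (dark) .. 255 (light)


def binarize_fixed(image: Gray, threshold: int = 128) -> Gray:
    """Mark a pixel as ink (1) when it is darker than one global threshold."""
    return [[1 if value < threshold else 0 for value in row] for row in image]


def binarize_adaptive(image: Gray, radius: int = 1, offset: int = 10) -> Gray:
    """Mark a pixel as ink (1) when it is clearly darker than its neighborhood mean."""
    height, width = len(image), len(image[0])
    result = [[0] * width for _ in range(height)]
    for y in range(height):
        for x in range(width):
            y0, y1 = max(0, y - radius), min(height, y + radius + 1)
            x0, x1 = max(0, x - radius), min(width, x + radius + 1)
            window = [image[j][i] for j in range(y0, y1) for i in range(x0, x1)]
            local_mean = sum(window) / len(window)
            result[y][x] = 1 if image[y][x] < local_mean - offset else 0
    return result


if __name__ == "__main__":
    # Background brightens from left to right; strokes sit at columns 2 and 9.
    row = [100, 110, 50, 130, 140, 150, 160, 170, 180, 120, 200, 210]
    page = [row[:] for _ in range(3)]
    print(binarize_fixed(page)[0])
    print(binarize_adaptive(page)[0])
```

On this sample the fixed threshold also marks the dark left-hand background as ink, while the local threshold isolates only the two strokes.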

[0038] In step S54, the image data from which the background has been removed is binarized. This is performed at the same time as the background removal of step S53, but the two are drawn separately in FIG. 7 for the sake of explanation. As a result, the multi-valued image 101 shown in FIG. 1 is finally turned into a binarized image in which the image portions of regions 102 to 105 are filled almost solidly with black and the character portions of regions 106 to 108 appear clearly. In step S55, this binarized image is held.

【００３９】ステップＳ５６では、ステップＳ５５で保
持された２値化画像についてレイアウト解析処理を行
い、図８に示すごとき矩形情報を得る。各矩形１０２’
〜１０８’はそれぞれ図１のイメージ１０２〜１０８の
領域に対応している。なお、このレイアウト解析処理に
は、周知のレイアウト解析技術が適用され得る。簡単に
説明すると次のようである。すなわち、画像データを、
左上から検査してゆき、黒画素を検出すると、その黒画
素の周りの画像を調べ、輪郭線追跡という手法で画像を
抽出する。画像全体をこの方法で調査して、得られた輪
郭線情報と画像の黒画素・白画素の状態、黒画素矩形の
位置情報等より、その領域の属性を判断する。必ずしも
正確ではないが、現在までの技術開発でかなりの精度の
高いレイアウト解析結果を得ることができる。そして、
ステップＳ５７において、ステップＳ５６で判定された
属性に応じて、各オブジェクトを対応するレイヤに格納
する。In step S56, layout analysis processing is performed on the binarized image held in step S55 to obtain rectangle information as shown in FIG. 8. The rectangles 102' to 108' correspond to the regions of the images 102 to 108 in FIG. 1, respectively. A well-known layout analysis technique can be applied to this processing. Briefly, it works as follows. The image data is scanned from the upper left, and when a black pixel is detected, the pixels around it are examined and a connected image is extracted by a technique called contour tracing. The entire image is examined in this way, and the attribute of each region is determined from the obtained contour information, the state of the black and white pixels of the image, the position information of the black-pixel rectangles, and the like. The result is not always correct, but technical development to date makes it possible to obtain layout analysis results of fairly high accuracy. Then, in step S57, each object is stored in the corresponding layer according to the attribute determined in step S56.
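The scan described above can be sketched roughly as follows. This is a minimal illustration, not the patented method: it assumes the binarized image is a list of rows in which 1 means a black pixel, and it substitutes a simple breadth-first flood fill for contour tracing, since both yield the bounding rectangle of a connected group of black pixels. The function name `extract_rectangles` is hypothetical.

```python
from collections import deque


def extract_rectangles(image):
    """Return bounding rectangles (top, left, bottom, right) of connected
    black-pixel groups, found by scanning the image from the upper left.

    Assumption: `image` is a list of equal-length rows, 1 = black, 0 = white.
    A flood fill stands in for the contour tracing named in the text.
    """
    height = len(image)
    width = len(image[0]) if height else 0
    seen = [[False] * width for _ in range(height)]
    rectangles = []
    for y in range(height):
        for x in range(width):
            if image[y][x] != 1 or seen[y][x]:
                continue
            # A new black pixel: collect every black pixel connected to it.
            top, left, bottom, right = y, x, y, x
            queue = deque([(y, x)])
            seen[y][x] = True
            while queue:
                cy, cx = queue.popleft()
                top, bottom = min(top, cy), max(bottom, cy)
                left, right = min(left, cx), max(right, cx)
                for dy in (-1, 0, 1):
                    for dx in (-1, 0, 1):
                        ny, nx = cy + dy, cx + dx
                        if (0 <= ny < height and 0 <= nx < width
                                and image[ny][nx] == 1 and not seen[ny][nx]):
                            seen[ny][nx] = True
                            queue.append((ny, nx))
            rectangles.append((top, left, bottom, right))
    return rectangles
```

A real layout analyzer would go on to merge nearby rectangles and judge an attribute for each one; the sketch stops at the rectangle extraction.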

【００４０】次に、各属性について説明する。図９にレ
イアウト解析処理より得られた各属性を示す。図９で
は、属性を“TEXT”，“PICTURE”，“TABLE”，“LIN
E”，“FRAME”，“NOISE”の６つの分類に分けた。文
字どおり“TEXT”は文字領域、“PICTURE”は自然画及
び線画領域、“TABLE”は表領域、“LINE”は線領域、
“FRAME”は枠領域を示す。また“NOISE”は未確認領域
で、単にノイズと判断されるか、例えば“文字領域”と
して判断しかねた領域などにつけられる。“NOISE”属
性は、その後、さらに解析をすすめ、他の属性に変更す
る場合とノイズとして、消去してしまう場合の２種類に
分けられる。Next, each attribute will be described. FIG. 9 shows the attributes obtained by the layout analysis processing. In FIG. 9, the attributes are divided into six classes: "TEXT", "PICTURE", "TABLE", "LINE", "FRAME", and "NOISE". As the names suggest, "TEXT" denotes a character region, "PICTURE" a natural-image or line-drawing region, "TABLE" a table region, "LINE" a line region, and "FRAME" a frame region. "NOISE" denotes an unconfirmed region; it is assigned to a region that is simply judged to be noise, or to a region that could not be conclusively judged to be, for example, a character region. A "NOISE" region is subsequently analyzed further, and is then either changed to another attribute or erased as noise.

【００４１】さて、“TEXT”領域は、本文“TX_TEX
T”、タイトル“TX_TITLE”、キャプション“TX_CAPTIO
N”、表内文字“TB_CELL”といった詳細属性が付加され
る。なお、キャプションとは、図、表等の表題や注釈を
指す。また、“PICTURE”領域は、自然画“PI_PICTUR
E”と線画“PI_LINE_ART”といった詳細属性が付加され
る。また、“TABLE”領域は、表枠“TB_FRAME”、表の
罫線で囲まれた領域“TB_REGION”、表内文字“TB_CEL
L”、表内CELL内の一行分の文字ブロック“TB_ELEMEN
T”といった詳細属性が付加される。Regions with the "TEXT" attribute are given detailed attributes such as body text "TX_TEXT", title "TX_TITLE", caption "TX_CAPTION", and in-table characters "TB_CELL". A caption here means a title or annotation of a figure, table, or the like. Regions with the "PICTURE" attribute are given detailed attributes such as natural image "PI_PICTURE" and line drawing "PI_LINE_ART". Regions with the "TABLE" attribute are given detailed attributes such as table frame "TB_FRAME", region enclosed by table ruled lines "TB_REGION", in-table characters "TB_CELL", and one-line character block within a table cell "TB_ELEMENT".

【００４２】なお、TABLEについては図６の例を用いて
説明したが、“TB_REGION”についての説明がなされて
いない。“TB_REGION”とは、表内で、罫線が描かれて
いない表に関して、罫線で囲まれた領域情報を示す。表
の中には、意味の有るCELL同士が線で区切られていない
場合も存在する。線で区切られていないからといって、
一つのCELLにすると、表計算ソフトウエアに持っていく
と変な結果となってしまう。そこで、表解析処理によっ
て正しいCELLの状態を解析するわけだが、そのための情
報としてREGIONも必要となる。TABLE has been described with reference to the example of FIG. 6, but "TB_REGION" has not yet been explained. "TB_REGION" indicates information on the regions enclosed by ruled lines, for use with tables in which some ruled lines are not drawn. In some tables, meaningful CELLs are not separated from one another by lines. If such cells were merged into a single CELL simply because no line separates them, the result would be wrong when the table is brought into spreadsheet software. Table analysis processing therefore determines the correct CELL structure, and the REGION information is needed as input for that analysis.

【００４３】“LINE”領域は、水平線“HLINE”、垂直
線“VLINE”、斜め線“SLINE”の３種類の詳細属性を持
つ。また、“FRAME”領域は、４隅が線で囲まれた外枠
があるような場合の外枠情報を持つ。A "LINE" region has three detailed attributes: horizontal line "HLINE", vertical line "VLINE", and oblique line "SLINE". A "FRAME" region holds outer-frame information for the case where there is an outer frame closed by lines at all four corners.
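The attribute classes and detailed attributes named in paragraphs [0040] to [0043] can be summarized in a small lookup table. This is only an illustrative sketch: the text introduces the detailed attributes with "such as", so the lists may not be exhaustive, and the names `DETAIL_ATTRIBUTES` and `is_valid_detail` are hypothetical.

```python
# Attribute classes from FIG. 9 mapped to the detailed attributes named in
# the text. Assumption: the lists may be incomplete; "FRAME" and "NOISE"
# have no detailed attributes mentioned, so they map to empty lists.
DETAIL_ATTRIBUTES = {
    "TEXT": ["TX_TEXT", "TX_TITLE", "TX_CAPTION", "TB_CELL"],
    "PICTURE": ["PI_PICTURE", "PI_LINE_ART"],
    "TABLE": ["TB_FRAME", "TB_REGION", "TB_CELL", "TB_ELEMENT"],
    "LINE": ["HLINE", "VLINE", "SLINE"],
    "FRAME": [],
    "NOISE": [],
}


def is_valid_detail(attribute, detail):
    """Return True if `detail` is listed under the attribute class."""
    return detail in DETAIL_ATTRIBUTES.get(attribute, [])
```

Note that "TB_CELL" appears under both "TEXT" and "TABLE", as it does in the description.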

【００４４】以上説明した、解析処理で各領域に付与さ
れる属性が図１で説明したレイヤのいずれに当てはまる
かを下記に示す。各レイヤは、同様な文書画像のパーツを集めたものとな
る。The following shows which of the layers described with reference to FIG. 1 each attribute assigned to a region by the analysis processing described above belongs to. Each layer is a collection of similar document image parts.

【００４５】図１０は、Picture Layer１１２における
画像データの格納状態を説明する図である。ここでは、
矩形１０２’、１０３’、１０４’によって図１の
（１）より切り出した画像データそのものが格納された
様子が示されている（もちろん矩形１０５’の画像デー
タも格納されることになるが図では省略されている）。
このように、Picture Layer１１２には図１の（１）に
示した原稿画像より、PI_PICTUREとして切り出された画
像パーツ（一部のみ記載）が入る。この画像は、例え
ば、前述したFlashPixフォーマットで保存されている。
もちろんＪＰＥＧ等の他の圧縮方式でも構わない。FIG. 10 is a diagram for explaining how image data is stored in the Picture Layer 112. It shows the image data cut out from (1) of FIG. 1 by the rectangles 102', 103', and 104' being stored as is (the image data of the rectangle 105' is of course stored as well, but is omitted from the figure). In this way, the Picture Layer 112 holds the image parts (only some of which are shown) cut out as PI_PICTURE from the original image shown in (1) of FIG. 1. These images are stored, for example, in the FlashPix format described above. Of course, another compression scheme such as JPEG may be used instead.

【００４６】図１１は、Text Layer１１６におけるデー
タ保存状態を示す図である。このText Layer１１６に
は、後に説明する文字認識処理結果も格納することにな
る。なお、文字領域矩形１０５’〜１０８’は、レイア
ウト解析処理の中で、順序認識という文書の文の順番を
解析する方法で、番号付けされる。これは、文が各領域
に分かれるため、再構築する際に順番どおり並んでいな
いと、文章がとても読みづらくなってしまう為である。
なお、この順序認識については、周知の順序認識技術が
適用されうる。FIG. 11 is a diagram showing how data is stored in the Text Layer 116. The Text Layer 116 also stores the results of the character recognition processing described later. The character region rectangles 105' to 108' are numbered during the layout analysis processing by a method called order recognition, which analyzes the order of the sentences in a document. This is done because the text is split across the regions, and it would be very hard to read if the regions were not arranged in the correct order at reconstruction time. A well-known order recognition technique can be applied here.

【００４７】以上のように領域毎に格納された情報はレ
イヤ構造を持ち、各レイヤが他のレイヤと関係づけられ
て１つの解析結果となる。図１２は、本実施形態による
マイクロドキュメント形式のデータ構造を示す図であ
る。ここで、各レイヤ（記述）の関連付けはリンクで行
う。例えば、ある画像領域の画像情報の中に、実際の切
り出し画像がどこに格納されているかを示す情報を付加
しておく。レイアウト記述のレイヤは、各領域の属性と
矩形情報を一括して管理し、詳細レイヤにリンクすると
いう形式となっている。なお、図１２におけるレイアウ
ト記述６０２、TEXT記述６０３、表記述６０４、Pictur
e記述６０５、Line Art記述６０６、LineVector記述６
０７、前面画像記述６０８はそれぞれLayout Layer１１
７、Text Layer１１６、Table Layer１１５、Picture L
ayer１１２、Line Art Layer１１３、Line Vector Laye
r１１４、前面画像Layer１１１に対応する。The information stored for each region as described above has a layer structure, and the layers are related to one another to form a single analysis result. FIG. 12 is a diagram showing the data structure of the micro document format according to the present embodiment. The layers (descriptions) are associated with one another by links. For example, information indicating where the actual cut-out image is stored is added to the image information of an image region. The layout description layer manages the attributes and rectangle information of all regions collectively and links to the detail layers. The layout description 602, TEXT description 603, table description 604, Picture description 605, Line Art description 606, LineVector description 607, and overall image description 608 in FIG. 12 correspond to the Layout Layer 117, Text Layer 116, Table Layer 115, Picture Layer 112, Line Art Layer 113, Line Vector Layer 114, and overall image Layer 111, respectively.
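The linked layer structure can be pictured with a small data model. This is a sketch under stated assumptions, not the format defined by FIG. 12: the class names `LayoutEntry` and `MicroDocument`, their fields, and the use of a (layer name, index) pair as the link are all hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class LayoutEntry:
    """One region in the layout description: attribute, rectangle, and link."""
    attribute: str   # e.g. "TX_TEXT" or "PI_PICTURE"
    rect: tuple      # assumed (left, top, width, height) on the page
    layer: str       # name of the detail layer that holds the part
    index: int       # position of the part inside that layer


@dataclass
class MicroDocument:
    """Layout description plus the detail layers it links to."""
    layout: list = field(default_factory=list)
    layers: dict = field(default_factory=dict)

    def add_part(self, attribute, rect, layer, part):
        """Store a part in its detail layer and record a link to it."""
        parts = self.layers.setdefault(layer, [])
        parts.append(part)
        self.layout.append(LayoutEntry(attribute, rect, layer, len(parts) - 1))

    def resolve(self, entry):
        """Follow the link from a layout entry to the stored part."""
        return self.layers[entry.layer][entry.index]
```

Keeping the rectangles and attributes in one place and the bulky parts elsewhere means a layer such as the Text Layer can be read without touching image data.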

【００４８】次に、PICTUREレイヤについて更に説明を
加える。写真のような自然画やドローソフトで描いた絵
は、前記２値化方法では比較的黒画素の密度が高くな
る。一方、線で描いた絵は、黒画素の密度が比較的低く
なる。この性質と若干の情報を利用して、PI_PICTUREと
PI_LINE_ARTに分離する。Next, the PICTURE layer will be described further. Natural images such as photographs, and pictures drawn with drawing software, have a relatively high density of black pixels after the binarization method described above. Pictures drawn with lines, on the other hand, have a relatively low density of black pixels. This property, together with some additional information, is used to separate regions into PI_PICTURE and PI_LINE_ART.
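The density criterion can be illustrated as follows. This is a minimal sketch that uses black-pixel density alone; the description also mentions additional information, which is not specified and is therefore left out. The threshold of 0.5 and the function names are assumptions made for illustration.

```python
def black_pixel_density(region):
    """Fraction of black pixels (value 1) in a binarized region."""
    total = sum(len(row) for row in region)
    if total == 0:
        raise ValueError("region is empty")
    black = sum(1 for row in region for pixel in row if pixel == 1)
    return black / total


def classify_picture(region, threshold=0.5):
    """Label a PICTURE region by density alone.

    Assumption: `threshold` is an arbitrary illustrative value; the source
    gives no number and uses further cues that are not modeled here.
    """
    if black_pixel_density(region) >= threshold:
        return "PI_PICTURE"
    return "PI_LINE_ART"
```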

【００４９】特に、PI_PICTURE部は、本実施形態のマイ
クロドキュメントでは、プリントしても高精細な出力を
得られなければならない情報が多々ある。画像矩形情
報、色、解像度情報等は当然付加する。本実施形態で
は、画像内に文字があり、その領域を抽出できれば、そ
の文字情報も付加できる。さらには、文書中で、何番目
の優先順位のある画像であるか等も付加することができ
る。このようにFlashPixのフォーマットで必要とされる
情報は、レイアウト解析、文字認識結果から得られる。In the micro document of the present embodiment, PI_PICTURE parts in particular often contain information that must yield high-definition output even when printed. Image rectangle information, color, resolution information, and the like are added as a matter of course. In the present embodiment, if there are characters in the image and their region can be extracted, that character information can be added as well. It is also possible to add information such as the image's priority rank within the document. The information required by the FlashPix format is thus obtained from the layout analysis and character recognition results.

【００５０】次に、このマイクロドキュメントより、ど
のように文書画像を再構築するかを説明する。Next, how to reconstruct a document image from this micro document will be described.

【００５１】以下、本実施形態によるマイクロドキュメ
ントからの文書画像の再構築手順を図１３のフローチャ
ートを参照して説明する。A procedure for reconstructing a document image from a micro document according to the present embodiment will be described below with reference to the flowchart of FIG.

【００５２】ステップＳ８０１において文書再現の実行
が開始されると、ステップＳ８０２において、各レイヤ
からパーツを収集する。ステップＳ８０３では、Layout
Layer１１７に保持されているレイアウト情報に従って
収集したパーツを配置し、文書を再構築する。When document reproduction is started in step S801, parts are collected from each layer in step S802. In step S803, the collected parts are arranged according to the layout information held in the Layout Layer 117, and the document is reconstructed.

【００５３】ここで、文字領域に関しては、文字認識処
理結果の第一候補の文字列をその等価な位置にコンピュ
ータが持つフォントで再現する。線、枠領域について
は、開始、終点、太さ等を元に、線を描くコマンドに変
換し、線を描く。表は表の枠情報よりその枠を書き、セ
ル内の文字、数字等を文字情報と同様に書き入れる。For a character region, the character string of the first candidate of the character recognition result is reproduced at the equivalent position using a font available on the computer. For line and frame regions, the start point, end point, thickness, and so on are converted into line-drawing commands, and the lines are drawn. For a table, the frame is drawn from the table frame information, and the characters, numerals, and so on in each cell are written in the same way as character information.

【００５４】また、Picture領域に関しては、切り出し
た画像を呼び出し、その等価な位置に画像を貼り付ける
が、この際に使用する画像の解像度をステップＳ８０４
〜Ｓ８０６の処理により決定する。すなわち、ステップ
Ｓ８０４において、ディスプレイ表示を行うのかプリン
ト出力を行うのかを判定し、プリント出力を行う場合は
ステップＳ８０５へ進み、Picture領域に貼り付ける画
像として高解像度の画像を採用する。一方、表示装置へ
の出力であるならば、ステップＳ８０６へ進み、Pictur
e領域へ貼り付ける画像として低解像度の画像を採用す
る。本実施形態ではFlashPixフォーマットによって保管
されているため、例えばディスプレイ表示であるなら、
ステップＳ８０６にて、表示に手頃な７０ＤＰＩ程度の
画像が選択され、表示される。一方、プリント時には、
ステップＳ８０５で高解像度の画像が選択されて、プリ
ントされることになる。なぜなら表示で用いた低解像度
画像では、プリントすると荒い絵になってしまうからで
ある。For a Picture region, the cut-out image is retrieved and pasted at the equivalent position; the resolution of the image used is determined by the processing of steps S804 to S806. In step S804, it is determined whether the output is to a display or to a printer. For print output, the process proceeds to step S805, and a high-resolution image is adopted as the image to be pasted into the Picture region. For output to a display device, the process proceeds to step S806, and a low-resolution image is adopted. Because the images in this embodiment are stored in the FlashPix format, for display output, for example, an image of about 70 DPI, which is adequate for display, is selected in step S806 and displayed. For printing, a high-resolution image is selected in step S805 and printed, because the low-resolution image used for display would look coarse when printed.
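The branch in steps S804 to S806 amounts to picking one level from the stored multi-resolution image. The sketch below assumes the available resolutions are given as a list of DPI values; the function name `select_resolution`, the string values for `output`, and the "closest to the display target" rule are illustrative assumptions. Only the figure of about 70 DPI comes from the description.

```python
def select_resolution(available_dpi, output, display_dpi=70):
    """Pick the stored resolution to paste into a Picture region.

    `output` is "print" (step S805: use the highest resolution) or
    "display" (step S806: use the level closest to `display_dpi`,
    preferring the lower one on a tie). Both rules are assumptions
    chosen to match the behavior described in the text.
    """
    if not available_dpi:
        raise ValueError("no stored resolutions")
    if output == "print":
        return max(available_dpi)
    if output == "display":
        return min(available_dpi, key=lambda dpi: (abs(dpi - display_dpi), dpi))
    raise ValueError("output must be 'print' or 'display'")
```

With stored levels of 75, 150, and 300 DPI, for example, a display request would use the 75 DPI level and a print request the 300 DPI level.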

【００５５】以上の様にして、解析結果のみで文書原稿
を再現させ、本処理を終了する（ステップＳ８０５）。As described above, the document manuscript is reproduced only with the analysis result, and this processing is completed (step S805).

【００５６】以上、図７及び図１３によって示された、
本実施形態による原稿画像の保存処理と、原稿画像の再
構築処理の流れを図１４を参照して更に説明する。The flow of the document image saving processing and the document image reconstruction processing according to the present embodiment, shown in FIG. 7 and FIG. 13 above, will be further described with reference to FIG. 14.

【００５７】図１の（１）で示されるような原稿画像デ
ータ９０１を用意する。レイアウト解析９０２では、原
稿画像データ９０１を２値化した２値画像データを生成
し、レイアウト解析を行い、図８で示されるようなレイ
アウト解析結果９０３を得る。そして、文字認識／画像
符号化９０４では、レイアウト解析によって文字部と判
定された部分ついては文字認識を行い、Picture部と判
定された部分については画像符号化を行う。文書パーツ
化９０５においては、当該原稿画像１ページ分のパーツ
化を実行し、図１の（２）示されるマイクロドキュメン
トの形態のデータを構築し、外部記憶装置９０６に記憶
する。Original image data 901 as shown in (1) of FIG. 1 is prepared. In the layout analysis 902, binary image data is generated by binarizing the original image data 901, layout analysis is performed, and a layout analysis result 903 as shown in FIG. 8 is obtained. In the character recognition / image encoding 904, character recognition is performed on the portions determined by the layout analysis to be character portions, and image encoding is performed on the portions determined to be Picture portions. In the document part conversion 905, the one page of the original image is converted into parts, data in the micro document form shown in (2) of FIG. 1 is constructed, and the data is stored in the external storage device 906.

【００５８】次に、原稿画像の再構築は、Picture Laye
r９０７に保持された画像データ、Text Layer９０８に
保持されたテキストデータが、Layout Layer１１７に保
持されたレイアウト情報に従って配置されることで実現
される。なお、ここで、９０９に示されるように、Pict
ure Layer９０７に保持された画像のみでディスプレイ
１４上への表示や検索ができる。或いは、９１０で示さ
れるように、Text Layer９０８のみで表示、修正や検索
ができる。更に、９１１に示されるように、Picture La
yer９０７やText Layer９０８を用いて文書解析結果
を、プリンタ１８によってプリントアウトすることがで
きる。その際画像部は、解像度の高い方を利用する。Next, the original image is reconstructed by arranging the image data held in the Picture Layer 907 and the text data held in the Text Layer 908 according to the layout information held in the Layout Layer 117. As shown at 909, display on the display 14 and searching are possible using only the images held in the Picture Layer 907. Alternatively, as shown at 910, display, correction, and searching are possible using only the Text Layer 908. Furthermore, as shown at 911, the document analysis result can be printed out by the printer 18 using the Picture Layer 907 and the Text Layer 908. In that case, the higher-resolution version is used for the image portions.

【００５９】次に、解析文書データの修正について説明
する。図１５は、修正確認用の表示画面の一例を示す図
である。図１５において、１６１は、原画を表示し、そ
の上に、レイアウト解析結果を示した画面である。上記
において、各パーツはオブジェクトと表現したが、この
画面では、そのオブジェクトを削除したり、順序認識の
結果を修正したり、枠の大きさを変更したり、枠を分離
したり結合したりできる。主にレイアウト解析結果の修
正を行うことができる。Next, correction of the analyzed document data will be described. FIG. 15 is a diagram showing an example of a display screen for checking and correcting results. In FIG. 15, reference numeral 161 denotes a screen that displays the original image with the layout analysis result overlaid on it. Each part was referred to above as an object; on this screen, an object can be deleted, the result of order recognition can be corrected, the size of a frame can be changed, and frames can be split or merged. The corrections made here are mainly corrections to the layout analysis result.

【００６０】１６２は、上述した解析文書画面である。
文書上の全パーツを同時に表示している。ここでの修正
は、主に文字認識の結果に対する修正ということにな
る。文字は第一候補を表示しているが、その一文字毎に
裏では候補文字を持っている。言語処理を用いて修正候
補を見つけ、候補の入れ替えを行ったりすることができ
る。このレイアウト再現された文書解析結果は、ＲＴＦ
や、ＳＧＭＬなどの形式に変換され、通常よく用いられ
ているソフトウエアに渡すことが可能である。Reference numeral 162 denotes the analyzed document screen described above. All the parts of the document are displayed at the same time. The corrections made here are mainly corrections to the character recognition results. Each character is displayed as its first candidate, but behind each character the other candidate characters are retained. Correction candidates can be found using language processing, and candidates can be swapped. The document analysis result with its layout reproduced in this way can be converted into a format such as RTF or SGML and passed to commonly used software.
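The idea of keeping candidate characters behind each displayed character and swapping them by language processing can be sketched as follows. The description does not say which language processing is used, so a plain vocabulary lookup stands in for it here; the function name `correct_word` and the data layout (one ranked candidate list per character position) are assumptions.

```python
from itertools import product


def correct_word(candidates, vocabulary):
    """Choose a spelling for one word from per-character candidate lists.

    `candidates[i]` lists the recognition candidates for position i, best
    first. The first-candidate string is kept if it is in `vocabulary`;
    otherwise the first candidate combination found in `vocabulary` is
    used. If none matches, the first-candidate string is returned as is.
    Assumption: a vocabulary lookup is a stand-in for language processing.
    """
    first_choice = "".join(chars[0] for chars in candidates)
    if first_choice in vocabulary:
        return first_choice
    for combination in product(*candidates):
        word = "".join(combination)
        if word in vocabulary:
            return word
    return first_choice
```

The exhaustive search over combinations is acceptable only for short words with a few candidates per position; a practical system would prune it.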

【００６１】図１６及び図１７は画像優先順位を説明す
る図である。文書原稿をレイアウト解析すると、その解
析結果の位置により、各パーツは１つの順序体系で番号
づけされ、管理される。これは、文と図の関係を保つた
めに重要である。文字情報の番号づけについては図１１
において上述したとおりである。FIGS. 16 and 17 are diagrams for explaining image priority. When layout analysis is performed on a document, the parts are numbered and managed under a single ordering scheme according to their positions in the analysis result. This is important for preserving the relationship between text and figures. The numbering of character information is as described above with reference to FIG. 11.

【００６２】Picture部に関しては、他に独自の管理番
号をあわせて持つ。これは、画像優先順位と呼ぶ。図１
４の９０９に示したように、画像パーツはそれだけで単
独に検索され、ディスプレイ１４上に表示されることが
できる。これは所望の文書をいち早く検索する一つの手
段となる。この場合、順序認識順に表示されると、検索
が遅くなるという欠点があった。そこでPicture部に関
しては図１６に示すように画像領域の大きい順という法
則に乗っ取り順序を付加したり、図１７に示すように文
書レイアウト＋画像の大きさ順という順番を付加したり
できる。図１６における画像領域の大きい順とは、切り
出し画像の縦、横の大きさが共に大きい順番に番号付け
させる。図１４の画像１６１の左側に縦長の図が存在す
るが、これは画像の端によくできるごみ画像である。こ
のように極端に縦横比のどちらか一方のみが極端に大き
い画像の優先順位は、図１６や図１７に示されるように
低くする。Picture parts additionally carry their own management number, called the image priority. As shown at 909 in FIG. 14, image parts can be searched on their own and displayed on the display 14, which provides one way of finding a desired document quickly. In this case, displaying the images in order-recognition order has the drawback of slowing down the search. For Picture parts, therefore, an order can be assigned according to the rule of descending image area, as shown in FIG. 16, or according to document layout plus image size, as shown in FIG. 17. With the descending image area order of FIG. 16, the cut-out images are numbered starting from those that are large in both height and width. There is a vertically elongated figure on the left side of the image 161 in FIG. 14; this is a dust image of the kind that often appears at the edge of a scanned image. An image like this, in which only one of height and width is extremely large, is given a low priority, as shown in FIGS. 16 and 17.
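The descending-area rule with demotion of extremely elongated images can be sketched as a sort. This is an illustration under assumptions: area is used as a proxy for "large in both height and width", the aspect-ratio limit of 10 is an arbitrary value, and the function name `assign_image_priority` is hypothetical.

```python
def assign_image_priority(sizes, max_aspect=10.0):
    """Return image indices ordered from highest to lowest priority.

    `sizes` is a list of (width, height) pairs with positive values.
    Images are ranked by descending area, except that any image whose
    longer side exceeds `max_aspect` times its shorter side is ranked
    after all others. Assumption: `max_aspect` is illustrative only.
    """
    def sort_key(item):
        index, (width, height) = item
        aspect = max(width, height) / min(width, height)
        # False sorts before True, so elongated images go last.
        return (aspect > max_aspect, -(width * height), index)

    ordered = sorted(enumerate(sizes), key=sort_key)
    return [index for index, _ in ordered]
```

A thin strip along the page edge, such as a 5 by 900 pixel scan artifact, would be placed last even though its area exceeds that of a small square image.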

【００６３】図１７に示される文書レイアウト＋画像大
きさ順は、上記順序認識で付加される文章の順番と画像
の大きさの両方を加味した順番付けである。順序認識の
場合、本文の位置と画像の位置により順番をつけていく
が、小さな画像も同等に扱う。この結果、どうでもよい
小さな画像がその位置により優先順位が高くなることに
なる。以上の画像優先順位は、本実施形態のマイクロド
キュメント（図１の（２））の中に格納される重要な情
報である。The document layout plus image size order shown in FIG. 17 is an ordering that takes into account both the sentence order assigned by the order recognition described above and the image size. Order recognition assigns numbers according to the positions of the body text and the images, and it treats small images the same as large ones. As a result, a small, unimportant image can receive a high priority merely because of its position. The image priorities described above are important information stored in the micro document ((2) of FIG. 1) of the present embodiment.

【００６４】以上説明してきたように、本実施形態によ
れば、文書画像をレイアウト解析して、各パーツに分離
し、レイヤ管理し、それぞれの特徴を生かした再利用手
段を提供するフォーマットおよび作成方法が提供され
る。As described above, the present embodiment provides a format, and a method of creating it, in which a document image is subjected to layout analysis, separated into parts, and managed in layers, and which offers means of reuse that exploit the characteristics of each part.

【００６５】なお、上記実施形態では、文書画像をレイ
アウト解析して、各パーツに分離したが、文書画像その
ものを画像として扱い、各解像度別に保存する方法でも
よい。この場合図１の（２）における全面画像Layerに
はFlashPixで扱うような階層的な画像が保管される。こ
の保存方法での特徴は、FlashPixの管理情報に文書画像
の中から文字認識した結果の情報や表認識した結果の情
報を挿入できることにある。これは、単に文書を解像度
別に保管するのみならず、その内容も解析して付加する
ことであり、一般的な画像保存とは異なる。このような
保存手段も可能である。In the above embodiment, the document image is subjected to layout analysis and separated into parts, but the document image itself may instead be treated as an image and stored at each resolution. In this case, a hierarchical image of the kind handled by FlashPix is stored in the overall image Layer in (2) of FIG. 1. The distinctive feature of this storage method is that the results of character recognition and table recognition performed on the document image can be inserted into the FlashPix management information. That is, the document is not merely stored at several resolutions; its content is also analyzed and attached, which differs from ordinary image storage. Storage of this kind is also possible.

【００６６】以上説明したように、上記実施形態によれ
ば、文書画像のデータ量の削減、再利用可能なＴＥＸＴ
化、ベクトル化、表計算ソフトへの対応化、などの利点
がある。さらに大きな特徴として、多くの蓄積された文
書画像の検索手段を提供し、利用者がいち早く見つけ出
すことができるようになる。その検索環境も非常に高機
能のコンピュータのみならず、メモリ量の少ない低機能
のコンピュータの環境での利用できる形態を提供するこ
とができるという効果がある。As described above, the above embodiment offers advantages such as a reduced data volume for document images, conversion to reusable text, vectorization, and compatibility with spreadsheet software. A further major feature is that it provides a means of searching a large number of stored document images, so that a user can find the desired one quickly. The search can also be used not only on high-performance computers but also on low-performance computers with little memory.

【００６７】なお、本発明は、複数の機器（例えばホス
トコンピュータ，インタフェイス機器，リーダ，プリン
タなど）から構成されるシステムに適用しても、一つの
機器からなる装置（例えば、複写機，ファクシミリ装置
など）に適用してもよい。The present invention may be applied to a system composed of a plurality of devices (for example, a host computer, an interface device, a reader, and a printer) or to an apparatus consisting of a single device (for example, a copying machine or a facsimile machine).

【００６８】また、本発明の目的は、前述した実施形態
の機能を実現するソフトウェアのプログラムコードを記
録した記憶媒体を、システムあるいは装置に供給し、そ
のシステムあるいは装置のコンピュータ（またはＣＰＵ
やＭＰＵ）が記憶媒体に格納されたプログラムコードを
読出し実行することによっても、達成されることは言う
までもない。It goes without saying that the object of the present invention is also achieved by supplying a storage medium on which the program code of software that realizes the functions of the above embodiment is recorded to a system or apparatus, and having a computer (or CPU or MPU) of that system or apparatus read and execute the program code stored in the storage medium.

【００６９】この場合、記憶媒体から読出されたプログ
ラムコード自体が前述した実施形態の機能を実現するこ
とになり、そのプログラムコードを記憶した記憶媒体は
本発明を構成することになる。In this case, the program code itself read from the storage medium realizes the functions of the above-described embodiment, and the storage medium storing the program code constitutes the present invention.

【００７０】プログラムコードを供給するための記憶媒
体としては、例えば、フロッピディスク，ハードディス
ク，光ディスク，光磁気ディスク，ＣＤ−ＲＯＭ，ＣＤ
−Ｒ，磁気テープ，不揮発性のメモリカード，ＲＯＭな
どを用いることができる。As the storage medium for supplying the program code, for example, a floppy disk, hard disk, optical disk, magneto-optical disk, CD-ROM, CD-R, magnetic tape, nonvolatile memory card, or ROM can be used.

【００７１】また、コンピュータが読出したプログラム
コードを実行することにより、前述した実施形態の機能
が実現されるだけでなく、そのプログラムコードの指示
に基づき、コンピュータ上で稼働しているＯＳ（オペレ
ーティングシステム）などが実際の処理の一部または全
部を行い、その処理によって前述した実施形態の機能が
実現される場合も含まれることは言うまでもない。It also goes without saying that the invention covers not only the case where the functions of the above embodiment are realized by the computer executing the program code it has read, but also the case where an OS (operating system) or the like running on the computer performs part or all of the actual processing in accordance with the instructions of the program code, and the functions of the above embodiment are realized by that processing.

【００７２】さらに、記憶媒体から読出されたプログラ
ムコードが、コンピュータに挿入された機能拡張ボード
やコンピュータに接続された機能拡張ユニットに備わる
メモリに書込まれた後、そのプログラムコードの指示に
基づき、その機能拡張ボードや機能拡張ユニットに備わ
るＣＰＵなどが実際の処理の一部または全部を行い、そ
の処理によって前述した実施形態の機能が実現される場
合も含まれることは言うまでもない。It further goes without saying that the invention covers the case where the program code read from the storage medium is written into a memory provided on a function expansion board inserted into the computer or in a function expansion unit connected to the computer, after which a CPU or the like provided on that function expansion board or function expansion unit performs part or all of the actual processing in accordance with the instructions of the program code, and the functions of the above embodiment are realized by that processing.

【００７３】[0073]

【発明の効果】以上説明したように、本発明によれば、
文書画像のデータ量を削減するとともに、データの再利
用が極めて容易な形態で文書データを保持することが可
能となる。As described above, according to the present invention, the data volume of a document image can be reduced, and the document data can be held in a form that makes reuse of the data extremely easy.

[Brief description of the drawings]

【図１】本実施形態によるマイクロドキュメントのデー
タ構成を説明する図である。FIG. 1 is a diagram illustrating a data configuration of a micro document according to an embodiment.

【図２】多値画像原稿１０１のPicture領域１０３を切
り出して得られた画像データを示す図である。FIG. 2 is a diagram showing image data obtained by cutting out a Picture area 103 of a multi-valued image document 101.

【図３】多値画像原稿のPicture領域１０４を切り出し
て得られた画像データを示す図である。FIG. 3 is a diagram showing image data obtained by cutting out a Picture area 104 of a multi-valued image document.

【図４】FlashPixによる画像のタイル分割を示す図であ
る。FIG. 4 is a diagram showing tile division of an image by FlashPix.

【図５】FlashPixによる多解像度による画像データの格
納を説明する図である。FIG. 5 is a diagram illustrating storage of image data at multiple resolutions by FlashPix.

【図６】表データの構成を説明するための図である。FIG. 6 is a diagram for explaining the structure of table data.

【図７】本実施形態におけるマイクロドキュメントの生
成手順を説明するフローチャートである。FIG. 7 is a flowchart illustrating a procedure for generating a micro document according to the embodiment.

【図８】レイアウト解析処理によってえられる矩形情報
を説明する図である。FIG. 8 is a diagram illustrating rectangle information obtained by a layout analysis process.

【図９】レイアウト解析処理より得られた各属性を示す
図である。FIG. 9 is a diagram showing each attribute obtained by a layout analysis process.

【図１０】Picture Layer１１２における画像データの
格納状態を説明する図である。FIG. 10 is a diagram illustrating a storage state of image data in Picture Layer 112.

【図１１】Text Layer１１６におけるデータ格納状態を
示す図である。FIG. 11 is a diagram showing a data storage state in a Text Layer 116.

【図１２】本実施形態によるマイクロドキュメント形式
のデータ構造を示す図である。FIG. 12 is a diagram showing a data structure in a micro document format according to the present embodiment.

【図１３】本実施形態によるマイクロドキュメントから
の文書画像の再構築手順を示すフローチャートである。FIG. 13 is a flowchart illustrating a procedure for reconstructing a document image from a micro document according to the embodiment.

【図１４】本実施形態によるマイクロドキュメントの生
成と、文書画像の再構築を説明する図である。FIG. 14 is a diagram illustrating generation of a microdocument and reconstruction of a document image according to the present embodiment.

【図１５】修正確認用の表示画面の一例を示す図であ
る。FIG. 15 is a diagram showing an example of a display screen for correction confirmation.

【図１６】画像優先順位を説明する図を示す図である。FIG. 16 is a diagram for explaining an image priority order.

【図１７】画像優先順位を説明する図を示す図である。FIG. 17 is a diagram illustrating an image priority order.

【図１８】本実施形態による画像格納装置の構成を示す
ブロック図である。FIG. 18 is a block diagram illustrating a configuration of an image storage device according to the present embodiment.

Claims

[Claims]

1. An information processing apparatus comprising: reading means for optically reading a document original to obtain original image data; analysis means for extracting a character area and an image area from the original image data and acquiring position information of each; recognition means for performing character recognition on the image data of the character area obtained by the analysis means; compression means for performing compression processing on the image data of the image area; and storage means for storing recognition result data obtained by the recognition means, compressed data obtained by the compression means, and the position information obtained by the analysis means.

2. The information processing apparatus according to claim 1, wherein the compression means uses different compression processing, as needed, for the image data of each image area extracted from the original image data.

3. The information processing apparatus according to claim 1, wherein the storage means stores data relating to the document original in a data format having a hierarchical structure, the hierarchical structure having a first hierarchy for storing the recognition result data, a second hierarchy for storing the compressed data, and a third hierarchy for storing the position information.

4. The information processing apparatus according to claim 3, wherein the storage unit also stores, in the first hierarchy, data representing an image of a character area corresponding to the recognition result data.

5. The information processing apparatus according to claim 3, wherein the storage unit further includes a fourth layer for storing data obtained by compressing the entire original image data.

6. The information processing apparatus according to claim 3, further comprising display means for performing display based on data stored in the first hierarchy based on a user's instruction.

7. The information processing apparatus according to claim 3, further comprising display means for performing display based on data stored in the second hierarchy based on a user's instruction.

8. The information processing apparatus according to claim 3, further comprising a search unit configured to perform a search using the recognition result data stored in the first hierarchy.

9. The information processing apparatus according to claim 1, wherein the analysis means assigns a priority, based on arrangement or size, to each of the extracted character areas and image areas, and assigns a number based on the priority to each item of data stored in the first hierarchy and the second hierarchy.

10. The information processing apparatus according to claim 1, further comprising generation means for generating a character image from the recognition result data stored in the first hierarchy, generating a restored image by decompressing the compressed data stored in the second hierarchy, and arranging the character image and the restored image on the basis of the stored position information to generate a document image.

11. An information processing method comprising: a reading step of optically reading a document original to obtain original image data; an analysis step of extracting a character area and an image area from the original image data and acquiring position information of each; a recognition step of performing character recognition on the image data of the character area obtained in the analysis step; a compression step of performing compression processing on the image data of the image area; and a storage step of storing, in storage means, recognition result data obtained in the recognition step, compressed data obtained in the compression step, and the position information obtained in the analysis step.

12. The information processing method according to claim 11, wherein, in the compression step, different compression processing is used, as needed, for the image data of each image area extracted from the original image data.

13. The information processing method according to claim 11, wherein, in the storage step, data relating to the document original is stored in a data format having a hierarchical structure, the hierarchical structure having a first hierarchy for storing the recognition result data, a second hierarchy for storing the compressed data, and a third hierarchy for storing the position information.

14. The information processing method according to claim 13, wherein, in the storing step, data representing an image of a character area corresponding to the recognition result data is also stored in the first hierarchy.

15. The information processing method according to claim 13, wherein, in the storage step, data obtained by compressing the entire original image data is stored as a fourth hierarchy of the hierarchical structure.

16. The information processing method according to claim 13, further comprising a display step of performing a display based on data stored in the first hierarchy based on a user's instruction.

17. The information processing method according to claim 13, further comprising a display step of performing display based on data stored in the second hierarchy based on a user's instruction.

18. The information processing method according to claim 13, further comprising a search step of performing a search using the recognition result data stored in the first hierarchy.

19. The information processing method according to claim 11, wherein, in the analysis step, a priority based on arrangement or size is assigned to each of the extracted character areas and image areas, and a number based on the priority is assigned to each item of data stored in the first hierarchy and the second hierarchy.

20. The information processing method according to claim 11, further comprising a generation step of generating a character image from the recognition result data stored in the first hierarchy, generating a restored image by decompressing the compressed data stored in the second hierarchy, and arranging the character image and the restored image on the basis of the stored position information to generate a document image.

21. A storage medium storing a control program for causing a computer to execute processing for saving data relating to a document original, the control program comprising: code for a reading step of optically reading the document original to obtain original image data; code for an analysis step of extracting a character area and an image area from the original image data and acquiring position information of each; code for a recognition step of performing character recognition on the image data of the character area obtained in the analysis step; code for a compression step of performing compression processing on the image data of the image area; and code for a storage step of storing, in storage means, recognition result data obtained in the recognition step, compressed data obtained in the compression step, and the position information obtained in the analysis step.