JP2013080348A

JP2013080348A - Image processing apparatus, image processing method, and program

Info

Publication number: JP2013080348A
Application number: JP2011219563A
Authority: JP
Inventors: Tomotoshi Kanatsu; 知俊金津
Original assignee: Canon Inc
Current assignee: Canon Inc
Priority date: 2011-10-03
Filing date: 2011-10-03
Publication date: 2013-05-02
Anticipated expiration: 2031-10-03
Also published as: JP5824309B2

Abstract

PROBLEM TO BE SOLVED: To enable a tabular object having a cell structure to be included in an output electronic document, and to efficiently generate the object description of the electronic document. SOLUTION: An image processing apparatus includes: means for extracting a plurality of pixel blocks from an input image including a table and analyzing the inclusion relation among them; means for identifying areas constituted by the plurality of pixel blocks as at least one of a character area, a table area, and another area; means for creating tree-structured data showing the inclusion relation among the plurality of pixel blocks in accordance with the analyzed inclusion relation between the pixel blocks and the identified areas of the pixel blocks; means for analyzing the matrix structure of the table for the area identified as the table area; means for creating information on each of the cell elements constituting the matrix structure of the table and associating the information with the table area in the tree-structured data; and means for setting, for each cell element, link information to the area, among the identified areas, that corresponds to the content of that cell element.

Description

The present invention relates to an image processing apparatus, an image processing method, and a program for generating editable electronic document data from a paper document or from image data of a document.

In recent years, advanced functions have come into use in document creation, such as decorating fonts, drawing figures freely, and importing photographs. However, as the content becomes more sophisticated, creating a document entirely from scratch requires considerable effort. It is therefore desirable to be able to reuse parts of previously created documents, either as they are or after processing and editing.

Meanwhile, electronic documents are often distributed in printed form. Techniques have been disclosed that make the content available as reusable data even when only the paper document is at hand. For example, Patent Document 1 discloses that when a paper document is read electronically by an apparatus, a document matching its content is searched for in a database and retrieved, and can be used in place of the data read from the paper. If the same document cannot be identified in the database, the image of the read document is converted into electronic data that is easy to reuse, so the content of the document can be reused in this case as well.

OCR is a conventional technique for converting character information in a document image into electronic data that is easy to reuse. Vectorization is a technique for converting graphic information composed of lines and surfaces into such data. For example, Patent Document 1 discloses using these techniques to convert the characters in a document image into character codes and the outlines of figures into vector data, thereby producing reusable data.

Patent Document 1 further discloses a technique for identifying regions such as characters, line drawings, natural images, and tables in a document image and constructing data that expresses the relationships between the regions as a tree structure. By arranging the character codes, vector data, image data, and so on according to this structure, the document image is converted into an electronic document page that can be edited in an application. The resulting electronic data has a layout equivalent to that of the original document. Like an electronic document page newly created in a document creation application, it allows the positions and sizes of characters and figures to be changed easily, and also allows geometric transformation and coloring.

There are also techniques for recognizing the structure of a tabular region in a document image. For example, Patent Document 2 discloses a technique for acquiring the matrix structure formed by the rectangular frame regions in a table. By combining the row structure of the frame regions obtained by this technique with the OCR results for the characters inside the frames, a table region in a document image can be converted into electronic data having a table structure.

FIG. 22A shows an example of a document image. FIG. 22B shows the result of identifying the regions in the document image using a technique such as that of Patent Document 1 and expressing their inclusion relationships as a tree structure (region tree). Region 2210 in FIG. 22B corresponds to the entire page, and regions 2211 to 2219 correspond to the regions within the page. The regions at and below table region 2214 form a hierarchy in which the table contains frame regions (white regions) 2215 to 2217, and the frame regions in turn contain character regions 2218 and 2219. FIG. 22B additionally includes the matrix structure information of the table, obtained by table analysis using a technique such as that of Patent Document 2. This information is given as the matrix size of 2 rows by 2 columns attached to table region 2214 and as the logical coordinate values on the matrix attached to frame regions 2215 to 2217. The data combined in this way is converted into an electronic document page having a table structure and can be edited by an application capable of table editing.

Patent Document 1: Japanese Patent No. 4251629
Patent Document 2: JP-A-1-129358
Patent Document 3: US Patent Application Publication No. 2008/0123945

The prior art has a problem when converting a color document image into electronic data that is easy to reuse. For example, Patent Document 1 identifies regions using pixel blocks obtained from a binary image, and Patent Document 2 extracts table frame information from a binary image. In other words, when the input is a color document image, it must first be binarized. However, binarization can lose information from the original color image. For example, the table region shown in FIG. 20A contains frame region 2001, which has gray characters on a white background, and frame region 2002, which has black characters on a gray background. If the gray of the characters and the gray of the background have the same luminance, then whatever luminance is used as the binarization threshold, character information is lost as shown in FIG. 20B or FIG. 20C.
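The loss described above can be checked with a short sketch. The luminance values below (white 255, gray 128, black 0) are assumptions chosen for illustration; the text only requires that the two grays be equal. The helper names are hypothetical.

```python
# Hypothetical luminance values; the text only states that the two grays are equal.
WHITE, GRAY, BLACK = 255, 128, 0

# (character luminance, background luminance) for the two frame regions of FIG. 20A
FRAMES = {
    "frame region 2001 (gray on white)": (GRAY, WHITE),
    "frame region 2002 (black on gray)": (BLACK, GRAY),
}

def binarize(value, threshold):
    """Return 1 (white) if value >= threshold, otherwise 0 (black)."""
    return 1 if value >= threshold else 0

def text_survives(text, background, threshold):
    """Characters remain visible only if they binarize differently from the background."""
    return binarize(text, threshold) != binarize(background, threshold)

def thresholds_preserving_all_text():
    """Collect every threshold that keeps the characters in both frame regions."""
    return [
        t for t in range(0, 257)
        if all(text_survives(text, bg, t) for text, bg in FRAMES.values())
    ]

print(thresholds_preserving_all_text())  # prints []
```

Frame region 2001 needs a threshold above the gray level and frame region 2002 needs one at or below it, so no single threshold satisfies both.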

To address this problem, Patent Document 3 reduces the colors of the color image to an image that can have pixel values beyond binary, so that information such as characters is not lost, and then identifies regions by extracting blocks of pixels that have become the same color. With this technique, the white background and gray characters in frame region 2001 of FIG. 20A and the gray background and black characters in frame region 2002 can all be identified. However, pixel blocks extracted from such a multi-valued image can have a more complex structure than pixel blocks extracted from a binary image. For example, FIG. 21A shows a table region in which the ruled lines partly differ in color and the background color changes within a frame region. The relationships between the pixel blocks extracted after appropriately reducing the colors of this image are represented by the tree structure of FIG. 21B. From inside the lower-right frame region of FIG. 21A, pixel blocks 2106 and 2108, corresponding to the two differently colored backgrounds, and character pixel block 2107 are extracted. Unlike background pixel block 2102 and character pixel block 2109 of another frame region, however, these pixel blocks have no inclusion relationship with one another and are placed at the same level of the tree.

It is difficult to attach the matrix structure information obtained by the table structure analysis described above to data with a complex tree structure such as that of FIG. 21B. Specifically, the position in the hierarchy of the background pixel block of the frame region to which the logical matrix coordinates should be attached may not be fixed, or there may be no pixel block in one-to-one correspondence with the frame region in the first place. As a result, when attempting to convert the data into an electronic document page that supports table editing, either the association with the matrix structure cannot be obtained or the method of accessing the matrix structure becomes complicated.

To solve the above problems, the present invention has the following configuration. Namely, an image processing apparatus that generates editable electronic data from an input image comprises: input means for inputting an image including a table as the input image; pixel block analysis means for extracting, from the pixels constituting the input image, a plurality of pixel blocks each made up of pixels with similar pixel values, and analyzing the inclusion relationships between the plurality of pixel blocks; identification means for identifying each region formed by the plurality of pixel blocks as at least one of a character region, a table region, and another region; generation means for generating tree-structured data showing the inclusion relationships between regions, in accordance with the inclusion relationships between pixel blocks analyzed by the pixel block analysis means and the regions of the pixel blocks identified by the identification means; table structure analysis means for analyzing the matrix structure of the table for the pixel blocks identified as the table region; association means for generating information on each cell element constituting the matrix structure of the table and associating it with the table region in the tree-structured data; and setting means for setting, for each cell element, link information to the region, among the regions identified by the identification means, that corresponds to the content of that cell element.
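One way to picture this configuration is a region tree in which the table node carries a separate list of cell elements, each holding links to the regions that form its content. The following is a minimal sketch only; the class names, field names, and region IDs are hypothetical and are not taken from the specification.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Tuple

@dataclass
class Cell:
    """Cell element of the table's matrix structure (hypothetical layout)."""
    row: int
    col: int
    links: List[int] = field(default_factory=list)  # IDs of linked content regions

@dataclass
class Region:
    """Node of the region tree; `cells` is used only when kind == "table"."""
    region_id: int
    kind: str  # "page", "text", "table", or "other"
    children: List["Region"] = field(default_factory=list)
    rows: int = 0
    cols: int = 0
    cells: List[Cell] = field(default_factory=list)

def index_regions(root: Region) -> Dict[int, Region]:
    """Map region IDs to nodes by walking the whole tree."""
    index, stack = {}, [root]
    while stack:
        node = stack.pop()
        index[node.region_id] = node
        stack.extend(node.children)
    return index

def cell_contents(root: Region, table: Region) -> Dict[Tuple[int, int], List[str]]:
    """Resolve each cell's links, regardless of how deep the linked region sits."""
    index = index_regions(root)
    return {(c.row, c.col): [index[i].kind for i in c.links] for c in table.cells}

# Example: one text region is a direct child of the table, the other is nested
# under an extra background region, yet both cells reach their text by link.
text_a = Region(3, "text")
text_b = Region(5, "text")
background = Region(4, "other", children=[text_b])
table = Region(2, "table", children=[text_a, background], rows=1, cols=2,
               cells=[Cell(0, 0, [3]), Cell(0, 1, [5])])
page = Region(1, "page", children=[table])
print(cell_contents(page, table))
```

Because a cell reaches its content through a link rather than through a fixed position in the hierarchy, the depth at which the content region sits does not affect access to the matrix structure.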

According to the present invention, a tabular object having a cell structure can be included in the output electronic document, and the object description of the electronic document can be generated efficiently.

FIG. 1 is a diagram showing an example of a system configuration.
FIG. 2 is a block diagram showing how the data changes through the processing of each unit.
FIG. 3 is a flowchart showing processing in the pixel block analysis unit.
FIG. 4 is a flowchart showing labeling processing.
FIG. 5 is a diagram showing an example of labeling processing.
FIG. 6 is a diagram showing an example of a processing result of the pixel block analysis unit.
FIG. 7 is a flowchart showing processing in the layout analysis unit.
FIG. 8 is a flowchart showing processing in the graphics data generation unit.
FIG. 9 is a diagram showing an example of a processing result of the layout analysis unit.
FIG. 10 is a flowchart showing processing in the character recognition unit.
FIG. 11 is a diagram showing an example of character data output by the character recognition unit.
FIG. 12 is a diagram showing an example of processing in the graphics data generation unit.
FIG. 13 is a flowchart showing processing in the electronic document description generation unit.
FIG. 14 is a flowchart showing the processing of S1305.
FIG. 15 is a diagram showing an example of the analysis of the cell structure in a table by the layout analysis unit.
FIG. 16 is a diagram showing an example of a table that defines the output targets for each region type.
FIG. 17 is a diagram showing an output example of an electronic document.
FIG. 18 is a diagram showing a display example of an electronic document on a PC.
FIG. 19 is a flowchart explaining the processing of S1305 according to the second embodiment.
FIG. 20 is a diagram showing an example of the result of conventional binarization of a color table image.
FIG. 21 is a diagram showing an example of conventional division into pixel blocks and construction of a tree structure.
FIG. 22 is a diagram showing an example of conventional construction of a region tree structure from a binary document image.

<First embodiment>
[System configuration]
The best mode for carrying out the present invention is described below with reference to the drawings. FIG. 1 shows an example of a system configuration using the image processing apparatus according to the present invention. The image processing apparatus 100 includes a scanner 101, a CPU 102, a memory 103, a hard disk 104, and a network I/F 105. The scanner 101 converts the page content of a read document into image data. The CPU 102 executes a program that applies electronic document generation processing to the image data. The memory 103 serves as work memory and temporary data storage while the program runs. The hard disk 104 stores the program and data. The network I/F 105 exchanges data with external devices. The image processing apparatus 100 is connected via the network I/F 105 to a wired or wireless network 110 such as a LAN or the Internet. A general-purpose personal computer (PC) 120 is also connected to the network 110. The PC 120 can receive data transmitted from the image processing apparatus 100 and use it for display, editing, and the like.

[Configuration of electronic document generation processing]
FIG. 2 is a block diagram showing the configuration of the electronic document generation processing executed by the CPU 102 of the image processing apparatus according to the present invention. It also shows the various data generated during the processing. The input image 200 and the output electronic document 210 in FIG. 2 are the input data and output data of the electronic document generation processing, respectively. The following outlines the flow of processing from the input image 200 to the output electronic document 210 and the processing units involved. The detailed processing of each unit is described afterward.

The input image 200 is the image data to be processed by the electronic document generation processing of FIG. 2. In the image processing apparatus 100 of FIG. 1, for example, it is document image data obtained by photoelectrically converting the content of a paper document read by the scanner 101 into electronic pixel information. Alternatively, it may be image data supplied from outside through the network I/F 105 or image data generated within the image processing apparatus 100. Specifically, the input image 200 is passed to the subsequent processing blocks while stored in the memory 103 or on the hard disk 104.

The output electronic document 210 is the electronic data output as the result of the electronic document generation processing. It expresses the content of the input image 200 in a format that a user can display and edit in an application on a personal computer. A feature of the output electronic document 210 is that the characters, figures, photographs, and other content of the input image 200 are each expressed in a data format appropriate to their type. The aim is to be able to output an electronic document that is best suited to each of several different uses, such as display, storage, search, editing, and reuse. Specific examples of the data formats and electronic document formats are given later.

The pixel block analysis unit 201 analyzes the pixel content (pixel information) of the input image 200 and groups connected pixels regarded as having the same color to form connected pixel blocks. It then generates pixel block data 206, which contains the pixel shape of each connected pixel block and the relative positional relationships between the blocks.

The layout analysis unit 202 takes the pixel block data 206 generated by the pixel block analysis unit 201 as input, classifies each pixel block as character or non-character, and groups the blocks. In this way it identifies the regions present in the input image 200. The region types identified here include character regions, line drawing regions, natural image regions, and table regions. The layout analysis unit 202 then generates region data 207, which contains the type, coordinates, and relative relationships of each identified region together with information on the pixel blocks the region contains.

The graphics data generation unit 203 takes the region data 207, the pixel block data 206, and the input image 200 as input and generates graphics data 208 corresponding to the content of each region in the output electronic document 210. The graphics data 208 is used by the electronic document description generation unit 205, described later, to generate the graphics object description for each region. For example, the graphics data generation unit 203 identifies a photograph region in the region data 207 and uses the pixel information of that region in the input image 200 to generate cropped image data of the photograph. It may also identify a line drawing region, extract its outline from the pixel shape information in the corresponding pixel block data, and generate vector data for the line drawing by approximating the outline with straight-line and curved paths. In addition, it generates background image data, which is image data in which the foreground pixels of the input image 200, such as characters, photographs, and line drawings, are filled with their surrounding color.

The character recognition unit 204 identifies character regions in the region data 207 and reconstructs the pixel shapes of the characters as a binary image from the corresponding pixel block data 206. It then performs character recognition on the reconstructed binary image to obtain the recognized character code string for the character region. It further generates character data 209, which contains these character code strings and other information usable in the electronic document. To perform recognition correctly, the character recognition unit 204 may determine the top-bottom orientation of the input image 200 and, if the image is not upright, rotate the binary image and the region information before performing character recognition. The character data 209 may contain not only the character code strings resulting from recognition but also format information estimated in the course of recognition, such as the coordinates of each character, the estimated character size and pitch, and the line pitch. It may also contain color information for each character, estimated from the color information in the character pixel block data.

The electronic document description generation unit 205 takes the region data 207, the graphics data 208, and the character data 209 as input, selects, transforms, and combines them into a form suited to the intended use, and generates the description of the output electronic document 210. It may generate a one-page output electronic document for each input image 200, or a single multi-page electronic document for a plurality of input images.
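The inputs and outputs of the five processing units described above determine the order in which they can run. The following sketch records only those dependencies as stated in the text; the dictionary layout and the function name execution_order are illustrative assumptions, not part of the apparatus.

```python
# Each unit maps to (required inputs, produced output), as described in the text.
UNITS = {
    "pixel block analysis unit 201": (
        ["input image 200"], "pixel block data 206"),
    "layout analysis unit 202": (
        ["pixel block data 206"], "region data 207"),
    "graphics data generation unit 203": (
        ["region data 207", "pixel block data 206", "input image 200"],
        "graphics data 208"),
    "character recognition unit 204": (
        ["region data 207", "pixel block data 206"], "character data 209"),
    "electronic document description generation unit 205": (
        ["region data 207", "graphics data 208", "character data 209"],
        "output electronic document 210"),
}

def execution_order(units, initially_available):
    """Return one order in which every unit has its inputs ready before it runs."""
    available = set(initially_available)
    pending = dict(units)
    order = []
    while pending:
        ready = [name for name, (inputs, _) in pending.items()
                 if set(inputs) <= available]
        if not ready:
            raise ValueError("some unit can never obtain its inputs")
        for name in ready:
            order.append(name)
            available.add(pending.pop(name)[1])
    return order

for step in execution_order(UNITS, ["input image 200"]):
    print(step)
```

The graphics data generation unit 203 and the character recognition unit 204 depend only on data produced by units 201 and 202, so in this sketch they become ready in the same pass.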

[Operation of each processing unit]
Next, detailed operation examples of the processing units that make up the electronic document generation processing of FIG. 2 are described in order. The processing of each unit is implemented by the CPU 102 of the image processing apparatus 100 reading and executing a program stored in a storage unit such as the memory 103.

(Processing by the pixel block analysis unit)
FIG. 3 is a flowchart illustrating an operation example of the pixel block analysis unit 201.

In S301, the input image 200 is input to the pixel block analysis unit 201. If the input image 200 is a color image, it is assumed to be input expanded in the memory 103 as a page-sized set of pixels, each represented by three 8-bit values, one for each of R, G, and B. This is only an example, and the image may be expressed in a color space other than RGB, including grayscale. The input image 200 may also be input as a compressed image stream, which the pixel block analysis unit 201 decompresses into RGB or similar pixels in the memory 103.

In S302, the pixel block analysis unit 201 generates a color-reduced image by applying color reduction to each pixel of the input image 200. Each pixel of the color-reduced image takes a value from 0 to N (N ≥ 2), a range no larger than the pixel value range of the input image 200. The color reduction method itself is outside the essence of the present invention, so a detailed description is omitted. Note, however, that the effect of the present invention is obtained when the processing yields pixel values that are not black-and-white binary but instead preserve color features of the original input image 200, such as those of characters and lines. In other words, the effect presupposes applying the processing of the present invention to an image that can have three or more kinds of pixel values, rather than an image with only two kinds such as black-and-white binary.

Examples of such color reduction are as follows. When the input image 200 is in RGB format, one method represents, for each of the R, G, and B components of a pixel, whether the value is below 128 or at least 128 as 0 or 1, reducing the image to at most 8 colors in 3 bits. Another method calculates the luminance value Y of each pixel and quantizes Y into four levels. A third method estimates N representative colors from the pixel value histogram of the image, assigns an ID value to each representative color, and assigns to each pixel the ID value of the representative color closest to it.
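The three color reduction examples can be sketched per pixel as follows. Several details are assumptions made for illustration: the bit order of the 3-bit code, the integer luminance weights (the common 0.299/0.587/0.114 weighting, which the text does not specify), equal-width quantization into four levels, and squared RGB distance for the nearest representative color. The estimation of representative colors from the histogram is not shown; the palette is taken as given.

```python
def reduce_rgb_3bit(r, g, b):
    """One bit per channel (1 if the value is at least 128): at most 8 colors."""
    return (int(r >= 128) << 2) | (int(g >= 128) << 1) | int(b >= 128)

def luminance(r, g, b):
    """Integer luminance Y; the weights are an assumption, not from the text."""
    return (299 * r + 587 * g + 114 * b) // 1000

def reduce_luminance_4level(r, g, b):
    """Quantize Y (0..255) into four equal-width levels 0..3."""
    return luminance(r, g, b) // 64

def nearest_palette_id(pixel, palette):
    """Return the index (ID) of the representative color closest to `pixel`."""
    def squared_distance(i):
        return sum((p - q) ** 2 for p, q in zip(pixel, palette[i]))
    return min(range(len(palette)), key=squared_distance)

# Hypothetical palette of representative colors: white, gray, black.
PALETTE = [(255, 255, 255), (128, 128, 128), (0, 0, 0)]

print(reduce_rgb_3bit(200, 10, 130))                # prints 5
print(reduce_luminance_4level(128, 128, 128))       # prints 2
print(nearest_palette_id((120, 120, 120), PALETTE)) # prints 1
```

Note that the 3-bit method maps a mid-gray character (128, 128, 128) to the same value as white, whereas the luminance and palette methods keep it distinct, which illustrates why the choice of method matters for preserving character colors.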

In S303, the pixel block analysis unit 201 applies a known labeling process to sets of connected pixels having the same pixel value in the color-reduced image and extracts each set of pixels with the same label as a pixel block. This amounts to extracting connected pixel blocks of similar color from the input image 200. Connectivity is determined by 8-connectivity, which considers the adjacent pixels above, below, left, right, and in all four diagonal directions. The labeling process using 8-connectivity is described later with reference to FIGS. 4 and 5.
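As a minimal illustration of 8-connected labeling on a multi-valued image, the sketch below uses a breadth-first flood fill. This is a generic technique substituted for illustration and is not the labeling procedure of FIGS. 4 and 5; the function name and the list-of-lists image representation are assumptions.

```python
from collections import deque

# Offsets of the eight neighbors: up, down, left, right, and the four diagonals.
NEIGHBORS_8 = [(-1, -1), (-1, 0), (-1, 1), (0, -1), (0, 1), (1, -1), (1, 0), (1, 1)]

def label_8_connected(image):
    """Label 8-connected sets of equal-valued pixels.

    `image` is a list of rows of color-reduced pixel values. Returns the label
    map (labels start at 1) and the number of pixel blocks found.
    """
    height, width = len(image), len(image[0])
    labels = [[0] * width for _ in range(height)]
    count = 0
    for y in range(height):
        for x in range(width):
            if labels[y][x]:
                continue  # already part of an earlier pixel block
            count += 1
            labels[y][x] = count
            queue = deque([(y, x)])
            while queue:
                cy, cx = queue.popleft()
                for dy, dx in NEIGHBORS_8:
                    ny, nx = cy + dy, cx + dx
                    if (0 <= ny < height and 0 <= nx < width
                            and not labels[ny][nx]
                            and image[ny][nx] == image[y][x]):
                        labels[ny][nx] = count
                        queue.append((ny, nx))
    return labels, count

# A three-valued example: background 0, a block of value 1, and one pixel of value 2.
IMAGE = [
    [0, 0, 0, 0],
    [0, 1, 1, 0],
    [0, 1, 2, 0],
    [0, 0, 0, 0],
]
LABELS, COUNT = label_8_connected(IMAGE)
for row in LABELS:
    print(row)
```

In the example, the surrounding pixels of value 0 form one pixel block, the three pixels of value 1 form a second, and the single pixel of value 2 forms a third.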

In S304, the pixel block analysis unit 201 updates the pixel block information generated in S303 so that, for every pixel block, it holds information indicating whether the block is in contact with other pixel blocks. Specifically, for a pixel block of interest, the IDs of the pixel blocks that satisfy both of the following conditions are registered as its list of contacting pixel blocks: 1) the circumscribed rectangle of the block touches or overlaps that of the pixel block of interest, and 2) the runs of the two pixel blocks include runs that touch each other. This is performed for every combination of pixel blocks. These conditions are an example, and other conditions may be used. Contacting pixel blocks can also be identified faster by recording associations between touching labels during the labeling of S303. However, the efficiency of this processing is unrelated to the essence of the invention, so its description is omitted.
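The two contact conditions can be sketched as follows. The representation is an assumption: each pixel block is a list of runs (row, x_start, x_end) with inclusive ends, rectangles are (x0, y0, x1, y1), and two runs are taken to touch when they are horizontally adjacent on the same row or overlap or meet diagonally on neighboring rows. All function names are hypothetical.

```python
def bounding_box(runs):
    """Circumscribed rectangle (x0, y0, x1, y1) of a list of (row, x_start, x_end) runs."""
    rows = [row for row, _, _ in runs]
    return (min(x0 for _, x0, _ in runs), min(rows),
            max(x1 for _, _, x1 in runs), max(rows))

def boxes_touch_or_overlap(a, b):
    """Condition 1: the rectangles overlap or are directly adjacent."""
    return (a[0] <= b[2] + 1 and b[0] <= a[2] + 1
            and a[1] <= b[3] + 1 and b[1] <= a[3] + 1)

def runs_touch(r, s):
    """Two runs touch if their rows differ by at most 1 and their x-ranges meet."""
    (r_row, r_x0, r_x1), (s_row, s_x0, s_x1) = r, s
    if abs(r_row - s_row) > 1:
        return False
    return r_x0 <= s_x1 + 1 and s_x0 <= r_x1 + 1

def in_contact(runs_a, runs_b):
    """Apply condition 1 as a cheap filter, then condition 2 on the runs."""
    if not boxes_touch_or_overlap(bounding_box(runs_a), bounding_box(runs_b)):
        return False
    return any(runs_touch(r, s) for r in runs_a for s in runs_b)

def contact_lists(blocks):
    """Build the list of contacting block IDs for every block, over all pairs."""
    ids = sorted(blocks)
    contacts = {i: [] for i in ids}
    for n, i in enumerate(ids):
        for j in ids[n + 1:]:
            if in_contact(blocks[i], blocks[j]):
                contacts[i].append(j)
                contacts[j].append(i)
    return contacts

# Hypothetical blocks: a ring (1), an L-shape inside it (2), a single pixel (3),
# and an unrelated block far away (4).
BLOCKS = {
    1: [(0, 0, 3), (1, 0, 0), (1, 3, 3), (2, 0, 0), (2, 3, 3), (3, 0, 3)],
    2: [(1, 1, 2), (2, 1, 1)],
    3: [(2, 2, 2)],
    4: [(10, 10, 12)],
}
print(contact_lists(BLOCKS))
```

Checking the rectangles first avoids comparing runs for pairs such as blocks 1 and 4 that cannot possibly touch.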

In S305, the pixel block analysis unit 201 generates, for every pixel block in the pixel block information, inclusion information indicating that the pixel block contains another pixel block or is contained in another pixel block, and adds it to the pixel block information. In this example, a simplified test defines two pixel blocks as being in an inclusion relationship when 1) the two pixel blocks are in contact, and 2) the circumscribed rectangle of one completely contains the circumscribed rectangle of the other. The simplification reduces the amount of processing, and hence the time, needed for the inclusion test. Other conditions may be used instead, including an exact pixel-level inclusion test.
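The contact test of S304 and the simplified inclusion test of S305 can be sketched as follows. This is a minimal illustration, not the patented implementation: the `Block` class, the function names, and the choice of 8-neighbor adjacency as the meaning of "runs in contact" are assumptions made for the example.

```python
from dataclasses import dataclass

# A run is (y, x_start, x_end) with inclusive x coordinates (assumed layout).
Run = tuple[int, int, int]


@dataclass
class Block:
    """Hypothetical pixel block: an ID plus the runs that make it up."""
    block_id: int
    runs: list[Run]

    @property
    def bbox(self):
        """Circumscribed rectangle as (x0, y0, x1, y1), inclusive."""
        ys = [r[0] for r in self.runs]
        return (min(r[1] for r in self.runs), min(ys),
                max(r[2] for r in self.runs), max(ys))


def bboxes_touch(a, b):
    """Condition 1 of S304: the rectangles overlap or are directly adjacent."""
    return (a[0] <= b[2] + 1 and b[0] <= a[2] + 1
            and a[1] <= b[3] + 1 and b[1] <= a[3] + 1)


def runs_touch(r, s):
    """Assumed meaning of run contact: same or adjacent line, x ranges within one pixel."""
    return abs(r[0] - s[0]) <= 1 and r[1] <= s[2] + 1 and s[1] <= r[2] + 1


def in_contact(a, b):
    """S304: cheap rectangle test first, then condition 2 on the runs."""
    if not bboxes_touch(a.bbox, b.bbox):
        return False
    return any(runs_touch(r, s) for r in a.runs for s in b.runs)


def includes(a, b):
    """Simplified S305 test: a and b touch, and a's rectangle contains b's."""
    (ax0, ay0, ax1, ay1), (bx0, by0, bx1, by1) = a.bbox, b.bbox
    contained = ax0 <= bx0 and ay0 <= by0 and ax1 >= bx1 and ay1 >= by1
    return contained and in_contact(a, b)
```

Checking the rectangles before the runs keeps the pairwise comparison cheap, since most pairs of blocks on a page are far apart and fail the first condition.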

In S306, the pixel block analysis unit 201 treats the inclusion relationships added to the pixel block information as parent-child relationships and generates a pixel block tree in which the entire image is the root and each pixel block is a node. Under the conditions used in S305, some pixel blocks may have no parent pixel block. Such a pixel block is given the same parent as a pixel block with which it is in contact. A pixel block may also have several candidate parents; in that case the tree is built so that it has a parent-child relationship with only one of them, for example the one deepest in the hierarchy. The pixel block tree generated in S306, together with the information on each pixel block, constitutes the pixel block data 206 produced by the pixel block analysis unit 201.
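One possible reading of the tree construction in S306 is sketched below. The function name, the `ROOT` id, and the input encodings are hypothetical. Two points are assumptions that the text leaves open: the "deepest" candidate parent is approximated as the candidate that is itself contained by the most blocks, and a parentless block never adopts a parent from a neighbor that lies below it in the tree, since that would create a cycle.

```python
ROOT = 0  # hypothetical id standing for the whole image


def build_tree(block_ids, includes, contacts):
    """Return {block_id: parent_id}.

    includes: set of (container_id, contained_id) pairs from the S305 test.
    contacts: list of (id_a, id_b) pairs from the S304 test.
    """
    containers = {b: [p for p, c in includes if c == b] for b in block_ids}
    parent = {}
    for b in block_ids:
        if containers[b]:
            # Assumption: the deepest candidate is the one with the most
            # containers of its own.
            parent[b] = max(containers[b], key=lambda p: len(containers[p]))

    def is_under(node, ancestor):
        """True if ancestor lies on the parent chain above node."""
        seen = set()
        while node in parent and node not in seen:
            seen.add(node)
            node = parent[node]
            if node == ancestor:
                return True
        return False

    for b in block_ids:
        if b in parent:
            continue
        adopted = ROOT  # fallback when no usable neighbor exists
        for a, c in contacts:
            if b not in (a, c):
                continue
            other = c if a == b else a
            # Skip neighbors that hang below b; adopting their parent
            # would make b its own ancestor.
            if other in parent and not is_under(other, b):
                adopted = parent[other]
                break
        parent[b] = adopted
    return parent
```

With a background block 1 that contains a frame 2, and a block 3 that only touches the frame, block 3 ends up as a sibling of the frame under block 1, which mirrors the treatment of a parentless block described above.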

(Labeling process based on 8-connection determination)
An example of the labeling process based on 8-connection determination performed in S303 is described with reference to the flowchart of FIG. 4.

In S401, the pixel block analysis unit 201 initializes the label value k to 1. In S402, the pixel block analysis unit 201 extracts, as the current run, a run of consecutive identical pixel values on the line of interest of the color-reduced image. Processing starts at the top line of the color-reduced image, where the run is the range over which pixels with the same value as the leftmost pixel continue to the right. Each extracted run is stored as run information consisting of the x coordinates of its start and end points and the y coordinate of the line of interest. As described later, when S402 is executed again on the same line, the run starting at the pixel immediately to the right of the previously processed run is extracted.

In S403, the pixel block analysis unit 201 checks whether any run already extracted on the line immediately above the line of interest is 8-connected to the current run. For a current run on line y = k with start and end x coordinates (s, e), a run is 8-connected to it if at least one of its pixels lies on line y = k-1 within the x range (s-1, e+1) and it has the same pixel value as the current run. (Here k denotes the y coordinate of the line, not the label counter of S401.) When the line of interest is the top line, no connected run ever exists. If no run satisfies the condition (NO in S403), the process proceeds to S404, where the pixel block analysis unit 201 assigns a new label Lk to the current run, and then to S405, where it increments the label value k by 1. If a connected run satisfying the condition exists (YES in S403), the process proceeds to S406.

In S406, the pixel block analysis unit 201 checks whether there are several connected runs satisfying the condition and whether they carry more than one distinct label. If they do (YES in S406), the process proceeds to S407, where the pixel block analysis unit 201 assigns to the current run the label of the connected run detected first, and additionally associates the label values with one another so that the labels of all the connected runs are treated as one group. If there is a single connected run, or if all the connected runs carry the same label (NO in S406), the process proceeds to S408, where the pixel block analysis unit 201 assigns that label to the current run.

After S405, S407, or S408, the process proceeds to S409, where the pixel block analysis unit 201 checks whether the line of interest has a next run, that is, whether the end point of the current run is not at the right edge of the image. If there is a next run (YES in S409), the process returns to S402 to extract that run and repeats the subsequent steps. If there is no next run on the line of interest (NO in S409), the process proceeds to S410.

In S410, the pixel block analysis unit 201 checks whether the line of interest is the last line. If it is not (NO in S410), the process proceeds to S411, where the pixel block analysis unit 201 moves to the next line; the process then returns to S402, extracts a new run starting at the leftmost pixel of that line, and repeats the subsequent steps. If the line of interest is the last line (YES in S410), the process proceeds to S412.

In S412, the pixel block analysis unit 201 creates, for each label value, pixel block information consisting of the set of runs to which that label was assigned. When these sets are formed, runs whose different label values were associated in S407 are collected into a single pixel block. In the pixel block information finally generated, each pixel block consists of an ID for identification, circumscribed rectangle information, a pixel value, and the set of run information collected for the pixel block.

(Example of the labeling process)
FIG. 5 shows an example of applying the labeling process of FIG. 4. FIG. 5(a) is an example of a color-reduced image to be processed: an image 6 pixels wide and 3 pixels high in which each square represents one pixel and the number in each square is its pixel value. FIG. 5(b) shows an example of the result of labeling FIG. 5(a).

The labeling process first looks at the top line (y = 0) and extracts run 501, which has pixel value 3 and starts at the left edge. No line exists above it, so there is no connected run, and run 501 receives the first label L1. The following run 502, with pixel value 1, likewise receives a new label L2.

The top line has no more pixels, so processing moves to the next line (y = 1) and extracts run 511 with pixel value 1. The line above (y = 0) contains no run of pixel value 1 connected to run 511, so run 511 receives a new label L3. The next run 512, with pixel value 3, is connected to run 501 of the same value on the line above (y = 0). Run 501 is its only connected run, so run 512 receives label L1 of run 501. Run 513 with pixel value 1 is extracted next and likewise receives label L2 of run 502, its connected run on the line above. Run 514 with pixel value 3 is then extracted; it has no connected run and receives a new label L4. Processing moves to the third line (y = 2) and extracts run 521 with pixel value 1, which receives label L3 of run 511, its connected run above. The following run 522, with pixel value 3, has two connected runs on the line above (y = 1), runs 512 and 514. Their labels, L1 and L4, differ, so run 522 receives label L1 of run 512, the connected run detected first, and association information is generated so that labels L1 and L4 are treated as the same label. Finally, run 523 with pixel value 2 is extracted and receives a new label L5.

FIG. 5(c) shows an example of the pixel block information generated from the labeled runs. Based on the association described above, the pixel block with ID 1 consists of runs 501, 512, and 522 of label L1 and run 514 of label L4. Its rectangular range (0, 0)-(5, 2) and pixel value 3 are also stored. The pixel block with ID 2 consists of runs 502 and 513 of label L2 and has the rectangular range (2, 0)-(5, 1) and pixel value 1. ID 3 and ID 4 are shown in the same way. Once the pixel block information has been built, the label information Lk used to form the run sets may be discarded. The coordinates in each piece of run information may also be reset, as shown in FIG. 5(c), so that the origin is the upper-left corner of the circumscribed rectangle of its pixel block.
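The flow of FIG. 4 can be sketched in code as follows. This is an illustrative sketch under stated assumptions, not the implementation of the embodiment: the function name, the tuple layouts, and the use of a union-find table for the label association of S407 are choices made for the example. `EXAMPLE` is a hypothetical 6 × 3 image reconstructed to be consistent with the walkthrough of FIG. 5; the figure itself is not reproduced in the text, so the exact pixel layout is an assumption.

```python
def label_runs(image):
    """Run-based 8-connected labeling; returns (labeled_runs, pixel_blocks)."""
    link = {}  # label -> representative label (the association of S407)

    def find(label):
        while link[label] != label:
            link[label] = link[link[label]]
            label = link[label]
        return label

    k = 1                    # S401: initialize the label counter
    prev, labeled = [], []   # runs on the line above / all runs so far
    for y, row in enumerate(image):
        cur, s = [], 0
        while s < len(row):  # S402: extract the next run (s, e) on this line
            e = s
            while e + 1 < len(row) and row[e + 1] == row[s]:
                e += 1
            # S403: same-valued runs above that reach into [s-1, e+1]
            hits = [lab for ps, pe, pv, lab in prev
                    if pv == row[s] and ps <= e + 1 and pe >= s - 1]
            if not hits:     # S404, S405: new label, then increment k
                lab = k
                link[k] = k
                k += 1
            else:            # S407 / S408: take the first connected label
                lab = hits[0]
                for other in hits[1:]:
                    ra, rb = find(lab), find(other)
                    if ra != rb:
                        link[max(ra, rb)] = min(ra, rb)
            cur.append((s, e, row[s], lab))
            labeled.append((y, s, e, row[s], lab))
            s = e + 1        # S409: continue with the next run, if any
        prev = cur           # S410, S411: move to the next line

    blocks = {}              # S412: collect runs per associated label group
    for y, s, e, value, lab in labeled:
        b = blocks.setdefault(find(lab),
                              {"value": value, "runs": [], "bbox": [s, y, e, y]})
        b["runs"].append((y, s, e))
        x0, y0, x1, y1 = b["bbox"]
        b["bbox"] = [min(x0, s), min(y0, y), max(x1, e), max(y1, y)]
    return labeled, list(blocks.values())


# Hypothetical image consistent with the description of FIG. 5(a).
EXAMPLE = [
    [3, 3, 1, 1, 1, 1],
    [1, 3, 3, 1, 3, 3],
    [1, 3, 3, 3, 3, 2],
]
labeled_runs, pixel_blocks = label_runs(EXAMPLE)
```

On this input the runs receive labels L1, L2, L3, L1, L2, L4, L3, L1, L5 in scan order, and the first two pixel blocks have the rectangles (0, 0)-(5, 2) and (2, 0)-(5, 1) stated for ID 1 and ID 2.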

(Example of processing results by the pixel block analysis unit)
FIG. 6 shows an example of the processing results of the pixel block analysis unit 201. FIG. 6(a) is an example of a color-reduced image input to the pixel block analysis unit 201. FIG. 6(b) is an example of the pixel block information extracted from the color-reduced image of FIG. 6(a), with arrows indicating the contact relationships between pixel blocks generated in S304. FIG. 6(c) shows, for the pixel block information of FIG. 6(b), the inclusion relationships generated in S305 as arrows; the head of each arrow points to the child and the tail to the parent. FIG. 6(d) is an example of the pixel block tree information built from FIGS. 6(b) and 6(c). Pixel block 601 in FIG. 6(d) has no parent pixel block by inclusion, so the tree is built so that its parent is the parent of pixel block 602, with which it is in contact.

The flowchart of FIG. 3 was described as processing the entire input image data at once. Alternatively, the input image data may be divided into several parts, and the input of each partial image and the extraction of pixel block information may be repeated. For example, Patent Document 3 describes processing in which a 32 × 32 pixel tile is one processing unit and image input, quantization, and creation of blobs (the pixel blocks within a tile) are repeated in order from the upper left of the image. In Patent Document 3, blobs in the already processed upper and left tiles are further merged with the blobs of the current tile, so that pixel blocks of arbitrary size, up to the size of the input image 200, are ultimately generated. Applying this method can greatly reduce the memory and processing time consumed in generating the pixel block data in this embodiment.

(Processing by the layout analysis unit)
Next, the processing of the layout analysis unit 202 is described with reference to the flowchart of FIG. 7. This processing takes the pixel block data 206 in the memory 103 as input and builds, in the memory 103, region data 207 based on the structure among document regions such as characters, line drawings, natural images, and tables.

In S701, the layout analysis unit 202 classifies each pixel block in the input pixel block data 206 as either a character candidate pixel block or another pixel block. The classification may use any character pixel block determination method employed in known document image analysis techniques. One such method uses the size of the circumscribed rectangle of the pixel block and treats as character candidates those that fall within predetermined height and width ranges.

In this example, characters are assumed to range in size from 6 points to 50 points, and a pixel block is a character candidate if its width or height falls within the pixel counts Tmin to Tmax obtained by converting those sizes at the resolution of the input image 200. Setting a lower size limit keeps small background pixel blocks extracted from inside characters from being included among the character candidates. Pixel density, ratio, pixel color, and the like may be added to the criteria. The size thresholds may also be determined dynamically from the frequency distribution of the widths and heights of the pixel blocks actually extracted from the input image 200.
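The conversion from point sizes to the pixel thresholds Tmin and Tmax can be illustrated as follows, using the standard definition of one point as 1/72 inch. The function names are hypothetical, and the test follows the literal "width or height" wording; a stricter variant requiring both dimensions to fit would be an equally plausible reading.

```python
POINTS_PER_INCH = 72  # one typographic point is 1/72 inch


def size_limits(dpi, min_pt=6, max_pt=50):
    """Convert the 6 pt to 50 pt character range into pixel counts (Tmin, Tmax)."""
    t_min = round(min_pt * dpi / POINTS_PER_INCH)
    t_max = round(max_pt * dpi / POINTS_PER_INCH)
    return t_min, t_max


def is_char_candidate(width, height, dpi):
    """Literal reading: the width or the height lies within Tmin..Tmax."""
    t_min, t_max = size_limits(dpi)
    return t_min <= width <= t_max or t_min <= height <= t_max
```

At 300 dpi this gives Tmin = 25 and Tmax = 208 pixels, so a 5 × 5 pixel speck inside a character falls below the lower limit and is not treated as a character candidate.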

In S702, the layout analysis unit 202 groups the character candidate pixel blocks classified in S701 that lie near one another. Two pixel blocks can be judged to be near each other when the Euclidean distance computed between the coordinates of their circumscribed rectangles is at or below a predetermined threshold. This is only an example, and another measure such as the city block distance may be used. Because characters are written in lines and the spacing between characters within a line is generally narrower than the line spacing, character candidate pixel blocks forming a text line may first be grouped with a small distance threshold, and several text lines may then be grouped with a larger distance threshold. In this grouping, only character candidate pixel blocks that have the same parent in the pixel block tree are grouped with one another. This reduces the number of pixel block combinations subject to the neighborhood calculation and speeds up processing.
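A minimal sketch of this grouping is given below. The text does not fix how the distance between two circumscribed rectangles is measured, so the example assumes the Euclidean gap between the rectangles (zero when they overlap); the function names and the `(block_id, parent_id, bbox)` tuple layout are also assumptions.

```python
import math


def rect_distance(a, b):
    """Euclidean gap between rectangles (x0, y0, x1, y1); 0 if they overlap."""
    dx = max(a[0] - b[2], b[0] - a[2], 0)
    dy = max(a[1] - b[3], b[1] - a[3], 0)
    return math.hypot(dx, dy)


def group_candidates(candidates, threshold):
    """Group character candidates given as (block_id, parent_id, bbox) tuples.

    Only blocks sharing the same parent in the pixel block tree are compared.
    Returns a list of sets of block ids.
    """
    root = {c[0]: c[0] for c in candidates}

    def find(i):
        while root[i] != i:
            root[i] = root[root[i]]
            i = root[i]
        return i

    for idx, (id_a, parent_a, box_a) in enumerate(candidates):
        for id_b, parent_b, box_b in candidates[idx + 1:]:
            if parent_a == parent_b and rect_distance(box_a, box_b) <= threshold:
                root[find(id_a)] = find(id_b)

    groups = {}
    for block_id in list(root):
        groups.setdefault(find(block_id), set()).add(block_id)
    return list(groups.values())
```

The two-stage variant mentioned above could be obtained by calling the same function twice, first with a small threshold on the blocks and then with a larger threshold on the rectangles of the resulting lines.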

In S703, the layout analysis unit 202 determines, for each group, whether the set of character candidate pixel blocks grouped in S702 is actually a set of characters. The layout analysis unit 202 then identifies the extent of the pixel blocks in each group judged to be a set of characters as a character region. For each identified region, region information containing the coordinates of the region and association information to the corresponding pixel blocks is stored as a component of the region data 207. That is, for a character region, the association information to the grouped set of character candidate pixel blocks and the coordinates of the circumscribed rectangle enclosing those pixel blocks are stored as character region information.

One way to determine whether a group is a set of characters is to compute the vertical and horizontal projections of the character candidate pixel blocks over the rectangular range containing the group and to judge whether they show the alignment expected of character strings. Specifically, in the horizontal projection for horizontal writing, or the vertical projection for vertical writing, a group whose frequency distribution shows peaks at the text lines and valleys between the lines is likely to be a character region. Moreover, apart from exceptions such as italics, the circumscribed rectangles of characters rarely overlap one another to a large extent. The absence of large overlaps with other pixel blocks is therefore also an effective criterion for a character region. However, to exclude cases such as kanji in which a single character is split into several overlapping pixel blocks, it is effective to restrict the overlap test to pairs of pixel blocks above a certain size.
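The projection test for horizontal writing can be illustrated with a short sketch. It projects the circumscribed rectangles of the candidates rather than their individual pixels, and it counts a peak as any maximal stretch of rows above a floor value; both simplifications, like the function names, are assumptions made for the example.

```python
def horizontal_projection(boxes, top, bottom):
    """Per-row coverage of rectangles (x0, y0, x1, y1) between rows top and bottom."""
    profile = [0] * (bottom - top + 1)
    for x0, y0, x1, y1 in boxes:
        for y in range(y0, y1 + 1):
            profile[y - top] += x1 - x0 + 1
    return profile


def count_peaks(profile, floor=0):
    """Number of maximal stretches where the profile exceeds floor."""
    peaks, inside = 0, False
    for value in profile:
        if value > floor and not inside:
            peaks += 1
        inside = value > floor
    return peaks
```

For four character rectangles arranged as two text lines, the profile shows two peaks separated by an empty valley, which is the pattern the text associates with a character region. For vertical writing the same functions would be applied with the x and y roles swapped.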

After a character region has been identified, other pixel blocks within the region may be added to the set of character candidate pixel blocks. For example, the pixel blocks of punctuation marks and of detached dots within characters are likely to have been excluded from the character candidates by the size limits. To include them, small pixel blocks that have the same color as, and lie near, pixel blocks that are already character candidates may be added.

In S704, the layout analysis unit 202 selects line drawing / table frame candidate pixel blocks from the pixel blocks classified in S701 as other than character candidates. A pixel block can be judged to be such a candidate when it is at least as large as a character candidate and its pixel density over its entire extent is low.

In S705, the layout analysis unit 202 determines whether each pixel block selected as a line drawing / table frame candidate in S704 is a table frame, and identifies the area occupied by each pixel block judged to be a table frame as a table region. The layout analysis unit 202 then stores, in the region data 207 as table region information, the association information to the corresponding table frame pixel block and region information containing the coordinates of its circumscribed rectangle.

Whether a pixel block is a table frame may be determined, for example, by computing vertical and horizontal pixel histograms over the extent of the pixel block from its run information and examining their shape. If the pixel block is a table frame, several sharp peaks appear in the histograms at the positions of the vertical and horizontal outer frame and ruled lines, and detecting them makes it possible to decide whether the pixel block is a table frame. Alternatively, the decision can be based on the set of pixel blocks that are children of the table frame pixel block. The children of a table frame correspond to the framed areas inside the table, so the fact that all the child areas are rectangular and aligned without overlapping is an effective criterion for a table frame.

In S706, the layout analysis unit 202 identifies, as a line drawing region, the area occupied by each pixel block that was selected as a line drawing / table frame candidate in S704 but was not judged to be a table frame in S705. The layout analysis unit 202 then stores, in the region data 207 as line drawing region information, the association information to the corresponding line drawing pixel block and region information containing the coordinates of its circumscribed rectangle. The range obtained by grouping the pixel blocks near a pixel block judged to be a line drawing may also be used as the line drawing region.

In S707, the layout analysis unit 202 selects, from the pixel blocks that do not correspond to any region stored so far, a pixel block or a set of pixel blocks judged to be a natural image region such as a photograph. It then stores, in the region data 207 as natural image region information, the association information to the corresponding set of pixel blocks and region information containing the coordinates of the extent of those pixel blocks.

In this example, a natural image region corresponding to a rectangular photograph is identified where pixel blocks of several colors overlap or contain one another and the set of those pixel blocks forms a rectangle within a certain size. This criterion is only an example, and sets of pixel blocks of arbitrary shape may also be handled.

In S708, the layout analysis unit 202 stores as flat regions those pixel blocks, among the pixel blocks that do not correspond to any region stored so far, whose density and area are at or above certain values. Flat regions include a plain area occupying the whole page, a colored background placed behind characters or figures to group them semantically, and the background of a table cell.

In S709, the layout analysis unit 202 builds a region tree in which each region stored in the region data 207 up to S708 is a node and which expresses the relative relationships among the regions. A special root node corresponding to the entire extent of the input image is placed at the root of the region tree. The parent-child relationships between the nodes of the region tree are made to match the parent-child relationships of the pixel block nodes corresponding to the regions in the pixel block tree. Concretely, the tree is formed by attaching to each piece of region data link information to its parent region and a list of link information to its child regions. The data structure used to realize the list is arbitrary: each region node may hold pointers to the region nodes related to it as parent, child, or sibling, or the child regions may be held in an array.

In S710, the layout analysis unit 202 analyzes the cell structure of each table region in the region tree generated in S709. That is, as table structure analysis, it analyzes what matrix structure (cell structure) the table in the table region has. The layout analysis unit 202 then identifies the set of cells arranged according to the analyzed structure and stores it as cell element list information associated with the table region node. Here a "cell" corresponds to one item of the table and is generally a small area of the table enclosed on four sides by ruled lines. The cells form a matrix structure consisting of "rows", that is, horizontal sequences, and "columns", that is, vertical sequences. Tabular information is formed by arranging items with a common meaning along these rows and columns.

Analyzing the matrix structure of the cells is therefore important for describing the content of a table captured as an image as reusable data. So that this analysis can be used in the electronic document description discussed later, each cell in the cell element list information generated in S710 is given its coordinates in the matrix based on its meaning, that is, its logical coordinates. The data structure used to realize the cell element list is arbitrary: it may be an array of cell elements, an array of pointers to cell elements, or a structure in which each cell element holds pointers to the preceding and following cell elements.

An example of the analysis of the cell structure of a table in S710 is described below with reference to FIG. 15.

The table frame 1501 in FIG. 15(a) is an image of a table frame to be analyzed. First, histograms of the vertical and horizontal projections of the table frame pixels are created from the run information of the table frame pixel block. Histogram 1502 is an example of the vertical projection histogram for the table frame 1501, and histogram 1503 is an example of the horizontal projection histogram for the table frame 1501.

FIG. 15(b) shows an example in which the range of the table frame is divided into a 4 × 3 grid by the sets of peak positions in the generated histograms. That is, a vertical line is drawn at each peak position x0, x1, ..., xn of the vertical projection and a horizontal line at each peak position y0, y1, ..., ym of the horizontal projection, and the table region is divided into n × m areas, each grid square enclosed by these lines being one unit.
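The step from projection histograms to grid lines can be sketched as follows. The example works on a binary mask of the table frame pixels instead of run information, and it treats a position as a peak when its count reaches a fixed fraction of the table extent; the `ratio` parameter, the function names, and the mask representation are assumptions, and a real table with merged cells would need a more tolerant peak detector because some ruled lines span only part of the table.

```python
def projection_peaks(counts, min_count):
    """Centers of maximal stretches of positions whose count is >= min_count."""
    peaks, start = [], None
    for i, c in enumerate(counts + [0]):  # trailing 0 closes an open stretch
        if c >= min_count and start is None:
            start = i
        elif c < min_count and start is not None:
            peaks.append((start + i - 1) // 2)
            start = None
    return peaks


def grid_lines(mask, ratio=0.8):
    """Return (xs, ys): peak positions of the vertical and horizontal projections.

    mask[y][x] is 1 for table frame pixels and 0 elsewhere.
    """
    height, width = len(mask), len(mask[0])
    col_counts = [sum(row[x] for row in mask) for x in range(width)]  # vertical projection
    row_counts = [sum(row) for row in mask]                           # horizontal projection
    xs = projection_peaks(col_counts, ratio * height)
    ys = projection_peaks(row_counts, ratio * width)
    return xs, ys


# A synthetic frame with 5 vertical and 4 horizontal ruled lines.
TRUE_XS, TRUE_YS = [0, 3, 6, 9, 12], [0, 3, 6, 9]
MASK = [[1 if (x in TRUE_XS or y in TRUE_YS) else 0 for x in range(13)]
        for y in range(10)]
xs, ys = grid_lines(MASK)
```

With peaks x0 to x4 and y0 to y3, the grid has n = 4 columns and m = 3 rows, the same 4 × 3 division as in the example of FIG. 15(b).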

Next, each cell element is extracted using the position (i, j) on this grid as a logical coordinate. Specifically, for each grid square, the pixel block information of the table frame is used to check whether a ruled line actually exists in the table image on each of its four boundaries (top, bottom, left, and right). Each range enclosed by actually existing ruled lines is then extracted as one cell element. For example, when ruled lines exist on all four boundaries of a grid square (i, j), a cell element with the logical coordinates (i, j) is extracted. Alternatively, when ruled lines exist so as to enclose the two grid squares (i, j) and (i+1, j) together and no ruled line exists on the boundary between them, a cell element with the logical coordinate range (i, j)-(i+1, j) is extracted. FIG. 15C shows the cell elements extracted in the case of FIG. 15 and an example of the logical coordinates assigned to each.
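One way to realize this extraction is to merge neighboring grid squares whenever no ruled line exists between them. The sketch below does this with a flood fill. The flood-fill approach, the function `extract_cells`, and the callback `has_rule` are illustrative assumptions, and the sketch further assumes that every merged cell is rectangular.

```python
from typing import Callable, List, Tuple

Square = Tuple[int, int]          # (i, j), 1-based logical coordinates
CellRange = Tuple[Square, Square]  # (first square, last square)


def extract_cells(n_cols: int, n_rows: int,
                  has_rule: Callable[[Square, Square], bool]) -> List[CellRange]:
    """Group grid squares into cells.

    has_rule(a, b) must return True when an actual ruled line separates
    the adjacent squares a and b (hypothetical callback).
    """
    seen = set()
    cells = []
    for j in range(1, n_rows + 1):
        for i in range(1, n_cols + 1):
            if (i, j) in seen:
                continue
            stack, group = [(i, j)], []
            seen.add((i, j))
            while stack:
                ci, cj = stack.pop()
                group.append((ci, cj))
                for nb in ((ci + 1, cj), (ci - 1, cj), (ci, cj + 1), (ci, cj - 1)):
                    inside = 1 <= nb[0] <= n_cols and 1 <= nb[1] <= n_rows
                    if inside and nb not in seen and not has_rule((ci, cj), nb):
                        seen.add(nb)
                        stack.append(nb)
            cols = [p[0] for p in group]
            rows = [p[1] for p in group]
            cells.append(((min(cols), min(rows)), (max(cols), max(rows))))
    return cells


# 2 x 2 grid in which the ruled line between (1, 2) and (2, 2) is missing.
missing = {frozenset({(1, 2), (2, 2)})}
result = extract_cells(2, 2, lambda a, b: frozenset({a, b}) not in missing)
```

Here `result` contains two single-square cells in the first row and one cell with the range (1, 2)-(2, 2) in the second row.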

Note that the analysis of the cell structure described above is merely an example, and other known table structure analysis processes may be used. For example, the boundaries of items between which no ruled line is actually drawn may be estimated from the distribution of character strings in the table and added to the division positions of the grid. Pixel blocks other than that of the table frame may also be used. For example, to take ruled lines drawn in a different color into account, vertical or horizontal line-shaped pixel blocks may be identified among the children of the table frame pixel block, and their x-coordinates or y-coordinates may be added to the division positions of the grid. Alternatively, to handle tables whose items are separated by differently filled backgrounds rather than by ruled lines, the grid may be divided at the positions of the vertical and horizontal sides of the flat regions that are child elements of the table region. Furthermore, the analysis may use the pixel information of the input image in addition to the run information of the pixel blocks.

Ｓ７１１では、レイアウト解析部２０２は、Ｓ７１０で抽出した各セルと、同セルに含まれる文字や図、写真などの項目内容との関連づけを行う。具体的には、各セル要素の論理座標範囲に対応する画像上の座標範囲を求め、同範囲に包含される領域ノードへのリンクをセル要素が持つリンクのリストに追加する。従って、一つのセル内容に複数の領域ノードが含まれてもよい。 In step S 711, the layout analysis unit 202 associates each cell extracted in step S 710 with item contents such as characters, drawings, and photos included in the cell. Specifically, a coordinate range on the image corresponding to the logical coordinate range of each cell element is obtained, and a link to an area node included in the range is added to the list of links of the cell element. Therefore, a plurality of region nodes may be included in one cell content.

A cell element with empty content holds a list of size 0. The area nodes that form the content of a cell element of a given table area are identified only from among the child or descendant nodes of that table area node. Whether an area node is included in the cell content may also depend on its type. For example, only areas other than flat areas may be included in the cell content. In particular, when part of the ruled lines making up the table frame has been separated and has become a child area of the table area, excluding line drawing areas close to a cell element boundary from the cell content prevents noise in tabular output.
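The association performed in S711 can be sketched as follows. The sketch assumes that each area node carries a bounding box, that the grid line positions are available as lists `xs` and `ys`, and that flat areas are excluded from cell content. The names `RegionNode`, `Cell`, and `link_contents` are hypothetical.

```python
from dataclasses import dataclass, field
from typing import List, Sequence, Tuple

Rect = Tuple[int, int, int, int]  # (left, top, right, bottom) in image pixels


@dataclass
class RegionNode:
    kind: str  # e.g. "text", "line_art", "flat" (illustrative labels)
    bbox: Rect


@dataclass
class Cell:
    start: Tuple[int, int]  # 1-based logical (column, row)
    end: Tuple[int, int]
    links: List[RegionNode] = field(default_factory=list)


def cell_rect(cell: Cell, xs: Sequence[int], ys: Sequence[int]) -> Rect:
    """Map a logical coordinate range to image coordinates."""
    return (xs[cell.start[0] - 1], ys[cell.start[1] - 1],
            xs[cell.end[0]], ys[cell.end[1]])


def contains(outer: Rect, inner: Rect) -> bool:
    return (outer[0] <= inner[0] and outer[1] <= inner[1]
            and inner[2] <= outer[2] and inner[3] <= outer[3])


def link_contents(cells, descendants, xs, ys, excluded_kinds=("flat",)):
    """Attach to each cell the descendant nodes lying inside its range."""
    for cell in cells:
        rect = cell_rect(cell, xs, ys)
        cell.links = [n for n in descendants
                      if n.kind not in excluded_kinds and contains(rect, n.bbox)]


xs, ys = [0, 100, 200], [0, 50, 100]  # grid line positions (made-up values)
text_a = RegionNode("text", (10, 10, 90, 40))
text_b = RegionNode("text", (110, 10, 190, 40))
flat = RegionNode("flat", (1, 1, 99, 49))
cells = [Cell((1, 1), (1, 1)), Cell((2, 1), (2, 1)), Cell((1, 2), (2, 2))]
link_contents(cells, [text_a, text_b, flat], xs, ys)
```

The third cell ends up with a list of size 0, matching the treatment of empty cells described above.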

(Example of processing result by layout analysis unit)
An example of the result of the processing in the layout analysis unit 202, described above with the flowchart of FIG. 7, will be described with reference to FIG. 9. FIG. 9A is an example of a document image that is reduced in color and decomposed into pixel blocks by the pixel block analysis unit 201. FIG. 9B shows the pixel blocks extracted from the document image of FIG. 9A, expressed as a pixel block tree structure. In FIGS. 9A and 9B, pixel blocks 901, 902, and 903 are each a set of pixel blocks corresponding to characters in the original image. Although this cannot be shown in the drawing, the characters of pixel block 901 are black, those of pixel block 902 are blue, and those of pixel block 903 are red. Pixel blocks 904 to 908 are the pixel blocks that make up a table. Pixel block 904, the table frame, is black. Of the pixel blocks corresponding to the backgrounds of the frame areas inside the table, pixel blocks 905, 907, and 908 have a light gray background color, and pixel block 906 has a gray background color slightly darker than that of pixel block 905 and the others. Pixel block 909 corresponds to a dark gray star-shaped line drawing. Pixel blocks 910 and 911 correspond to a photograph that was separated into two colors by the color reduction. Pixel block 900 is the white pixel block that forms the background of the entire image. Note that small pixel blocks inside characters are omitted from FIG. 9.

In FIG. 9B, the arrows indicate parent-child relationships based on inclusion. For example, among pixel blocks 904 to 908 making up the table, pixel block 905 is the single-color background area inside the upper left frame of the table. Since pixel block 905 contains pixel block 902, an area of two characters, the two have a parent-child relationship. The background area inside the upper right frame of the table, on the other hand, is split vertically into pixel blocks 906 and 907 of different colors. As a result, the character pixel block 903 is contained in neither of them and becomes a direct child of the table frame pixel block 904, which is their parent. Such a case arises not only when the original input image was color-coded in this way from the start, but also when a single-color area is unintentionally over-divided because of noise or the color reduction process. In either case, it should be kept in mind that this situation commonly occurs in the extraction of a color pixel block structure.

FIG. 9C is an example of the region tree generated from the pixel block tree of FIG. 9B. Its generation is described below following S701 to S709 of the flowchart of FIG. 7. First, in S701 to S703, the layout analysis unit 202 selects the three pixel block groups 901, 902, and 903 as character candidate pixel blocks. The layout analysis unit 202 then stores the existence range of each group that satisfies the character area determination condition as a character area, namely child area nodes 921, 922, and 923.

In S704, the layout analysis unit 202 selects pixel blocks 904 and 909 as line drawing and table frame candidates. In S705, the layout analysis unit 202 determines that pixel block 904 is a table frame and stores child area node 924 as a table area. In S706, the layout analysis unit 202 determines that pixel block 909 is a line drawing and stores child area node 928 as a line drawing area.

Ｓ７０７で、レイアウト解析部２０２は、画素塊９１０、９１１が自然画領域を構成すると判定し、自然画領域として子領域ノード９３０を記憶する。Ｓ７０８では、レイアウト解析部２０２は、残る画素塊９００、９０５、９０６、９０７、９０８をいずれもフラット領域として記憶する。そしてＳ７０９にて、レイアウト解析部２０２は、各領域をノードとし、各々対応する画素塊ツリーの親子構造を反映した領域ツリー構造を生成する。図９（ｃ）において、線で結ばれる領域は親子関係を持つ。 In step S707, the layout analysis unit 202 determines that the pixel blocks 910 and 911 form a natural image area, and stores the child area node 930 as the natural image area. In S708, the layout analysis unit 202 stores the remaining pixel blocks 900, 905, 906, 907, and 908 as flat areas. In step S709, the layout analysis unit 202 uses each region as a node and generates a region tree structure that reflects the parent-child structure of the corresponding pixel block tree. In FIG. 9C, areas connected by lines have a parent-child relationship.

FIG. 9D is a configuration example of the list of cell elements added to child area node 924, the table area, by the processing of S710. Based on the two-row, two-column matrix structure analyzed from pixel block 904 of the table area, cell element 941 has the logical coordinates (1, 1) and, as its cell content, a link to child area node 922, a character area. Similarly, cell element 942 has the logical coordinates (2, 1) and, as its cell content, a link to child area node 923, a character area. Cell element 943 has the logical coordinate range (2, 1)-(2, 2) as its existence range and has no link because it has no cell content. Items with the same reference numbers in FIG. 9C and FIG. 9D correspond to each other.

(Processing by the graphics data generation unit)
Next, the processing of the graphics data generation unit 203 will be described. The graphics data generation unit 203 generates graphics data 208 for expressing each area included in the area data 207 as a graphics object. The data generated here is used when the electronic document description generation unit 205, described later, describes the contents of each area as an object. The processing of the graphics data generation unit 203 is described below with reference to the flowchart of FIG. 8.

Ｓ８０１では、グラフィックスデータ生成部２０３は、出力電子文書２１０において、線図形部分のオブジェクトをグラフィックスで表現するためのベクトルデータを生成する。本例におけるベクトルデータの生成対象領域は、入力となる領域データ２０７中に存在する文字領域、線画領域、および表領域とする。生成されたベクトルデータは、領域データ２０７中の対応する領域ノードに関連付けられたうえで、記憶部であるメモリ１０３あるいはハードディスク１０４に保存される。 In step S 801, the graphics data generation unit 203 generates vector data for expressing the object of the line figure part in the output electronic document 210 with graphics. The generation target areas of vector data in this example are a character area, a line drawing area, and a table area existing in the input area data 207. The generated vector data is stored in the memory 103 or the hard disk 104 as a storage unit after being associated with the corresponding area node in the area data 207.

The vector data generation in S801 uses the pixel block data 206 generated by the pixel block analysis unit 201: a binary image is constructed from the run information of the pixel blocks associated with the target area, and a known vectorization method is applied to that binary image. For example, the outer and inner contour loops of the binary image may be extracted by contour tracing, and vector path data may be obtained by approximating both with straight lines and curves. The fill color of each path is set to the color information associated with the pixel block.

図１２（ａ）にベクトルデータ生成結果の例を示す。画素塊１２０１は文字Ａの画素塊形状を示しており、これにベクトル化処理を施すことでベクトルデータ１２１１が生成される。同様に星型の線画を成す画素塊１２０２から、ベクトルデータ１２１２が生成される。いずれの生成されたベクトルデータも、画素塊が持つ画素凹凸が直線近似により平坦化されて、先鋭な線図形として再現されている。またベクトルデータなので拡大、縮小しても画質劣化が少なく、かつ変形や着色等も容易であるため、編集用途に適したデータである。 FIG. 12A shows an example of the vector data generation result. A pixel block 1201 indicates the pixel block shape of the character A, and vector data 1211 is generated by performing vectorization processing on the pixel block 1201. Similarly, vector data 1212 is generated from a pixel block 1202 forming a star-shaped line drawing. In any of the generated vector data, the pixel unevenness of the pixel block is flattened by linear approximation and reproduced as a sharp line figure. Further, since the data is vector data, the image quality is hardly deteriorated even when enlarged or reduced, and the data is suitable for editing because it can be easily deformed and colored.

Ｓ８０２では、グラフィックスデータ生成部２０３は、出力電子文書２１０において、フラット領域部分のオブジェクトをグラフィックスで表現するためのベクトルデータを生成する。すなわち、グラフィックスデータ生成部２０３は、領域データ２０７中に存在するフラット領域の画素塊に対し、Ｓ８０１と同様のベクトル化処理を施し、領域ノードに関連付けてメモリ１０３あるいはハードディスク１０４に保存する。なお、本例では、ページ全体の下地に相当するような、ルート領域直下のフラット領域に対するベクトルデータは生成しないものとする。 In step S 802, the graphics data generation unit 203 generates vector data for expressing the flat area object in graphics in the output electronic document 210. That is, the graphics data generation unit 203 performs vectorization processing similar to S801 on the pixel block in the flat area existing in the area data 207, and stores it in the memory 103 or the hard disk 104 in association with the area node. In this example, it is assumed that vector data for a flat area immediately below the root area that corresponds to the background of the entire page is not generated.

In the vectorization of flat areas in S802, the inner contours of the target pixel block may be ignored. This is because, when the electronic document description is generated as described later, other objects are superimposed on the portions corresponding to the inner contours and hide them. For example, binary image 1203 in FIG. 12B is constructed from the run information of pixel block 905 in FIG. 9; during vectorization, the inner contours corresponding to the character portions that appear as white holes are ignored, and rectangular vector data 1213 is generated. For pixel blocks 906 and 907 in FIG. 9, on the other hand, vector data 1214 and 1215 are generated from the corresponding binary images 1204 and 1205 with their outer contour shapes preserved as they are, including the indentations left by the missing characters.

Ｓ８０３では、グラフィックスデータ生成部２０３は、出力電子文書２１０において、ベクトル化対象外の領域を表現するための、切り出し画像データを生成する。本例における切り出し画像データ生成領域は、領域データ２０７中に存在する自然画像の領域（自然画領域）とする。切り出された画像データは、領域データ２０７中の対応する領域ノードに関連付けられたうえで、メモリ１０３あるいはハードディスク１０４に保存される。ここで切り出し処理とは、入力画像２００を参照し、対象範囲の画素のみからなる同サイズの画像データを生成する処理である。切り出された画像データはＪＰＥＧ等の公知の圧縮技術で圧縮してもよい。 In step S 803, the graphics data generation unit 203 generates cut-out image data for expressing an area that is not vectorized in the output electronic document 210. The cutout image data generation area in this example is a natural image area (natural image area) existing in the area data 207. The extracted image data is stored in the memory 103 or the hard disk 104 after being associated with the corresponding area node in the area data 207. Here, the clipping process is a process of referring to the input image 200 and generating image data of the same size including only pixels in the target range. The clipped image data may be compressed by a known compression technique such as JPEG.

Ｓ８０４では、グラフィックスデータ生成部２０３は、出力電子文書２１０において、背景に用いられる背景画像データを生成する。生成された背景画像データは領域データ２０７のルートノードに関連づけられて、メモリ１０３もしくはハードディスク１０４に保存される。 In step S804, the graphics data generation unit 203 generates background image data used for the background in the output electronic document 210. The generated background image data is associated with the root node of the area data 207 and stored in the memory 103 or the hard disk 104.

背景画像とは、Ｓ８０１〜Ｓ８０３で生成されるベクトルデータや切り出し画像データを前景データとして、当該背景画像に重ねて描画することで、出力電子文書２１０が入力画像２００と同等の見た目を有するように用意されるものである。背景画像データに対しては、前景データが存在する領域の画素情報を入力画像２００から消去する処理を行う。 The background image is such that the output electronic document 210 has the same appearance as the input image 200 by rendering the vector data and cutout image data generated in S801 to S803 as foreground data and overlaying the background image. It will be prepared. For the background image data, a process of erasing the pixel information of the area where the foreground data exists from the input image 200 is performed.

画素情報の消去には、合成した出力電子文書２１０において、データが二重に見えるのを防ぐ効果がある。あるいは重畳により隠れてしまう領域に存在する無駄な画素情報を無くすことで圧縮効率を上げ、出力電子文書２１０をコンパクトにする効果がある。画素情報を消去する方法は、例えば、対象領域の矩形範囲をその周囲色で一様に塗り潰す方法がある。なお、対象領域が線図形領域の場合、線部分に相当する画素のみを、その近傍の画素色で塗り潰すようにすれば、線部分以外にあたる部分の色情報を背景情報に残すこともできる。 Erasing pixel information has the effect of preventing the data from appearing double in the synthesized output electronic document 210. Alternatively, there is an effect that the compression efficiency is improved by eliminating useless pixel information existing in an area hidden by superposition, and the output electronic document 210 is made compact. As a method of erasing pixel information, for example, there is a method of uniformly filling a rectangular range of a target area with surrounding colors. When the target area is a line figure area, if only the pixels corresponding to the line portion are filled with the neighboring pixel color, the color information of the portion other than the line portion can be left in the background information.
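The rectangle-filling variant of the erasure can be sketched as follows. The sketch assumes that the image is a list of rows of color values and takes the "surrounding color" to be the most frequent color on the one-pixel ring just outside the rectangle. That choice and the function name `erase_rect` are assumptions; the specification does not say how the surrounding color is determined.

```python
from collections import Counter
from typing import List, Tuple


def erase_rect(img: List[List[int]], rect: Tuple[int, int, int, int]) -> List[List[int]]:
    """Fill rect (left, top, right, bottom; inclusive) in place.

    The fill color is the most common color on the one-pixel ring
    immediately outside the rectangle (an assumed heuristic).
    """
    left, top, right, bottom = rect
    height, width = len(img), len(img[0])
    ring = [img[y][x]
            for y in range(max(top - 1, 0), min(bottom + 2, height))
            for x in range(max(left - 1, 0), min(right + 2, width))
            if not (top <= y <= bottom and left <= x <= right)]
    if not ring:  # the rectangle covers the whole image; nothing to sample
        return img
    fill = Counter(ring).most_common(1)[0][0]
    for y in range(top, bottom + 1):
        for x in range(left, right + 1):
            img[y][x] = fill
    return img


# White page (255) with a dark 2 x 2 foreground object (0) in the middle.
page = [[255] * 4 for _ in range(4)]
for y in (1, 2):
    for x in (1, 2):
        page[y][x] = 0
erase_rect(page, (1, 1, 2, 2))
```

For line figure areas, the same idea would be applied per foreground pixel rather than to the whole rectangle, so that the colors between the lines remain in the background.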

圧縮効率を向上させる場合、画素情報を消去するのは、Ｓ８０１でベクトル化された線図形の領域と、Ｓ８０３で切り出された自然画領域のみでもよい。両者に比し、フラット領域の画素情報量は少ないので、フラット領域が背景画像に残っていても圧縮効率を下げない為である。 When improving the compression efficiency, the pixel information may be erased only in the region of the line figure vectorized in S801 and the natural image region cut out in S803. This is because the amount of pixel information in the flat area is small compared to both, so that the compression efficiency is not lowered even if the flat area remains in the background image.

FIG. 12C shows background image 1230 as an example of the background image data generated in this case for the input image of FIG. 9A. The pixels of the line figure portions in FIG. 9A, that is, the character areas of pixel blocks 901 to 903, the line drawing area of pixel block 909, and the table frame area of pixel block 904, are filled with the surrounding pixel colors. For the natural image area of pixel blocks 910 and 911, the entire rectangular range is filled with the surrounding pixel colors.

なお、グラフィックスデータ生成部２０３の処理で、どの種類の領域に対しベクトル化処理または画像切り出し処理を行うかは上述した例に限るものではない。例えば、線画、表領域に対して、画像切り出し処理を行うようにしてもよい。これら処理対象領域の選択は、電子文書生成処理の制御項目として、処理対象領域種類を外部指示により設定できるようにしてもよい。また、出力される電子文書の形式が複数あり、それぞれ別の用途がある場合、各用途に適したデータ形式を領域種別毎に変えられるようにしてもよい。 Note that the type of region in which the vectorization process or the image cutout process is performed in the processing of the graphics data generation unit 203 is not limited to the above-described example. For example, image cutout processing may be performed on a line drawing and a table area. For selection of these processing target areas, the processing target area type may be set by an external instruction as a control item of the electronic document generation process. In addition, when there are a plurality of formats of electronic documents to be output and each has a different use, a data format suitable for each use may be changed for each region type.

同様に、背景画像データ生成時にどの種類の領域に対して画素情報の消去処理を行うかを電子文書生成処理の制御項目として設定してもよいし、生成する電子文書の形式に合わせて変えられるようにしてもよい。 Similarly, the type of region for which pixel information is erased when generating background image data may be set as a control item for the electronic document generation process, or may be changed according to the format of the electronic document to be generated. You may do it.

(Processing by the character recognition unit)
The processing of the character recognition unit 204 will be described with reference to the flowchart of FIG. 10. In S1001, the character recognition unit 204 generates the character image to be input to the character recognition process. This description assumes that the character recognition process takes a binary image containing characters as input, so a binary image is generated for each character area. The binary image of a character area has the same number of pixels as the input image, with character pixels in the area set to 1 and all other pixels set to 0. In the actual processing, for each character area in the area data 207 generated by the layout analysis unit 202, the information on the pixel blocks existing in that area is read from the pixel block data 206 generated by the pixel block analysis unit 201. An image of the same size as the input image 200 is then generated in which the runs of each pixel block are 1 and everything else is 0.
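Building the binary image from run information can be sketched as follows. The sketch assumes that each run is stored as a tuple (y, x_start, x_end) with inclusive ends; the actual run format of the pixel block data 206 is not given in the text, and the function name `runs_to_binary` is hypothetical.

```python
from typing import Iterable, List, Tuple

Run = Tuple[int, int, int]  # (y, x_start, x_end), ends inclusive (assumed format)


def runs_to_binary(runs: Iterable[Run], width: int, height: int) -> List[List[int]]:
    """Return a height x width image with run pixels set to 1, others to 0."""
    img = [[0] * width for _ in range(height)]
    for y, x0, x1 in runs:
        for x in range(x0, x1 + 1):
            img[y][x] = 1
    return img


# Two runs of one pixel block inside a 4 x 2 input image.
binary = runs_to_binary([(0, 1, 2), (1, 0, 0)], width=4, height=2)
# binary == [[0, 1, 1, 0], [1, 0, 0, 0]]
```

Passing the runs of all pixel blocks of every character area to a single call would give the variant in which one binary image covers all character areas.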

Ｓ１００２では、文字認識部２０４は、各文字領域内に公知の文字認識処理を実行し、文字コード列を含む文字認識結果を得る。本実施形態において、文字認識結果は、文字領域情報、行情報、および認識文字情報で構成される。文字領域情報は文字が存在する範囲の座標と、認識された文字行数の情報を含む。行情報は、各行の行内文字数の情報を含む。認識文字情報は、各文字に対して認識された文字コードの情報を含む。認識文字情報には、文字認識処理により付加的に得られた各文字の情報を追加してもよい。例えば、各文字の外接矩形座標、行内の文字平均高さやピッチから推定される文字サイズ、太字、斜体、下線といった文字修飾情報、フォント種類などを付加してもよい。得られた文字認識結果は、領域データ２０７中の対応する文字領域に関連づけられて、メモリ１０３もしくはハードディスク１０４に記憶される。 In step S1002, the character recognition unit 204 executes a known character recognition process in each character area, and obtains a character recognition result including a character code string. In the present embodiment, the character recognition result includes character area information, line information, and recognized character information. The character area information includes information on the coordinates of the range in which characters exist and the number of recognized character lines. The line information includes information on the number of characters in each line. The recognized character information includes information on the character code recognized for each character. Information of each character additionally obtained by the character recognition process may be added to the recognized character information. For example, circumscribed rectangular coordinates of each character, character size estimated from the average height and pitch of characters in a line, character modification information such as bold, italic, and underline, font type, and the like may be added. The obtained character recognition result is stored in the memory 103 or the hard disk 104 in association with the corresponding character area in the area data 207.

領域データ２０７に含まれる複数の文字領域がある場合は、文字認識部２０４は、それぞれの領域に対しＳ１００１とＳ１００２の処理を行って各領域の文字認識結果を記憶する。なお、Ｓ１００１ですべての文字領域を含む入力と等サイズの二値画像を作成してから、各領域にＳ１００２の文字認識処理を施すように処理してもよい。 When there are a plurality of character areas included in the area data 207, the character recognition unit 204 performs the processing of S1001 and S1002 on each area and stores the character recognition result of each area. Note that a binary image having the same size as the input including all the character regions may be created in S1001, and then the processing may be performed so that the character recognition processing in S1002 is performed on each region.

(Example of processing result by character recognition unit)
FIG. 11 is an example of the structure of the character recognition result data obtained for one character area. FIG. 11A is an example of a character area to be processed; it contains eight characters in three lines, “ab c”, “123”, and “if”. It is assumed that there is a one-character space between “ab” and “c” on the first line.

FIG. 11B shows an example of the character recognition result for the character area of FIG. 11A. The area information of this result holds the circumscribed rectangle coordinates of the whole area and the number of recognized character lines, 3. The line information holds 4, 3, and 2 as the number of characters in each line. The recognized character information consists of each character's character code, circumscribed rectangle coordinates, estimated character size, and character modification type such as bold or italic, and is arranged consecutively over all lines starting from the first character of the first line in the area. For example, for the first line of FIG. 11A, character code information for four characters is held, with the ASCII codes 0x61, 0x62, 0x20, and 0x63. The third of these is character information recognized as a space character. The circumscribed rectangle coordinates of each character are also held, together with the information common to the line: an estimated character size of 14 (points) and no character modification.
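The result structure of FIG. 11B can be modeled roughly as follows. The class and field names are illustrative assumptions. The rectangle coordinates are placeholders because the text gives no values, and the size of 14 points is stated in the text only for the first line, so its use for the other lines is also an assumption.

```python
from dataclasses import dataclass
from typing import List, Tuple

Rect = Tuple[int, int, int, int]  # (left, top, right, bottom)


@dataclass
class RecognizedChar:
    code: int                 # character code, e.g. 0x61 for "a"
    bbox: Rect                # circumscribed rectangle of the character
    size_pt: int              # estimated character size in points
    decoration: str = "none"  # e.g. "bold", "italic"


@dataclass
class RecognitionResult:
    area_bbox: Rect               # circumscribed rectangle of the whole area
    line_char_counts: List[int]   # number of characters in each line
    chars: List[RecognizedChar]   # all characters, line after line

    @property
    def line_count(self) -> int:
        return len(self.line_char_counts)

    def lines(self) -> List[str]:
        """Split the flat character list back into text lines."""
        out, pos = [], 0
        for n in self.line_char_counts:
            out.append("".join(chr(c.code) for c in self.chars[pos:pos + n]))
            pos += n
        return out


PLACEHOLDER = (0, 0, 0, 0)  # coordinates are not given in the text
codes = [0x61, 0x62, 0x20, 0x63, 0x31, 0x32, 0x33, 0x69, 0x66]
result = RecognitionResult(
    PLACEHOLDER, [4, 3, 2],
    [RecognizedChar(c, PLACEHOLDER, 14) for c in codes],  # 14 pt assumed for all lines
)
```

Because the characters are stored consecutively, the per-line counts are all that is needed to recover the line structure.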

(Processing by electronic document description generation unit)
The processing of the electronic document description generation unit 205 will be described with reference to the flowchart of FIG. 13. In S1301, the electronic document description generation unit 205 sets a mode that specifies the output format of data related to table areas. Two modes can be set here: normal output and tabular output. Both modes are explained together with S1305, whose processing is what the mode setting actually affects. Which mode is set is assumed to be specified in advance as a control item of this electronic document generation process. Because the mode set here relates to differences in the format of the generated output electronic document 210, the format and grammar of the description data in the following steps may also be changed according to the mode.

Ｓ１３０２では、電子文書記述生成部２０５は、出力電子文書２１０の開始部分を記述するデータを出力する。本説明では、出力先はメモリ１０３あるいはハードディスク１０４に確保される出力バッファ（不図示）である。以降、本処理でデータが出力される毎に、その内容は出力バッファ内に出力済のデータの末尾へ追記されていくものとする。 In step S 1302, the electronic document description generation unit 205 outputs data describing the start portion of the output electronic document 210. In this description, the output destination is an output buffer (not shown) secured in the memory 103 or the hard disk 104. Thereafter, each time data is output in this process, the contents are added to the end of the output data in the output buffer.

In S1303, the electronic document description generation unit 205 outputs data describing the start of a page in the output electronic document 210. In this example, the content of one input image 200 corresponds to one page of the output electronic document 210. In S1304, the electronic document description generation unit 205 sets the root node of the area data 207 as the node of interest to be processed first. S1305 is a step that calls the processing function described with the flowchart of FIG. 14. In this function, the electronic document description generation unit 205 outputs the area data description for the node of interest and then calls the same function recursively so that data is output in turn for its child area nodes and their descendants.

この処理関数としてのＳ１３０５による処理内容を、図１４のフローチャートを用いて説明する。Ｓ１４０１では、電子文書記述生成部２０５は、現在の出力モードが表形式であるか否かを調べる。出力モードはＳ１３０１で設定されたものである。表形式の場合は（Ｓ１４０１にてＹＥＳ）、Ｓ１４０６に進み、それ以外の場合は（Ｓ１４０１にてＮＯ）、Ｓ１４０２に進む。 The processing contents in S1305 as this processing function will be described with reference to the flowchart of FIG. In step S1401, the electronic document description generation unit 205 checks whether the current output mode is a table format. The output mode is set in S1301. In the case of a table format (YES in S1401), the process proceeds to S1406. In other cases (NO in S1401), the process proceeds to S1402.

Ｓ１４０２では、電子文書記述生成部２０５は、本関数の呼び出し時に、処理対象として指定された注目ノード（以下、指定注目ノードと記す）に対応する領域に、出力対象となるデータがあるか否かを調べる。データが有る場合は（Ｓ１４０２にてＹＥＳ）、Ｓ１４０３に進み、データが無い場合は（Ｓ１４０２にてＮＯ）、Ｓ１４０３の処理をスキップしてＳ１４０４に進む。 In step S1402, the electronic document description generation unit 205 determines whether there is data to be output in an area corresponding to a target node specified as a processing target (hereinafter referred to as a specified target node) when this function is called. Check out. If there is data (YES in S1402), the process proceeds to S1403. If there is no data (NO in S1402), the process of S1403 is skipped and the process proceeds to S1404.

ここで出力対象となるデータは、領域の種別毎に、図１６のような定義テーブルによって予め指定されているものとする。例えば、図１６に示す定義１６０１のテーブルが指定されている場合、指定注目ノードが“ルートノード”の場合は、関連付けられた背景画像データが出力対象となる。指定注目ノードが“文字領域ノード”の場合は、関連付けられた文字データが出力対象となる。以降同様に、“線画領域ノード”に対してはベクトルデータが出力対象となり、“写真領域ノード”に対しては切り出し画像データが出力対象となる。“表領域”と“フラット領域”のノードに対しては、出力対象データ無しと判定される。なお、定義テーブルに定められた出力対象が、指定注目ノードに関連づけられ存在していない場合も、出力データ無しと判定される。 Here, it is assumed that the data to be output is specified in advance by the definition table as shown in FIG. 16 for each type of area. For example, when the table of the definition 1601 shown in FIG. 16 is designated and the designated attention node is “root node”, the associated background image data is the output target. When the designated attention node is a “character area node”, the associated character data is an output target. In the same manner, vector data is an output target for “line drawing area node”, and cut-out image data is an output target for “photo area node”. It is determined that there is no output target data for the nodes of “table area” and “flat area”. Note that it is determined that there is no output data even when the output target defined in the definition table does not exist in association with the designated attention node.
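The lookup against such a definition table can be sketched as follows. The dictionary keys and the helper `output_data_for` are hypothetical names; only the mapping itself follows the description of definition 1601 above.

```python
from typing import Any, Dict, Optional

# Output target per area type, following the description of definition 1601.
# None means "no output target data". The key names are illustrative.
OUTPUT_DEFINITION: Dict[str, Optional[str]] = {
    "root": "background_image",
    "text": "character_data",
    "line_art": "vector_data",
    "photo": "clipped_image",
    "table": None,
    "flat": None,
}


def output_data_for(kind: str, attached: Dict[str, Any]) -> Optional[Any]:
    """Return the data to output for a node, or None when there is none.

    `attached` maps data type names to the data associated with the node.
    """
    wanted = OUTPUT_DEFINITION.get(kind)
    if wanted is None:
        return None
    # Also None when the defined target is not actually attached to the node.
    return attached.get(wanted)
```

For example, a flat area node yields no output even when vector data is attached to it, and a photo area node without a clipped image also yields no output.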

In S1403, the electronic document description generation unit 205 converts the data identified as the output target in S1402 into the description format of the output electronic document 210 as necessary, and outputs it to the output buffer.

In S1404, the electronic document description generation unit 205 checks whether the designated node of interest has child area nodes. If it has child area nodes (YES in S1404), the process proceeds to S1405. If it has none (NO in S1404), this processing for the current designated node of interest ends.

In S1405, the electronic document description generation unit 205 takes each child area of the node of interest as a new node of interest and calls the function S1305 for each of them. That is, the electronic document description generation unit 205 performs loop processing over the child areas such that their descendant areas are processed recursively. When this loop processing ends, the recursive processing for the designated node of interest and its child area nodes is regarded as complete, and this processing ends.

On the other hand, if it is determined in S1401 that the output mode is the table format (YES in S1401), the process proceeds to S1406, where the electronic document description generation unit 205 further determines whether the designated node of interest is a table area. If the designated node of interest is a table area (YES in S1406), the process proceeds to S1408; otherwise (NO in S1406), the process proceeds to S1407.

When the process proceeds to S1408, that is, when the designated node of interest is a table area, the electronic document description generation unit 205 outputs description data that starts the table format to the output buffer. In S1409, the electronic document description generation unit 205 performs loop processing over the cell element list associated with the table area node in S710 described above, calling the function S1305 with each cell element as the designated node of interest. After the function S1305 has been performed for all cell elements, the electronic document description generation unit 205 ends the loop processing and proceeds to S1410. In S1410, it outputs description data that ends the table format to the output buffer, and this processing for the designated table area node ends.

On the other hand, when the process proceeds from S1406 to S1407, the electronic document description generation unit 205 further checks in S1407 whether the designated node of interest is a cell element. If it is not a cell element (NO in S1407), that is, if the designated node of interest is a node of the area tree other than a table area, the process proceeds to S1402, and the electronic document description generation unit 205 performs the same processing as described above from S1402 onward. If the designated node of interest is a cell element, that is, if the current processing is a call to the function S1305 from inside the loop processing of S1409 described above (YES in S1407), the process proceeds to S1411.

In S1411, the electronic document description generation unit 205 outputs data that starts a cell description to the output buffer. In S1412, the electronic document description generation unit 205 checks whether the cell element corresponding to the node of interest has links to area nodes that form the cell content. If it has links (YES in S1412), the process proceeds to S1413. If it has no links, that is, if the size of the link list is 0 (NO in S1412), the process proceeds to S1414.

In S1413, the electronic document description generation unit 205 performs loop processing by calling the function S1305 with each linked area node held in the cell content link list as the designated node of interest. After performing the processing of S1305 for all linked area nodes, the electronic document description generation unit 205 ends the loop processing and proceeds to S1414.

In S1414, the electronic document description generation unit 205 outputs data that ends the cell description to the output buffer, and ends this processing for the current designated node of interest, that is, the cell element associated with the table area.

The following supplements the description of what happens after the function S1305 returns, which was mentioned at several points above. When the function S1305 was called immediately after S1304 in FIG. 13 with the root node as the designated node of interest, the electronic document description generation unit 205 executes S1306. When the function S1305 was called in the loop processing of S1405 with a child area node as the designated node of interest, the electronic document description generation unit 205 continues the loop and calls the function S1305 with the next child area as the designated node of interest. When the function S1305 was called in the loop processing of S1409 over cell elements with a cell element as the node of interest, the electronic document description generation unit 205 continues the loop with the next cell element as the node of interest. When the function S1305 was called in the loop processing of S1413 over the area nodes linked as cell content, the electronic document description generation unit 205 continues the loop and calls the function S1305 with the next area node in the link list as the node of interest.
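The branching of the function S1305 described above can be summarized in a short recursive sketch. The `Node` class, its field names, and the placeholder strings `<table>` and `<cell>` are assumptions introduced for illustration; the data to output is taken to be already selected per node (the role of the definition table), and `out` stands in for the output buffer.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Node:
    kind: str                          # e.g. "root", "flat", "character", "table", "cell"
    data: Optional[str] = None         # data to output for this node, if any
    children: List["Node"] = field(default_factory=list)  # child area nodes
    cells: List["Node"] = field(default_factory=list)     # cell elements (table areas)
    links: List["Node"] = field(default_factory=list)     # linked area nodes (cells)


def emit(node: Node, table_mode: bool, out: List[str]) -> None:
    """Hypothetical rendering of function S1305; comments give the step labels."""
    if table_mode and node.kind == "table":    # S1401 YES, S1406 YES
        out.append("<table>")                  # S1408: start of table format
        for cell in node.cells:                # S1409: loop over cell elements
            emit(cell, table_mode, out)
        out.append("</table>")                 # S1410: end of table format
        return
    if table_mode and node.kind == "cell":     # S1406 NO, S1407 YES
        out.append("<cell>")                   # S1411: start of cell description
        for area in node.links:                # S1412/S1413: linked cell content
            emit(area, table_mode, out)
        out.append("</cell>")                  # S1414: end of cell description
        return
    if node.data is not None:                  # S1402: is there data to output?
        out.append(node.data)                  # S1403
    for child in node.children:                # S1404/S1405: recurse into children
        emit(child, table_mode, out)
```

Because each loop is an ordinary `for` over a recursive call, "continuing the loop after S1305 returns" needs no extra bookkeeping: returning from `emit` resumes the caller's loop at the next child, cell element, or linked area.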

Returning to FIG. 13, the subsequent processing will be described. In S1306, the electronic document description generation unit 205 outputs the end data of the page. Note that when the recursive processing of S1305 finishes, all data that is held by the nodes in the area data 207 and should be output as the content of one page has been output.

In S1307, the electronic document description generation unit 205 checks whether there is an additional page. An additional page arises when an additional image is input while the electronic document description generation processing of FIG. 2 is operating so as to output a multi-page output electronic document 210. If there is an additional page (YES in S1307), the electronic document description generation unit 205 returns to S1303 and repeats the subsequent processing. If there is no additional page, that is, if no further image is input (NO in S1307), the process proceeds to S1308.

In S1308, the electronic document description generation unit 205 outputs the end data of the electronic document. Adding this data completes the electronic document data in the output buffer. Finally, in S1309, the electronic document description generation unit 205 transmits the electronic document data in the output buffer, as the output electronic document 210, to a destination such as a PC designated in advance by the user, and ends the electronic document generation processing.

In the above description, the entire output electronic document 210 is written to an output buffer of sufficient size, but the processing may be arranged to work with a smaller output buffer. For example, the contents of the output buffer may be transmitted to the designated destination each time the end data of a page is output, with the contents of the next page written from the head of the cleared output buffer. Alternatively, writing to the output buffer, transmission, and clearing may be repeated in even smaller units.
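A minimal sketch of the per-page variant follows, assuming each page's content has already been serialized into strings and that `send` is whatever callable delivers a chunk to the destination. The function name, the `send` parameter, and the `<doc>`/`<page>` markers are hypothetical.

```python
def generate_document(pages, send):
    """Emit one document, flushing the buffer each time a page's end data is written."""
    buffer = ["<doc>"]                # S1302: start data of the electronic document
    for page in pages:                # repeated while S1307 finds another page
        buffer.append("<page>")       # S1303: start data of the page
        buffer.extend(page)           # S1304/S1305: page content
        buffer.append("</page>")      # S1306: end data of the page
        send("".join(buffer))         # transmit what has been buffered so far
        buffer.clear()                # the next page starts at the head of the buffer
    buffer.append("</doc>")           # S1308: end data of the electronic document
    send("".join(buffer))             # S1309: transmit the final chunk
```

With this arrangement the buffer never holds more than one page, and the receiver obtains the complete document by concatenating the chunks in arrival order.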

(Example of processing result by the electronic document description generation unit (Part 1))
To illustrate the processing of FIGS. 13 and 14 concretely, an operation example and an output example for the case where the area data of FIGS. 9(c) and 9(d) are given as input are described below. Two combinations of control parameters for the electronic document generation processing are considered, each pairing an output mode specified in S1301 of FIG. 13 with an output target definition table referred to in S1402 of FIG. 14, and the operation and output are described for each case.

First, the output 1700 in FIG. 17(a) shows an output example for the case where, as control parameters of the electronic document generation processing, "table format" is specified as the output mode and definition 1601 of FIG. 16 is specified as the output target definition table. The output 1700 is an example of an electronic document based on a virtual electronic document format used for this explanation. The output 1700 consists of descriptions 1701 to 1718, which are the data output by the data-outputting processing steps in FIGS. 13 and 14. The generation of the output 1700 is described below, following the processing of FIGS. 13 and 14.

In S1301 of the flowchart of FIG. 13, the "table format" output mode, which is the premise of this case, is set. In the subsequent S1302 and S1303, the electronic document description generation unit 205 outputs descriptions 1701 and 1702 as the start data of the electronic document and the start data of the page, respectively. The description 1702, which indicates the start of the page, contains the page size information: a width of 2480 and a height of 3520.

In S1304, the electronic document description generation unit 205 sets the root node of the area data of FIG. 9(c) as the node of interest and calls the function S1305. Moving to FIG. 14 for the processing of the function S1305, the electronic document description generation unit 205 makes the determination in S1401. Here, the output mode is the table format, so the process proceeds to S1406. Since the node of interest is the root node, the process proceeds to S1402 via S1407.

In S1402, the electronic document description generation unit 205 refers to definition 1601 of FIG. 16. Since the root node has background image data to be output, the process proceeds to S1403. In S1403, the electronic document description generation unit 205 outputs a description 1703 obtained by converting the background image data into the format of the output electronic document. The description 1703 draws the compressed image data, which is referenced inline, as an image object in the specified range of the electronic document page.

The process proceeds to S1404 and, since the root node has the child area node 920, on to S1405. In the loop processing of S1405, the electronic document description generation unit 205 calls the function S1305 for the child area node 920, which is a flat area, and performs the processing from S1401 onward with that area as the node of interest.

As above, the process proceeds through S1401, S1406, S1407, and S1402. In S1402, according to definition 1601 of FIG. 16, a flat area has no output target, so the process proceeds to S1404. The electronic document description generation unit 205 then calls the function S1305 in the loop processing of S1405 for each of the child area nodes 921, 930, 929, and 924.

The processing flow for the first three child area nodes is the same as that for the child area node 920 described above, so a detailed description is omitted. For the child area node 921, the character data is the output target and a description 1704 is output. The description 1704 places a character string as a character object with the specified range, size, and color on the electronic document page. For the child area node 930, the cut-out image data is the output target and a description 1705 is output. Like the description 1703, the description 1705 is an image drawing description that outputs an image object. For the child area node 929, a description 1706 that outputs vector data is output. The description 1706 is a vector drawing description that draws and fills a path of connected straight lines and curves as a path object. Since these nodes have no child areas, the function S1305 ends at S1404 without performing the subsequent processing.

On the other hand, the last child area, the child area node 924, is a table area, so the process proceeds from S1406 to S1408, and the electronic document description generation unit 205 outputs a description 1707, which is the start data of a table format object. As parameters of the table format object, the description 1707 contains the range coordinates of the table area, the numbers of rows and columns obtained by the structure analysis of the table, and a list of the coordinates of the boundary positions of the matrix structure.

The subsequent S1409 is loop processing that calls the function S1305 for each of the cell elements associated with the table area, that is, the cell elements 941, 942, and 943 of FIG. 9(d). In the processing for the cell element 941, the electronic document description generation unit 205 proceeds from S1401, S1406, and S1407 to S1411 and outputs a description 1708 as the cell start data. This cell start description contains the logical coordinates "1,1", which are the structure information of the cell.

In S1412, the cell element 941 has a link to the child area node 922 as its cell content, so the process proceeds to the loop processing of S1413, where the electronic document description generation unit 205 calls the function S1305 for the child area node 922. The flow of this call is the same as the flow described above for a character area node as the node of interest and is therefore omitted. As a result, a character object description 1709 is output, based on the character data associated with the child area node 922, which is a character area.

When the loop ends, the process proceeds to S1414, and the electronic document description generation unit 205 outputs a description 1710, which is the cell end data.

Since the processing of the function S1305 for the cell element 941 is now complete, the electronic document description generation unit 205 calls the function S1305 for the next cell element 942 in the loop processing of S1409. In the same way, the electronic document description generation unit 205 outputs a cell start description 1711 with the logical coordinates "2,1", a description 1712 of the character data associated with the child area node 923, which is the character area forming the cell content, and a cell end description 1713.

For the last cell element 943, the electronic document description generation unit 205 outputs a cell start description 1714 whose logical coordinates have the start position "1,2" and the end position "2,2". Since this cell has no cell content, it then immediately outputs a cell end description 1715.

After processing all cell elements, the electronic document description generation unit 205 ends the loop of S1409 and, in S1410, outputs a description 1716, which is the end data of the table format. With the end of this call to the function S1305, the processing of all child area nodes of the child area node 920 is complete, so the electronic document description generation unit 205 returns to FIG. 13 and proceeds to S1306.

In S1306, the electronic document description generation unit 205 outputs a description 1717 as the end data of the page. Since there is no additional page here, the process proceeds to S1308, and the electronic document description generation unit 205 outputs a description 1718, which is the end data of the electronic document.

The output 1700 of the electronic document completed in this way is sent to the designated destination in S1309.

(Example of processing result by the electronic document description generation unit (Part 2))
Next, the output 1750 in FIG. 17(b) shows an output example for the case where, as control parameters of the electronic document generation processing, "normal output mode" is specified as the output mode and definition 1602 of FIG. 16 is specified as the output target definition table. The output 1750 is also an example of an electronic document based on a virtual electronic document format used for this explanation. The output 1750 consists of descriptions 1751 to 1765, which are the data output by the data-outputting processing steps in FIGS. 13 and 14. The generation of the output 1750 is described below, following the processing of FIGS. 13 and 14.

In S1301 of the flowchart shown in FIG. 13, the "normal output mode" is set in accordance with the premise of this case. In the subsequent S1302, the electronic document description generation unit 205 outputs a description 1751 as the start data of the electronic document. Because the output mode is not the table format, the description 1751 declares an electronic document format different from that of the output 1700. In this example, however, to keep the explanation simple, the format of the object descriptions that make up the page content is assumed to be the same as in the output 1700.

In S1303, the electronic document description generation unit 205 outputs a description 1752 as the start data of the page, including the page size information. In S1304, the electronic document description generation unit 205 sets the root node of the area data of FIG. 9(c) as the node of interest and calls the function S1305.

Moving to FIG. 14, in S1401 the output mode is not the table format, so the process proceeds to S1402. In S1402, the electronic document description generation unit 205 refers to definition 1602 of FIG. 16, the definition table specified for this case. Since the root node has background image data to be output, the process proceeds to S1403.

In S1403, the electronic document description generation unit 205 outputs a description 1753 that places the background image data as an image object. The process proceeds to S1404 and, since the root node has the child area node 920, on to S1405.

In the loop processing of S1405, the electronic document description generation unit 205 calls the function S1305 for the child area node 920, which is a flat area, and performs the processing from S1401 onward with that area as the node of interest. When the output mode is the "normal output mode", the processing never takes the S1406 branch of FIG. 14. The electronic document description generation unit 205 therefore processes every area node with the function S1305 in the same flow, in the tree order of the area data. As a result, data describing the object of each area is output in order, according to the definition table of FIG. 16.

The output is described below in the order in which the nodes are processed in tree order. For the child area node 920, which is a flat area, nothing is output because no data is associated with it. For the child area node 921, which is a character area, a description 1754 is output as path object data obtained by vectorizing the character strokes of the pixel block 901 of the character area. For the child area node 930, which is a natural image area, a description 1755 is output as data that places the cut-out image object. For the child area node 929, which is a line drawing area, a description 1756 is output that draws the vector data of the line drawing as a path object.

For the child area node 924, which is a table area, a description 1757 is output that describes the vector data of the table frame as a path object. For the child area node 925, which is a flat area, a description 1758 is output that draws the vector data of the flat area as a path object. For the child area node 922, which is a character area, a description 1759 is output that draws the vectorized data of the pixel block 902 of the character area as a path object. Likewise, descriptions 1760, 1761, 1762, and 1763 are output for the nodes 926, 923, 927, and 928, respectively.

When the output processing of all area nodes is complete, the electronic document description generation unit 205 returns to FIG. 13 and, in S1306, outputs a description 1764 as the end data of the page. Since there is no additional page here, the process proceeds to S1308, and the electronic document description generation unit 205 outputs a description 1765, which is the end data of the electronic document. The output 1750 of the electronic document completed in this way is sent to the designated destination in S1309.
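In the normal output mode the traversal reduces to a plain pre-order walk of the area tree, which the following sketch illustrates using node numbers only. The nesting below node 924 is not spelled out in this passage; the shape used here is one assumption that happens to reproduce the processing order stated above, and the root node is left out for brevity.

```python
# Area tree as (node_id, children) tuples. The nesting under 924 is assumed.
AREA_TREE = (920, [
    (921, []),
    (930, []),
    (929, []),
    (924, [
        (925, [(922, [])]),
        (926, [(923, [])]),
        (927, [(928, [])]),
    ]),
])


def preorder(node):
    """Yield node ids in tree order: each node first, then its children in turn."""
    node_id, children = node
    yield node_id
    for child in children:
        yield from preorder(child)
```

Calling `list(preorder(AREA_TREE))` visits 920, 921, 930, 929, 924 and then the nodes under the table area, matching the order in which descriptions 1754 to 1763 are produced.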

(Example of screen on PC)
Next, FIG. 18 shows an operation example of a PC that has received the output electronic document 210 generated by the electronic document generation processing performed by the image processing apparatus 100 of the present invention.

FIG. 18(a) is an example of an operation screen of an application 1801 that can use the output electronic document 210 containing a table format description, as shown in the output 1700 of FIG. 17. The application 1801 is a GUI-based window program that runs on the PC 120 and accepts character input and graphic input through the user's keyboard and mouse operations. It is a so-called word processor, providing an electronic document creation environment in which a document can be edited while being displayed as pages equivalent to the paper layout. The application 1801 can save the state of an electronic document, after or during editing, as a dedicated or general-purpose electronic document file. The output 1700 of FIG. 17 is assumed to be described in this electronic document format.

The page screen 1802 in FIG. 18(a) is an example of a page screen displayed when the application 1801 reads the electronic document of the output 1700 and shows it as the page content of that document. The page screen 1802 is created according to the size specified in the description 1702, and the objects of the page content have been drawn in order according to the subsequent descriptions.

First, the background image data is drawn by the description 1703 so as to cover the entire page. The graphics data and image data of the subsequent descriptions are each drawn over this background as foreground objects.

The character portion 1803 is a character string placed as a character object by the description 1704 of FIG. 17. The description 1704 draws the character string "ABC (line break) DEF" in the specified coordinate range in black, which is the color of the character pixel block extracted from the input image. Since the description 1704 does not specify a font, the character string of the character portion 1803 is drawn in the default font of the application 1801. The object 1804 is a cut-out image object placed according to the description 1705, and the object 1805 is a line drawing object drawn as vectors according to the description 1706.

The object 1806 is a table format object drawn on the basis of the descriptions 1707 to 1715. As specified in the description 1707, it is drawn with a matrix structure of two rows and two columns containing three cell areas. Inside the two upper cells, the blue character string "wx" and the red character string "yz" are placed on the basis of the descriptions 1709 and 1712, respectively.

Note that nothing is drawn inside the table object 1806 other than the table frame and the characters, so the pixels of the background image show through as they are. Since the background image retains the pixels of the flat areas that form the background of the table cells, as in the background image 1230 of FIG. 12(c), the table object 1806 appears to have the same background color as the input image.

As described above, application 1801 supports editing operations such as moving and deforming each object in the displayed output electronic document. In this example in particular, character portions 1803, 1807, and 1808 are character code strings, so the user can freely edit the text: adding and deleting characters, and changing the font, typeface, size, and color. In addition, because object 1806, the table portion, is expressed in table format, the user can freely perform table editing operations such as adding, deleting, and reshaping cells and changing the ruled-line type. Therefore, when the input image is to be reused in a way that involves editing characters or tables, it is preferable to convert it into an electronic document usable by application 1801.

The format shown in FIG. 17(a) is merely an example, and other electronic document formats or object description methods may be used. In particular, the tabular object description in descriptions 1707 to 1716 may instead use a form in which one cell-row element per row contains the cell elements for each column, as in FIG. 17(c). A table cell may also contain descriptions of areas other than characters.
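
The row-oriented alternative mentioned for FIG. 17(c) can be illustrated with a minimal sketch. The element names `Table`, `Row`, and `Cell`, the `colspan` attribute, and the dictionary layout of the cell list are all hypothetical, since the actual syntax of FIG. 17 is not reproduced here. The sample assumes that the third cell of the 2-by-2 table spans the whole lower row.

```python
import xml.etree.ElementTree as ET

def table_to_rows(n_rows, cells):
    """Group a flat cell list into one Row element per table row."""
    table = ET.Element("Table")
    rows = [ET.SubElement(table, "Row") for _ in range(n_rows)]
    # Emit cells row by row, left to right.
    for c in sorted(cells, key=lambda c: (c["row"], c["col"])):
        cell = ET.SubElement(rows[c["row"]], "Cell",
                             colspan=str(c.get("colspan", 1)))
        cell.text = c.get("text", "")
    return table

# Hypothetical cell list: two upper cells and one merged lower cell.
sample_cells = [
    {"row": 0, "col": 0, "text": "wx"},
    {"row": 0, "col": 1, "text": "yz"},
    {"row": 1, "col": 0, "colspan": 2},
]
table = table_to_rows(2, sample_cells)
print(ET.tostring(table, encoding="unicode"))
```

With a merged cell, a row element holds fewer cell elements than there are columns, which is why the sketch records the span explicitly.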

FIG. 18(b), on the other hand, shows an example operation screen of application 1821, which can use the electronic document shown as output 1750 in FIG. 17. Like application 1801, application 1821 is a GUI-based window program running on PC 120. It is a so-called drawing tool, in which the user draws figures mainly by mouse operations such as drawing line segments. Like application 1801, application 1821 can save the state of an electronic document, during or after editing, as a dedicated or general-purpose electronic document file. Output 1750 in FIG. 17 is assumed to be written in this electronic document format.

Page screen 1822 in FIG. 18(b) is an example of the page screen displayed when application 1821 reads the electronic document of output 1750 and shows it as the page content. Page screen 1822 is created according to the size specified in description 1752, after which vector data or image data is drawn in sequence based on descriptions 1753 to 1763.

A distinguishing feature of these descriptions is that the processing flows of FIG. 13 and FIG. 14 output them in the order of the area tree of area data 207. In the example area tree of FIG. 9(c), the order is child area nodes 920, 921, 930, 929, 924, 925, 922, 926, 923, 927, and 928. The corresponding output data appears as descriptions 1753, 1754, 1755, 1756, 1757, 1758, 1759, 1760, 1761, 1762, and 1763, in that order.

When the drawing ranges of two descriptions overlap, the object drawn by the later description overwrites the object drawn by the earlier one. For example, descriptions 1758 and 1759 are the drawing data for pixel block 905, the background, and pixel block 902, a character string in a cell, both inside pixel block 904, the table area of FIG. 9. Owing to the parent-child relationship between area nodes derived from the pixel block inclusion relationship in FIG. 9(b), the data of child area node 925, the flat area of the cell background, is described first, followed by the data of child area node 922, the character area in the cell. Because the character vector data is drawn over it, the vector data of the flat area can omit the inner contours, like vector data 1213 in FIG. 12(b). This reduces the amount of vector data to describe and makes the generated electronic document more compact.
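
The effect of tree-order output combined with later-overwrites-earlier drawing can be sketched as follows. This is only an illustration under stated assumptions: the `AreaNode` class, the rectangle coordinates, and the three-node tree (frame, background, text) are hypothetical and do not reproduce the tree of FIG. 9(c). Each area is painted as a solid rectangle with no hole cut for its children, which mirrors the idea of omitting inner contours.

```python
from dataclasses import dataclass, field

@dataclass
class AreaNode:
    label: str
    rect: tuple  # (x0, y0, x1, y1), half-open; hypothetical raster coordinates
    children: list = field(default_factory=list)

def tree_order(node):
    """Yield a node before its children, so containers come first."""
    yield node
    for child in node.children:
        yield from tree_order(child)

def render(root, width, height):
    """Paint areas in tree order; later areas overwrite earlier ones."""
    canvas = [["."] * width for _ in range(height)]
    for node in tree_order(root):
        x0, y0, x1, y1 = node.rect
        for y in range(y0, y1):
            for x in range(x0, x1):
                canvas[y][x] = node.label[0]
    return ["".join(row) for row in canvas]

# Hypothetical nesting: a frame contains a background, which contains text.
text = AreaNode("text", (2, 2, 6, 3))
background = AreaNode("background", (1, 1, 7, 4), [text])
frame = AreaNode("frame", (0, 0, 8, 5), [background])

picture = render(frame, 8, 5)
print("\n".join(picture))
```

Because the text is painted last, the background never needs to describe the outline of the characters it surrounds.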

Application 1821 supports editing operations such as moving and deforming each object in the displayed output electronic document. Unlike application 1801, it cannot edit the text of characters represented as vector data. On the other hand, because the representation of the character portions does not depend on the fonts available to the application, the display is more faithful to the input image than in application 1801. Therefore, when the input image is to be reused with an emphasis on visual fidelity, it is preferable to generate an electronic document usable by this application.

The two description formats in FIG. 17 are merely examples, and other publicly available description formats may be used. For example, a graphics page description such as PDF (Portable Document Format), XPS (XML Paper Specification), or SVG (Scalable Vector Graphics) may be used, as may a page editing data description such as Office Open XML or OpenDocument Format. Also, instead of using different electronic document formats as with outputs 1700 and 1750 in FIG. 17, the electronic documents may be generated in the same format with only the object representation method changed.

As described above, in the electronic document generation process performed by the image processing apparatus according to the present invention, pixel block analysis is performed on the color-reduced input image to extract same-color pixel blocks and to build pixel block data indicating their mutual relationships. Layout analysis is then performed on this pixel block data to identify the various document areas present in the input image and to create area data expressed as a tree structure whose nodes are those areas. Table analysis is also performed during layout analysis, and each table area is associated with a cell element list that specifies the cell structure within the table.

Each cell element holds a link to the area node in the area tree that corresponds to the cell's content. Graphics data corresponding to each area is then generated and associated with it as the data of the object content in the output electronic document. For character areas, character recognition is performed and the resulting character data is associated. Finally, the description of the output electronic document is generated from the area data, graphics data, and character data produced above.

In the tabular output mode, a description is generated that outputs, in area tree order, each table area of the area data as a table object and every other area as the graphics or character object associated with it. When the nodes at and below a table area are processed, the cell element list attached to the table area is accessed instead of the child area nodes. The object data associated with the area that each cell element links to is then described as the cell content. This makes it possible to output a tabular object in which the objects of the areas inside the table are arranged according to the cell structure.

In the non-tabular output mode, on the other hand, all areas, including tables, are processed in area tree order, and a description is generated that outputs the graphics object associated with each area.
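
The two output modes differ only in how the data at and below a table area is accessed, which the following minimal sketch tries to show. The `Area` and `Cell` classes, the string payloads standing in for graphics and character data, and the output line format are all hypothetical. The sample table follows the 2-by-2, three-cell example with "wx" and "yz" in the upper cells.

```python
from dataclasses import dataclass, field
from typing import Optional

@dataclass
class Area:
    kind: str      # "page", "text", "flat", "table", ...
    payload: str   # stand-in for the associated graphics or character data
    children: list = field(default_factory=list)
    cells: list = field(default_factory=list)  # cell element list (tables only)

@dataclass
class Cell:
    row: int
    col: int
    content: Optional[Area]  # link to an area node in the same tree

def describe(node, table_mode):
    """Return description lines for node and everything below it."""
    if node.kind == "table" and table_mode:
        # Tabular mode: read the cell element list, not the child nodes.
        lines = ["table"]
        for c in node.cells:
            text = c.content.payload if c.content else ""
            lines.append(f"  cell({c.row},{c.col}): {text}")
        return lines
    # Otherwise: emit this area, then its children in tree order.
    lines = [f"{node.kind}: {node.payload}"]
    for child in node.children:
        lines.extend(describe(child, table_mode))
    return lines

wx, yz = Area("text", "wx"), Area("text", "yz")
table = Area("table", "frame",
             children=[Area("flat", "cell-bg-1", [wx]),
                       Area("flat", "cell-bg-2", [yz])],
             cells=[Cell(0, 0, wx), Cell(0, 1, yz), Cell(1, 0, None)])
page = Area("page", "background", [Area("text", "ABC"), table])

print("\n".join(describe(page, table_mode=True)))
print("\n".join(describe(page, table_mode=False)))
```

The same tree serves both modes; only the branch taken at the table node changes.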

That is, according to this embodiment, the tree-structured area data generated from the analysis of the pixel block structure is given a means of access based on the cell structure obtained from table analysis. This makes it possible to include a tabular object with a cell structure in the output electronic document. In addition, because the area data also retains the tree structure of areas based on inclusion relationships, the object description of the electronic document can be generated efficiently simply by describing the graphics data of each area in tree order.

Furthermore, whether a table object is output in table format or in graphics format can be changed through the way the data at and below the table area of the area data is accessed. That is, from a single set of area data, whether to include table-format objects in the generated electronic document can easily be switched at the description stage.

<Second embodiment>
In the first embodiment of the present invention, the electronic document generation process of FIG. 2 performed by the image processing apparatus was described as a single unit of processing from image input to electronic document output. This process may instead be divided into two parts: an analysis and data generation part, consisting of pixel block analysis unit 201 through character recognition unit 204, and electronic document description generation unit 205, which generates the electronic document description from the resulting data. In that case, the first-half analysis and data generation process ends after storing area data 207, graphics data 208, and character data 209 in memory 103 or on hard disk 104.

Electronic document description generation unit 205 then operates on the data stored so far (area data 207 through character data 209) to generate the description of the electronic document. The apparatus may be configured to generate multiple types of electronic documents by applying the electronic document description generation process multiple times to the data generated from a single input. The control parameters of electronic document description generation unit 205, namely the output mode and the definition table of output targets for each area type, can be changed for each run.

That is, both an electronic document that includes tabular objects and one that does not can be output from the data (area data 207 through character data 209) obtained by a single analysis and data generation process. Alternatively, the user may select a conversion target from among multiple sets of previously stored data (area data 207 through character data 209) and specify an output electronic document format, so that an electronic document with a data format matching the instruction is generated.

That is, according to this embodiment, in addition to the effects of the first embodiment, storing the data obtained by the analysis process makes it possible to generate multiple electronic documents from the same data by specifying an output format for each. Once the analysis data has been obtained from the input image, electronic documents in multiple formats, with or without tabular objects, can be generated, which reduces processing time and processing resources.
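
The split between a one-time analysis stage and a repeatable description stage can be sketched as follows. Everything here is an assumption made for illustration: the `analyze` and `generate` functions, the JSON layout, and the file name are hypothetical, and `analyze` returns fixed sample data rather than analyzing an image.

```python
import json
import os
import tempfile

def analyze(image_name):
    """Stand-in for the analysis stage; returns serializable sample data."""
    return {
        "source": image_name,
        "areas": [
            {"kind": "text", "data": "ABC"},
            {"kind": "table", "rows": 2, "cols": 2, "cells": ["wx", "yz", ""]},
        ],
    }

def generate(data, table_mode):
    """Stand-in for the description stage; the output mode is chosen per run."""
    lines = []
    for area in data["areas"]:
        if area["kind"] != "table":
            lines.append(f"{area['kind']}: {area['data']}")
        elif table_mode:
            lines.append(f"table {area['rows']}x{area['cols']}: {area['cells']}")
        else:
            lines.append("graphics: table frame and contents")
    return lines

with tempfile.TemporaryDirectory() as folder:
    path = os.path.join(folder, "analysis.json")
    # The analysis runs once and its result is stored.
    with open(path, "w", encoding="utf-8") as f:
        json.dump(analyze("scan-001"), f)
    # The description stage runs twice on the stored result.
    with open(path, encoding="utf-8") as f:
        stored = json.load(f)
    editable = generate(stored, table_mode=True)
    faithful = generate(stored, table_mode=False)

print(editable)
print(faithful)
```

Only `generate` is repeated, so the cost of analysis is paid once however many output formats are requested.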

<Third embodiment>
In the first and second embodiments of the present invention, whether to generate tabular objects was set as an output mode of the electronic document description generation process and applied uniformly to all table areas extracted from the input image. Instead, the electronic document description generation process may select, for each table area, whether to generate a tabular object.

FIG. 19 is a flowchart describing another embodiment of function S1305 in the flowchart of FIG. 13, which describes the processing of electronic document description generation unit 205 in FIG. 2. Steps in FIG. 19 that carry the same reference symbols as in FIG. 14 perform the same operations as in FIG. 14, so their description is omitted.

In S1901, the only difference from FIG. 14, electronic document description generation unit 205 determines whether to output the table area node of interest in table format. This determination may be based on the complexity of the cell matrix structure, which is information attached to the table area. For example, when the number of rows and the number of columns are both 1, there is little need to reproduce the cell structure in output electronic document 210; on the contrary, the area may be a decorative frame, for instance, and may be better represented as a graphics object than in table format. This is merely an example; the determination may also take into account the color and shape of the table frame pixel block, the number of divisions of the flat areas in the table, the character recognition results in the cells, and so on. Alternatively, the area data may be presented to the user so that the output format of each table area can be specified interactively.

If it is determined that the area should be output in table format (YES in S1901), the process advances to S1408, and electronic document description generation unit 205 thereafter outputs a description of a table object based on the cell structure, as described for FIG. 14. Otherwise (NO in S1901), the process advances to S1402, and, as described for FIG. 14, object descriptions based on the graphics data of the table frame and its child areas are output.
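
The per-table decision of S1901 and the resulting branch can be sketched as follows. The function names and the `user_choice` parameter are hypothetical. The one-row, one-column rule is the example given in the text, and the override stands in for the interactive designation also mentioned there.

```python
def output_as_table(rows, cols, user_choice=None):
    """Decide whether a table area is output in table format (S1901)."""
    if user_choice is not None:
        # An explicit user designation takes precedence in this sketch.
        return user_choice
    # A 1-by-1 structure is treated as a decorative frame, not a table.
    return not (rows == 1 and cols == 1)

def next_step(rows, cols, user_choice=None):
    """Return the step that follows S1901 for the given table area."""
    return "S1408" if output_as_table(rows, cols, user_choice) else "S1402"

print(next_step(2, 2))                    # tabular output path
print(next_step(1, 1))                    # graphics output path
print(next_step(1, 1, user_choice=True))  # user forces tabular output
```

Other criteria named in the text, such as frame color or character recognition results, could be added as further conditions in `output_as_table`.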

That is, according to the third embodiment of the present invention, in addition to the effects of the first embodiment, the output method of a table object is switched, based on both the mode designation and the characteristics of the target table area, by changing how the data at and below the table area is accessed. As a result, whether to include table-format objects in the output electronic document can easily be switched at the description stage, while only the table areas for which tabular output is appropriate are output as tabular objects.

<Other embodiments>
The present invention can also be realized by executing the following processing: software (a program) that realizes the functions of the above-described embodiments is supplied to a system or apparatus via a network or any of various storage media, and a computer (or CPU, MPU, or the like) of the system or apparatus reads and executes the program.

In the present invention, some or all of the processing units in FIG. 2 may also be realized using hardware such as electronic circuits.

Claims

An image processing apparatus that generates editable electronic data from an input image, comprising:
input means for inputting an image including a table as the input image;
pixel block analysis means for extracting, from the pixels constituting the input image, a plurality of pixel blocks each consisting of pixels with similar pixel values, and analyzing the inclusion relationship between the plurality of pixel blocks;
identification means for identifying an area formed by the plurality of pixel blocks as at least one of a character area, a table area, and another area;
generating means for generating tree-structured data indicating the inclusion relationship between areas, according to the inclusion relationship between the pixel blocks analyzed by the pixel block analysis means and the areas of the pixel blocks identified by the identification means;
table structure analysis means for analyzing the matrix structure of the table for the pixel block identified as the table area;
associating means for generating information on each cell element constituting the matrix structure of the table and associating it with the table area in the tree-structured data; and
setting means for setting, for each of the cell elements, link information to the area, among the areas identified by the identification means, that corresponds to the content of the cell element.

The image processing apparatus according to claim 1, wherein each pixel of the input image from which the pixel block analysis means extracts pixel blocks takes one of three or more kinds of pixel values.

The image processing apparatus according to claim 1, further comprising:
generating means for generating a description defining the electronic data from the tree-structured data; and
a designation unit that accepts designation of whether to output the table area in a table format,
wherein, when the table area is output in a table format, the generating means generates a description of the table area in the electronic data according to the cell element information and the link information associated with the table area, and, when the table area is not output in a table format, generates the description of the table area in the electronic data according to the tree structure.

The image processing apparatus according to claim 3, further comprising character recognition means for recognizing characters from the pixel blocks of the character area and generating character data,
wherein the generating means generates a description of the character area in the electronic data using the character data.

The image processing apparatus according to claim 3, further comprising graphics data generating means for generating, as the data of each area constituting the tree structure, graphics data represented by vector data or by image data cut out from the input image,
wherein the generating means generates a description in the electronic data using the graphics data.

The image processing apparatus according to claim 3, wherein the generating means generates descriptions defining a plurality of types of electronic data from the tree-structured data and area data corresponding to one input image, and
the plurality of types of electronic data are electronic data including a description of a table area defined in a table format and electronic data including a description of a table area not defined in a table format.

The image processing apparatus according to claim 3, wherein the designation unit receives designation as to whether or not to output in a table format for each table region included in the input image.

An image processing method for generating editable electronic data from an input image, comprising:
an input step in which input means inputs an image including a table as the input image;
a pixel block analysis step in which pixel block analysis means extracts, from the pixels constituting the input image, a plurality of pixel blocks each consisting of pixels with similar pixel values, and analyzes the inclusion relationship between the plurality of pixel blocks;
an identification step of identifying an area formed by the plurality of pixel blocks as at least one of a character area, a table area, and another area;
a generating step in which generating means generates tree-structured data indicating the inclusion relationship between areas, according to the inclusion relationship between the pixel blocks analyzed in the pixel block analysis step and the areas of the pixel blocks identified in the identification step;
a table structure analysis step in which table structure analysis means analyzes the matrix structure of the table for the pixel block identified as the table area;
an associating step in which associating means generates information on each cell element constituting the matrix structure of the table and associates it with the table area in the tree-structured data; and
a setting step in which setting means sets, for each of the cell elements, link information to the area, among the areas identified in the identification step, that corresponds to the content of the cell element.

A program for causing a computer to function as:
input means for inputting an image including a table as an input image;
pixel block analysis means for extracting, from the pixels constituting the input image, a plurality of pixel blocks each consisting of pixels with similar pixel values, and analyzing the inclusion relationship between the plurality of pixel blocks;
identification means for identifying an area formed by the plurality of pixel blocks as at least one of a character area, a table area, and another area;
generating means for generating tree-structured data indicating the inclusion relationship between areas, according to the inclusion relationship between the pixel blocks analyzed by the pixel block analysis means and the areas of the pixel blocks identified by the identification means;
table structure analysis means for analyzing the matrix structure of the table for the pixel block identified as the table area;
associating means for generating information on each cell element constituting the matrix structure of the table and associating it with the table area in the tree-structured data; and
setting means for setting, for each of the cell elements, link information to the area, among the areas identified by the identification means, that corresponds to the content of the cell element.