JP2013080349A - Image processor, image processing method, and program

Info

Publication number: JP2013080349A
Application number: JP2011219564A
Authority: JP
Inventors: Makoto Enomoto (榎本 誠)
Original assignee: Canon Inc
Current assignee: Canon Inc
Priority date: 2011-10-03
Filing date: 2011-10-03
Publication date: 2013-05-02
Anticipated expiration: 2031-10-03
Also published as: JP5767549B2

Abstract

PROBLEM TO BE SOLVED: When modification information such as color is attached to characters, new modification information must be described every time it changes within a text, so the file size grows when the characters use many different colors.

SOLUTION: An image processor comprises: extracting means for extracting, from the pixels constituting an input image that includes a character string, a plurality of pixel blocks whose pixel values are similar; specifying means for specifying an area constituted by the plurality of pixel blocks as at least one of a character area and an area other than the character area; acquiring means for analyzing a character from the pixel blocks specified as the character area and acquiring character information including at least a character code and position information of the character; designating means for designating a character string including a space character from the arrangement of the characters indicated by the character information; and adding means for acquiring color information from the pixel blocks of the character area at the position indicated by the character information and adding the color information to the character information. For a space character included in the character string, the image processor adds the color information of the characters before and after that space character to the character information of the space character.

Description

The present invention relates to an image processing apparatus, an image processing method, and a program for generating editable electronic document data from a paper document or from image data of a document.

In recent years, document creation has come to involve advanced functions beyond simply typing characters, such as decorating fonts, freely creating figures, and importing photographs.

However, the more sophisticated the content, the more effort it takes to create a document entirely from scratch. It is therefore desirable to be able to reuse parts of previously created documents, either as they are or after editing.

Meanwhile, with the spread of networks such as the Internet, documents are increasingly distributed electronically, yet electronic documents are still often distributed printed on paper. Techniques have been disclosed for obtaining the content as reusable data from paper even when only the paper document is at hand. For example, Patent Document 1 describes that, when a paper document is scanned into an apparatus, a document matching its content is searched for and retrieved from a database and can be used in place of the scanned page data. If no identical document can be identified in the database, the scanned document image is converted into easily reusable electronic data, so the content of the document can be reused in this case as well.

Conventionally, OCR has been used to convert character information in a document image into easily reusable electronic data, and vectorization has been used to do the same for graphic information composed of lines and filled areas. For example, Patent Document 1 discloses converting a document image into reusable data by using these techniques to turn characters into character codes and the outlines of figures into vector data.

Patent Document 1 further discloses a technique for identifying regions such as text, line drawings, natural images, and tables in a document image and building data that expresses the relationships among the regions as a tree structure. The character codes, vector data, image data, and so on are then arranged according to this structure to produce an electronic document page that can be edited in an application. This electronic data has a layout equivalent to that of the original document and, like a page newly created in a document creation application, allows the position and size of characters and figures to be changed and geometric transformation, coloring, and the like to be applied easily.

In addition, with the spread of color printers, paper documents printed in color are received more often. To reuse the content of such a color document, its color information must be reproduced when converting it into reusable data. To meet this need, Patent Document 2 reduces a color image to an image that can hold pixel values of two or more levels so that the color information of characters and the like is not lost, then extracts pixel blocks of the same color and identifies regions. By using this technique to obtain pixel blocks carrying color information and applying processing such as the vectorization described above, reusable data that reproduces the color information can be obtained.

Patent Document 3 adds, to each character code of a character recognition result, color information obtained from the pixels of the input image. This yields reusable character data that reproduces the color information.

Patent Document 1: Japanese Patent No. 4251629
Patent Document 2: US Patent Application Publication No. 2008/0123945
Patent Document 3: Japanese Patent Application Laid-Open No. 2009-205232

When an input image is converted into an electronic document, adding modification information such as color to each individual character increases the file size accordingly.

Furthermore, when the character recognition result contains a space character (blank character), no pixel information corresponds to that space character, so it is judged to have no color information. When a character string in the recognition result mixes characters that have color information with space characters that do not, converting it to an electronic document as is requires describing the presence or absence of color information for each character. The resulting increase in file size makes the electronic document less convenient to store and reuse.

To solve the above problem, the present invention has the following configuration. Namely, an image processing apparatus that generates editable electronic data from an input image comprises: input means for inputting an image including a character string as the input image; extraction means for extracting, from the pixels constituting the input image, a plurality of pixel blocks whose pixel values are similar; identification means for identifying a region formed by the plurality of pixel blocks as at least one of a character region and a region other than a character region; analysis means for analyzing a character from a pixel block identified as the character region and acquiring character information including at least a character code and position information of the character; specifying means for specifying a character string including a blank character from the arrangement of the characters indicated by the character information; color information adding means for acquiring color information from the pixel block of the character region at the position indicated by the character information and adding it to the character information; and generation means for generating a description defining the electronic data from the specified character string and the character information of the characters included in it. The color information adding means adds, to the character information of a blank character included in the character string, the color information of the characters before and after that blank character in the character string.

By giving invisible characters the modification information of the preceding and following characters, the modification information within a character string can be consolidated and the file size reduced without changing the appearance of the generated electronic document.
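The effect described above can be illustrated with a small sketch. It assumes a hypothetical representation in which each recognized character is a `(character, color)` pair and a space carries `None`; the function names and the choice of giving a space the color of the preceding character (falling back to the following one for leading spaces) are assumptions made for illustration, not details taken from the text.

```python
from itertools import groupby

def fill_space_colors(chars):
    """Give each uncolored space the color of a neighboring character.

    chars: list of (character, color-or-None) pairs (hypothetical format).
    A space takes the color of the nearest colored character before it;
    leading spaces take the color of the first colored character after them.
    """
    filled = []
    last = None
    for ch, color in chars:
        if ch.isspace() and color is None:
            color = last            # borrow from the preceding character
        else:
            last = color
        filled.append((ch, color))
    nxt = None
    for i in range(len(filled) - 1, -1, -1):
        ch, color = filled[i]
        if color is None:
            filled[i] = (ch, nxt)   # leading space: borrow from what follows
        else:
            nxt = color
    return filled

def color_runs(chars):
    """Group consecutive characters that share a color into (text, color) runs."""
    return [("".join(c for c, _ in grp), color)
            for color, grp in groupby(chars, key=lambda pair: pair[1])]

chars = [("A", "red"), (" ", None), ("B", "red"), (" ", None), ("C", "blue")]
print(len(color_runs(chars)))                 # runs needed without the fix
print(color_runs(fill_space_colors(chars)))   # runs after coloring the spaces
```

Each run would need its own color description in the output document, so fewer runs mean less markup, while the spaces remain invisible whatever color they carry.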

FIG. 1 is a diagram showing an example of a system configuration.
FIG. 2 is a block diagram showing how data changes through the processing of each unit.
FIG. 3 is a flowchart showing processing in the pixel block analysis unit.
FIG. 4 is a flowchart showing the labeling process.
FIG. 5 is a diagram showing an example of the labeling process.
FIG. 6 is a diagram showing an example of a processing result of the pixel block analysis unit.
FIG. 7 is a flowchart showing processing in the layout analysis unit.
FIG. 8 is a flowchart showing processing in the graphics data generation unit.
FIG. 9 is a diagram showing an example of a processing result of the layout analysis unit.
FIG. 10 is a flowchart showing processing in the character recognition unit.
FIG. 11 is a diagram showing an example of a processing result for a character region.
FIG. 12 is a flowchart showing the color information addition process.
FIG. 13 is a diagram showing an example of a result of the color information addition process.
FIG. 14 is a diagram for explaining pixel block color information.
FIG. 15 is a flowchart showing processing in the electronic document generation unit.
FIG. 16 is a diagram showing an example of a table that defines the output target for each region type.
FIG. 17 is a diagram showing an output example of electronic document data.
FIG. 18 is a diagram showing an example of a processing result for a character region according to the second embodiment.
FIG. 19 is a diagram showing an output example of an electronic document according to the second embodiment.

<First Embodiment>
[System configuration]
The best mode for carrying out the present invention is described below with reference to the drawings. FIG. 1 is a diagram showing an example of a system configuration using an image processing apparatus according to the present invention. The image processing apparatus 100 includes a scanner 101, a CPU 102, a memory 103, a hard disk 104, and a network I/F 105. The scanner 101 converts the page content of a scanned document into image data. The CPU 102 executes a program that applies the electronic document generation process to the image data. The memory 103 serves as work memory and temporary data storage while the program runs. The hard disk 104 stores the program and data. The network I/F 105 exchanges data with external devices. The image processing apparatus 100 is connected through the network I/F 105 to a wired or wireless network 110 such as a LAN or the Internet. A general-purpose personal computer (PC) 120 is also connected to the network 110; the PC 120 can receive data transmitted from the image processing apparatus 100 and use it for display, editing, and the like.

[Configuration of electronic document generation processing]
FIG. 2 is a block diagram showing the configuration of the electronic document generation process executed by the CPU 102 of the image processing apparatus according to the present invention. It also shows the various data generated during the process. The input image 200 and the output electronic document 210 in FIG. 2 are the input data and output data of the electronic document generation process, respectively. The flow of processing from the input image 200 to the output electronic document 210 and an outline of each processing unit are described first. The detailed processing of each unit is described afterward.

The input image 200 is the image data to be processed by the electronic document generation process of FIG. 2. In the image processing apparatus 100 shown in FIG. 1, for example, it is document image data obtained when the scanner 101 reads a paper document and converts its content into electronic pixel information by photoelectric conversion. Alternatively, it may be image data supplied from outside through the network I/F 105 or image data generated within the image processing apparatus 100. Specifically, the input image 200 is passed to the subsequent processing blocks while stored in the memory 103 or on the hard disk 104.

The output electronic document 210 is the electronic data output as the result of the electronic document generation process. It expresses the content of the input image 200 in a format that a user can display and edit in an application on a personal computer. Within the output electronic document 210, the characters, figures, photographs, and other content of the input image 200 are each expressed in a data format appropriate to their type. The aim is to be able to output an electronic document suited to each of several different uses, such as display, storage, search, editing, and reuse. Specific examples of the data formats and electronic document formats are given later.

The pixel block analysis unit 201 analyzes the pixel content (pixel information) of the input image 200 and groups connected pixels regarded as having the same color into connected pixel blocks. It then generates pixel block data 206, which includes the pixel shape of each connected pixel block and the relative positional relationships among the blocks.

The layout analysis unit 202 takes the pixel block data 206 generated by the pixel block analysis unit 201 as input, classifies each pixel block as character or non-character, and groups the blocks. In this way it identifies the regions present in the input image 200. The region types identified here include character regions, line drawing regions, natural image regions, and table regions. The layout analysis unit 202 then generates region data 207, which includes the type of each identified region, its coordinates and relative relationships, and information on the pixel blocks contained in the region.

The graphics data generation unit 203 takes the region data 207, the pixel block data 206, and the input image 200 as input and generates graphics data 208 corresponding to the content of each region in the output electronic document 210. The graphics data 208 is used by the electronic document description generation unit 205, described later, to generate the graphics object description for each region. For example, the graphics data generation unit 203 identifies a photograph region in the region data 207 and uses the pixel information of that region in the input image 200 to generate cropped image data of the photograph. It may also identify a line drawing region, extract its outline from the pixel shape information of the corresponding pixel block data, and generate vector data for the line drawing by approximating the outline with straight and curved paths. In addition, the graphics data generation unit 203 generates background image data, which is image data in which the pixels of foreground parts of the input image 200 such as characters, photographs, and line drawings are filled with their surrounding color.

The character recognition unit 204 identifies character regions in the region data 207 and reconstructs the pixel shapes of the characters as a binary image from the pixel block data 206 corresponding to each region. It then performs character recognition on the reconstructed binary image to obtain the string of recognized character codes in the character region. The character recognition unit 204 further generates character data 209, which contains these character code strings and other information usable in the electronic document. To perform recognition correctly, the character recognition unit 204 may determine the orientation of the input image 200 and, if the top of the page does not face up, rotate the binary image and the region information before performing character recognition. The character data 209 may include not only the character code strings of the recognition result but also format information estimated as a by-product of recognition, such as the coordinates of each character, the estimated character size and pitch, and the line pitch. The character data 209 may also include the color information of each character, estimated from the color information held in the character pixel block data.

The electronic document description generation unit 205 takes the region data 207, the graphics data 208, and the character data 209 as input, selects, transforms, and combines them into a form suited to the intended use, and generates the description of the output electronic document 210. The electronic document description generation unit 205 may generate a one-page output electronic document for one input image 200, or a single multi-page electronic document for a plurality of input images.

[Operation of each processing unit]
Next, detailed operation examples of the processing units that make up the electronic document generation process of FIG. 2 are described in order. The processing of each unit is realized by the CPU 102 of the image processing apparatus 100 reading and executing a program stored in a storage unit such as the memory 103.

(Processing by the pixel block analysis unit)
FIG. 3 is a flowchart illustrating an operation example of the pixel block analysis unit 201.

In S301, the input image 200 is input to the pixel block analysis unit 201. If the input image 200 is a color image, it is assumed to be input expanded in the memory 103 as a page-sized set of pixels in which each pixel is represented by three 8-bit values, one each for R, G, and B. This is only an example; the image may be represented in a color space other than RGB, including grayscale. The input image 200 may also be input as a compressed image stream, which the pixel block analysis unit 201 decompresses into RGB pixels or the like in the memory 103.

In S302, the pixel block analysis unit 201 generates a reduced-color image by applying color reduction to each pixel of the input image 200. Each pixel of the reduced-color image takes a value from 0 to N (N ≥ 2), a range no larger than the pixel value range of the input image 200. The color reduction method itself is outside the essence of the present invention, so a detailed description is omitted. Note, however, that the present invention is effective when the reduced pixel values are not black-and-white binary values but values that preserve the color features of the characters, lines, and so on in the original input image 200. In other words, obtaining the effect of the present invention presupposes applying the processing to an image that can have three or more kinds of pixel values, rather than to an image with only two kinds such as a black-and-white binary image.

One example of such color reduction, when the input image 200 is in RGB format, is to express whether each of the R, G, and B components of a pixel is below 128 or at least 128 as 0 or 1, reducing the image to at most 8 colors represented in 3 bits. Another method computes the luminance value Y of each pixel and quantizes Y into four levels. Yet another estimates N representative colors from the pixel value histogram of the image, assigns an ID value to each representative color, and assigns to each pixel the ID value of the closest representative color.
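The first two reduction methods can be sketched in a few lines. This is a minimal illustration only: the function names, the bit order (R as the high bit), and the BT.601 luminance weights are assumptions, since the text specifies none of them.

```python
def reduce_to_3bit(rgb):
    """Map an (R, G, B) pixel with 8-bit channels to a value in 0..7.

    Each channel contributes 1 if it is 128 or greater, otherwise 0.
    Placing R in the high bit is an arbitrary choice for this sketch.
    """
    r, g, b = rgb
    return ((r >= 128) << 2) | ((g >= 128) << 1) | (b >= 128)

def quantize_luma(rgb, levels=4):
    """Quantize the luminance of an (R, G, B) pixel into `levels` steps.

    The BT.601 weights below are one common definition of Y (an assumption).
    """
    r, g, b = rgb
    y = 0.299 * r + 0.587 * g + 0.114 * b
    return min(int(y * levels / 256), levels - 1)

print(reduce_to_3bit((255, 0, 0)), reduce_to_3bit((127, 128, 200)))
print(quantize_luma((0, 0, 0)), quantize_luma((255, 255, 255)))
```

Either function yields more than two possible output values, which is the property the preceding step requires of the reduced-color image.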

In S303, the pixel block analysis unit 201 performs a known labeling process on sets of connected pixels having the same pixel value in the reduced-color image and extracts each set of pixels sharing a label as a pixel block. This amounts to extracting connected pixel blocks of similar color from the input image 200. Connectivity is determined using 8-connectivity, which considers the neighbors above, below, left, right, and in all diagonal directions. The labeling process using 8-connectivity is described later with reference to FIGS. 4 and 5.

In S304, the pixel block analysis unit 201 updates the pixel block information so that, for every pixel block in the pixel block information generated in S303, it holds information indicating whether pixel blocks are in contact. Specifically, for a pixel block of interest, the IDs of the pixel blocks that satisfy both of the following conditions are registered as its list of contacting pixel blocks: 1) the circumscribed rectangles of the two pixel blocks touch or overlap, and 2) some run of one pixel block is in contact with some run of the other. This is done for every combination of pixel blocks. These conditions are an example, and other conditions may be used. The contact relationships could also be identified faster by associating contacting labels with each other during the labeling in S303. The efficiency of this process is unrelated to the essence of the invention, however, so that approach is not described.
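The two-stage contact test can be sketched as follows. The representation is hypothetical: a pixel block is a list of runs `(y, x_start, x_end)` with inclusive ends, and two runs are treated as "in contact" when they are adjacent in the 8-neighborhood sense, which is an assumption because the text does not define run contact precisely.

```python
def bbox(runs):
    """Circumscribed rectangle (x0, y0, x1, y1), inclusive, of a list of runs."""
    ys = [y for y, _, _ in runs]
    return (min(s for _, s, _ in runs), min(ys),
            max(e for _, _, e in runs), max(ys))

def boxes_touch(a, b):
    """True if two inclusive rectangles overlap or are directly adjacent."""
    return (a[0] <= b[2] + 1 and b[0] <= a[2] + 1 and
            a[1] <= b[3] + 1 and b[1] <= a[3] + 1)

def runs_touch(p, q):
    """True if two runs are adjacent horizontally, vertically, or diagonally."""
    (py, ps, pe), (qy, qs, qe) = p, q
    return abs(py - qy) <= 1 and ps <= qe + 1 and qs <= pe + 1

def blocks_in_contact(runs_a, runs_b):
    """Condition 1 (cheap rectangle test) first, then condition 2 (run test)."""
    if not boxes_touch(bbox(runs_a), bbox(runs_b)):
        return False
    return any(runs_touch(p, q) for p in runs_a for q in runs_b)

a = [(0, 0, 2), (1, 0, 2)]
print(blocks_in_contact(a, [(0, 3, 5)]))   # directly to the right of a
print(blocks_in_contact(a, [(0, 4, 5)]))   # one pixel gap
```

Checking the rectangles first keeps the pairwise run comparison, which is the expensive part, to the block pairs that could plausibly touch.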

In S305, the pixel block analysis unit 201 generates, for every pixel block in the pixel block information, inclusion information indicating that a pixel block contains, or is contained in, another pixel block, and adds it to the pixel block information. In this example, a simplified determination is used in which two pixel blocks are defined to be in an inclusion relationship when 1) the two pixel blocks are in contact and 2) the circumscribed rectangle of one completely contains the circumscribed rectangle of the other. This reduces the amount of processing and the time needed for the inclusion determination. Other conditions may be used instead, including an exact pixel-level inclusion determination.
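A minimal sketch of the simplified inclusion test follows. Rectangles are hypothetical `(x0, y0, x1, y1)` tuples with inclusive coordinates, the contact result is passed in as a boolean, and treating rectangles with shared edges as "completely contained" is an assumption of this sketch.

```python
def box_contains(outer, inner):
    """True if the inclusive rectangle `outer` fully covers `inner`."""
    return (outer[0] <= inner[0] and outer[1] <= inner[1] and
            outer[2] >= inner[2] and outer[3] >= inner[3])

def includes(box_a, box_b, in_contact):
    """Simplified test: block A includes block B if the two blocks are in
    contact and A's circumscribed rectangle covers B's."""
    return in_contact and box_contains(box_a, box_b)

print(includes((0, 0, 9, 9), (2, 2, 5, 5), True))
print(includes((0, 0, 9, 9), (2, 2, 5, 5), False))
```

Because only rectangles are compared, the test can report inclusion for shapes that merely interlock, which is the trade-off the text accepts in exchange for speed.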

In S306, the pixel block analysis unit 201 treats the inclusion relationships added to the pixel block information as parent-child relationships between pixel blocks and generates a tree structure of pixel blocks in which the whole image is the root and each pixel block is a node. Under the conditions used in S305, some pixel blocks may have no parent pixel block. Such a pixel block is placed in the tree so that it has the same parent as a pixel block it is in contact with. A pixel block may also have more than one parent, in which case the structure is generated so that it has a parent-child relationship with only one of them, for example the parent deepest in the hierarchy. The pixel block tree structure generated in S306, together with the information on each pixel block, constitutes the pixel block data 206 generated by the pixel block analysis unit 201.
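One way to turn the inclusion and contact lists into a single-parent tree is sketched below. Everything here is a hypothetical illustration: the `ROOT` ID, the dictionary inputs, the tie-breaking by sorted ID, and the fallback to the root when no contacting block has a parent are all assumptions beyond what the text states.

```python
ROOT = 0  # hypothetical ID standing for the whole image

def build_tree(block_ids, parents_of, contacts_of):
    """Return a dict mapping each block ID to exactly one parent ID.

    parents_of[b]:  set of blocks that include b (candidate parents).
    contacts_of[b]: set of blocks in contact with b.
    """
    def depth(b, seen=()):
        # Length of the longest inclusion chain above b; `seen` guards cycles.
        ps = [p for p in parents_of.get(b, ()) if p not in seen]
        return 1 + max((depth(p, seen + (b,)) for p in ps), default=0)

    parent = {}
    for b in block_ids:
        cands = parents_of.get(b, set())
        if cands:
            # Several candidates: keep the one deepest in the hierarchy.
            parent[b] = max(sorted(cands), key=depth)
    for b in block_ids:
        if b not in parent:
            # No including block: share the parent of a contacting block.
            borrowed = [parent[c] for c in sorted(contacts_of.get(b, ()))
                        if c in parent and parent[c] != b]
            parent[b] = borrowed[0] if borrowed else ROOT
    return parent

parents_of = {2: {1}, 3: {1, 2}}
contacts_of = {1: {2}, 2: {1, 3, 4}, 3: {2}, 4: {2}}
print(build_tree([1, 2, 3, 4], parents_of, contacts_of))
```

In the example, block 3 is included by both 1 and 2 and is attached to 2, the deeper of the two, while block 4 has no including block and borrows the parent of block 2, which it touches.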

(Labeling process using 8-connectivity)
An example of the labeling process using 8-connectivity performed in S303 is described with reference to the flowchart of FIG. 4.

In S401, the pixel block analysis unit 201 initializes the label value k to 1. In S402, the pixel block analysis unit 201 extracts, as the current run, a run of consecutive identical pixel values on the line of interest in the reduced-color image. Initially the topmost line of the reduced-color image is the line of interest, and the range over which pixels with the same value continue rightward from the leftmost pixel is extracted as a run. The extracted run is stored as run information consisting of the x coordinates of its start and end points and the y coordinate of the line of interest. As described later, when S402 is executed again on the same line of interest, the run beginning at the pixel just to the right of the already processed run is extracted.
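Run extraction for a single line can be sketched as below. The tuple layout `(y, x_start, x_end, value)` with an inclusive end is a hypothetical choice for this illustration; the text only says that the start and end x coordinates and the line's y coordinate are stored.

```python
def extract_runs(line, y):
    """Split one scanline of reduced-color values into left-to-right runs.

    Returns a list of (y, x_start, x_end, value) tuples, x_end inclusive.
    """
    runs = []
    start = 0
    for x in range(1, len(line) + 1):
        # A run ends at the end of the line or where the value changes.
        if x == len(line) or line[x] != line[start]:
            runs.append((y, start, x - 1, line[start]))
            start = x
    return runs

print(extract_runs([0, 0, 1, 1, 1, 0], y=3))
```

The sketch returns all runs of a line at once, whereas the flowchart extracts them one at a time; the resulting runs are the same.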

In S403, the pixel block analysis unit 201 checks whether the line immediately above the line of interest contains an already extracted run that is 8-connected to the current run. For a current run with y-coordinate k and start and end x-coordinates (s, e), a run is 8-connected if it lies on y-coordinate k-1, occupies at least one pixel in the x range (s-1, e+1), and has the same pixel value. (Here k denotes the line index, not the label value.) When the line of interest is the top line, no connected run ever exists. If no connected run satisfies the condition (NO in S403), the process proceeds to S404, where the pixel block analysis unit 201 assigns a new label Lk to the current run. Then, in S405, the pixel block analysis unit 201 increments the label value k by 1. If a connected run satisfying the condition exists (YES in S403), the process proceeds to S406.

In S406, the pixel block analysis unit 201 checks whether there are multiple connected runs satisfying the condition and whether those runs carry more than one kind of label. If they do (YES in S406), the process proceeds to S407, where the pixel block analysis unit 201 assigns the label of the first detected connected run to the current run. The pixel block analysis unit 201 also associates the label values with one another so that the labels of all the connected runs are treated as the same group. If there is only one connected run, or if all the connected runs carry the same label (NO in S406), the process proceeds to S408, where the pixel block analysis unit 201 assigns the label of the connected run to the current run.

After S405, S407, or S408, the process proceeds to S409, where the pixel block analysis unit 201 checks whether the line of interest has a next run, that is, whether the end point of the current run is short of the right edge of the image. If there is a next run (YES in S409), the process returns to S402 so that the run is extracted, and the subsequent processing is repeated. If there is no next run on the line of interest (NO in S409), the process proceeds to S410.

In S410, the pixel block analysis unit 201 checks whether the line of interest is the last line. If it is not (NO in S410), the process proceeds to S411, where the pixel block analysis unit 201 moves to the next line. The process then returns to S402, where the pixel block analysis unit 201 extracts a new run starting from the leftmost pixel of that line and repeats the subsequent processing. If the line of interest is the last line (YES in S410), the process proceeds to S412.

In S412, the pixel block analysis unit 201 creates, for each label value, pixel block information consisting of the set of runs to which that label was assigned. When the run sets are assembled, runs whose different label values were associated in S407 are collected into a single pixel block. In the pixel block information finally generated, each pixel block consists of an ID for identification, circumscribed rectangle information, a pixel value, and the set of run information collected for that block.

(Example of the labeling process)
FIG. 5 shows an example of applying the labeling process of FIG. 4. FIG. 5(a) is an example of a color-reduced image to be processed: an image 6 pixels wide by 3 pixels high in which each cell represents one pixel and the number in each cell is its pixel value. FIG. 5(b) is an example of the result of labeling FIG. 5(a).

The labeling process first takes the top line (y = 0) as the line of interest and extracts run 501, which has pixel value 3 and starts at the left edge. No line exists above it, so there is no connected run, and run 501 receives the first label, L1. The following run 502, with pixel value 1, likewise receives a new label, L2.

No pixels remain on the top line, so processing moves to the next line (y = 1) and extracts run 511 with pixel value 1. The line above (y = 0) has no run of pixel value 1 connected to run 511, so run 511 receives a new label, L3. For the next run, run 512 with pixel value 3, the line above (y = 0) contains run 501 with the same value. Run 501 is the only connected run, so its label L1 is also assigned to run 512. Next, run 513 with pixel value 1 is extracted and likewise receives label L2 from run 502, its connected run on the line above (y = 0). Run 514 with pixel value 3 is then extracted; it has no connected run, so it receives a new label, L4. Processing moves to the third line (y = 2), where run 521 with pixel value 1 is extracted and receives label L3 from run 511, its connected run above. The following run 522 with pixel value 3 has two connected runs on the line above (y = 1): run 512 and run 514. Because their labels, L1 and L4, differ, run 522 receives label L1 from run 512, the first connected run detected. In addition, association information is generated so that labels L1 and L4 are treated as the same label. Finally, run 523 with pixel value 2 is extracted and receives a new label, L5.

FIG. 5(c) is an example of the pixel block information generated from the labeled set of runs. Based on the association described above, the pixel block with ID 1 consists of runs 501, 512, and 522 with label L1 and run 514 with label L4. Its rectangular range, (0,0)-(5,2), and its pixel value, 3, are also stored. The pixel block with ID 2 consists of runs 502 and 513 with label L2 and has the rectangular range (2,0)-(5,1) and pixel value 1. ID 3 and ID 4 are shown in the same way. Once the pixel block information has been assembled, the label information Lk used to build the run sets may be discarded. The coordinates held in each item of run information may also be reset so that the origin is the upper-left corner of the circumscribed rectangle of each pixel block, as shown in FIG. 5(c).
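The labeling steps S401 to S412 and the walkthrough above can be sketched as follows. This is a minimal sketch under assumptions: the function name, the tuple layout of a run, and the dictionary used to associate labels are hypothetical, and the parent-pointer lookup is just one simple way to realize the label association of S407.

```python
def label_runs(image):
    """Run-based 8-connected labeling of a color-reduced image (list of rows)."""
    parent = {}                       # label -> associated label (S407)

    def find(a):
        while parent[a] != a:
            a = parent[a]
        return a

    runs, prev, k = [], [], 1         # k is the next label value (S401)
    for y, row in enumerate(image):
        cur, x = [], 0
        while x < len(row):
            s = x
            while x + 1 < len(row) and row[x + 1] == row[s]:
                x += 1
            e, v = x, row[s]          # current run [s, e] with pixel value v (S402)
            # S403: same-valued runs on the line above that touch the range [s-1, e+1]
            linked = [r for r in prev if r[2] == v and r[0] <= e + 1 and r[1] >= s - 1]
            if not linked:
                label = k             # S404: new label
                parent[k] = k
                k += 1                # S405
            else:
                label = linked[0][3]  # S407/S408: label of the first connected run
                for r in linked[1:]:  # S407: associate differing labels
                    ra, rb = find(label), find(r[3])
                    if ra != rb:
                        parent[rb] = ra
            cur.append((s, e, v, label))
            runs.append((s, e, y, v, label))
            x += 1
        prev = cur

    blocks = {}                       # S412: collect runs per associated label group
    for s, e, y, v, label in runs:
        b = blocks.setdefault(find(label), {"value": v, "rect": [s, y, e, y], "runs": []})
        b["runs"].append((s, e, y))
        r = b["rect"]
        b["rect"] = [min(r[0], s), min(r[1], y), max(r[2], e), max(r[3], y)]
    return list(blocks.values())
```

Calling `label_runs([[3, 3, 1, 1, 1, 1], [1, 3, 1, 1, 3, 3], [1, 3, 3, 3, 3, 2]])` yields four pixel blocks. The grid is an assumed layout chosen to be consistent with the runs described in the text; FIG. 5(a) itself is not reproduced here.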

(Example of processing results of the pixel block analysis unit)
FIG. 6 shows an example of the processing result of the pixel block analysis unit 201. FIG. 6(a) is an example of a color-reduced image input to the pixel block analysis unit 201. FIG. 6(b) is an example of the pixel block information extracted from the color-reduced image of FIG. 6(a), with arrows showing the contact relationships between pixel blocks generated in S304. FIG. 6(c) shows, for the pixel block information of FIG. 6(b), the inclusion relationships generated in S305 as arrows. The head of each arrow points to the child and the tail to the parent. FIG. 6(d) is an example of the pixel block tree information constructed from FIG. 6(b) and FIG. 6(c). Pixel block 601 in FIG. 6(d) has no parent pixel block by inclusion, so the tree is built so that its parent is the parent of pixel block 602, with which it is in contact.

The flowchart of FIG. 3 was described as processing the entire input image data at once in the pixel block analysis. Alternatively, the input image data may be divided into multiple parts, and the input of each partial image and the extraction of pixel block information may be repeated. For example, Patent Document 3 describes processing in which a 32-pixel-square tile is one processing unit and image input, quantization, and creation of blobs (pixel blocks within a tile) are repeated in order from the upper left of the image. In the description of Patent Document 3, blobs in the already processed upper and left tiles are further merged with blobs of the current tile, so that pixel blocks of any size, up to the size of the input image 200, are eventually generated. Applying this processing method can substantially reduce the memory and processing time consumed by the generation of pixel block data in the present embodiment.

(Processing by the layout analysis unit)
Next, the processing of the layout analysis unit 202 is described with reference to the flowchart of FIG. 7. This processing takes the pixel block data 206 in the memory 103 as input and builds, in the memory 103, region data 207 based on the structure among document regions such as text, line drawings, natural images, and tables.

In S701, the layout analysis unit 202 classifies each pixel block in the input pixel block data 206 as either a character candidate pixel block or another pixel block. Whether a pixel block is a character candidate may be decided with a character pixel block determination method used in known document image analysis techniques. One such method uses the size of the circumscribed rectangle of the pixel block and treats blocks that fall within predetermined height and width ranges as character candidates.

In this example, sizes from 6 points to 50 points are regarded as characters, and a pixel block whose width or height falls within Tmin to Tmax pixels, the values obtained by converting those sizes at the resolution of the input image 200, is treated as a character candidate. Setting a lower size limit keeps small background pixel blocks extracted from inside characters out of the character candidates. Pixel density, aspect ratio, pixel color, and the like may be added to the character candidate conditions. The size thresholds may also be determined dynamically from frequency information on the widths and heights of the pixel blocks actually extracted from the input image 200.
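The size test above can be sketched as follows, using the standard definition of 1 point as 1/72 inch. The function names and the `dpi` parameter are hypothetical; the source does not give the conversion formula or say how the resolution is obtained.

```python
def char_size_limits(dpi, min_pt=6.0, max_pt=50.0):
    """Convert the point range to pixel limits (Tmin, Tmax); 1 pt = 1/72 inch."""
    return min_pt * dpi / 72.0, max_pt * dpi / 72.0

def is_char_candidate(width, height, dpi):
    """True when the block's width or height lies within Tmin..Tmax pixels."""
    t_min, t_max = char_size_limits(dpi)
    return t_min <= width <= t_max or t_min <= height <= t_max
```

At 300 dpi the lower limit Tmin is 25 pixels, so a 5 by 8 pixel block, such as a hole inside a character, is rejected, while a narrow 3 by 30 pixel stroke is accepted on its height alone.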

In S702, the layout analysis unit 202 groups the character candidate pixel blocks classified in S701 that lie close to one another. Whether two blocks are close can be decided by computing the Euclidean distance between the coordinates of their circumscribed rectangles and checking that it does not exceed a predetermined threshold. This is only an example, and another measure such as city block distance may be used for the distance calculation. Because characters are written in lines and the spacing between characters within a line is generally narrower than the spacing between lines, character candidate pixel blocks forming a text line may first be grouped with a small distance threshold, and multiple text lines may then be grouped with a larger distance threshold. In this grouping process, only character candidate pixel blocks that share the same parent in the pixel block tree structure are grouped together. This reduces the number of pixel block pairs subject to the proximity calculation and speeds up the processing.
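The proximity grouping of S702 might look like the sketch below. It assumes each candidate is given as a circumscribed rectangle plus the ID of its parent in the pixel block tree, and it measures the gap between rectangle edges. That gap is one possible reading of "distance between circumscribed rectangle coordinates", and all names are hypothetical.

```python
import math

def rect_gap(a, b):
    """Euclidean gap between rectangles (x0, y0, x1, y1); 0 when they overlap."""
    dx = max(a[0] - b[2], b[0] - a[2], 0)
    dy = max(a[1] - b[3], b[1] - a[3], 0)
    return math.hypot(dx, dy)

def group_candidates(blocks, threshold):
    """blocks: list of (rect, parent_id). Returns groups as lists of indices."""
    group = list(range(len(blocks)))

    def find(i):
        while group[i] != i:
            i = group[i]
        return i

    for i in range(len(blocks)):
        for j in range(i + 1, len(blocks)):
            (ra, pa), (rb, pb) = blocks[i], blocks[j]
            # Only blocks sharing a parent are compared; the distance test is
            # skipped entirely for pairs with different parents.
            if pa == pb and rect_gap(ra, rb) <= threshold:
                group[find(j)] = find(i)

    out = {}
    for i in range(len(blocks)):
        out.setdefault(find(i), []).append(i)
    return list(out.values())
```

Bucketing the candidates by parent before the pairwise loop would realize the stated speed-up more directly; the flat loop is kept here for brevity.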

In S703, the layout analysis unit 202 determines, group by group, whether each set of character candidate pixel blocks grouped in S702 is actually a set of characters. The layout analysis unit 202 then identifies the extent of the pixel blocks of each group judged to be a set of characters as a character region. For each identified region, region information including the coordinates of the region and association information to the corresponding pixel blocks is stored as a component of the region data 207. That is, for a character region, association information to the grouped set of character candidate pixel blocks and the coordinates of the circumscribed rectangle enclosing those pixel blocks are stored as character region information.

One way to determine whether a group is a set of characters is to compute horizontal and vertical projections of the character candidate pixel blocks within the rectangle containing the group and to judge whether they show the alignment expected of text. Specifically, if the horizontal projection for horizontal writing, or the vertical projection for vertical writing, shows a frequency distribution with peaks at the text lines and valleys between lines, the group is likely to be a character region. Moreover, apart from exceptions such as italics, the circumscribed rectangles of characters rarely overlap to a large extent. The absence of large overlaps with other pixel blocks is therefore another effective criterion for deciding whether a group is a character region. However, to exclude cases where a single character, such as a kanji, is split into multiple overlapping pixel blocks, it is effective to restrict the overlap check to pixel blocks above a certain size.
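The peak-and-valley check on a projection can be sketched as follows for horizontal writing. The sketch projects circumscribed rectangles rather than individual pixels and counts a peak as any maximal stretch of non-empty rows; both simplifications, and the function name, are assumptions made for illustration.

```python
def count_text_lines(rects, height):
    """Count peaks in the horizontal projection of candidate rectangles.

    rects: (x0, y0, x1, y1) in coordinates local to the group area,
    height: height of the group area in pixels."""
    profile = [0] * height
    for x0, y0, x1, y1 in rects:
        for y in range(y0, y1 + 1):
            profile[y] += x1 - x0 + 1   # covered width accumulated per row
    peaks, inside = 0, False
    for v in profile:
        if v > 0 and not inside:        # a valley has just ended, a peak begins
            peaks += 1
        inside = v > 0
    return peaks
```

Two rows of character boxes separated by an empty band give two peaks. A group whose projection is one undifferentiated mass, or many ragged fragments, would be a weaker candidate for a character region.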

After a character region has been identified, other pixel blocks inside that region may be added to the set of character candidate pixel blocks. For example, pixel blocks for punctuation marks or for detached dots within characters are likely to have been left out of the character candidates by the size limits. To include them, small pixel blocks that have the same color as, and lie near, pixel blocks already treated as character candidates may be added.

In S704, the layout analysis unit 202 selects line drawing and table frame candidates from the pixel blocks classified in S701 as other than character candidates. A pixel block can be judged a line drawing or table frame candidate when it is larger than a character candidate and its pixel density over its whole extent is low.
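A minimal sketch of the S704 test follows. The source says only that the density must be "low", so the `max_density` value of 0.3 is an assumed threshold, and reading "larger than a character candidate" as exceeding Tmax in width or height is likewise an assumption.

```python
def is_line_or_table_candidate(width, height, pixel_count, t_max, max_density=0.3):
    """True for blocks larger than a character whose bounding box is mostly empty.

    pixel_count is the number of pixels belonging to the block, for example
    the total length of its runs."""
    bigger_than_char = width > t_max or height > t_max
    density = pixel_count / float(width * height)
    return bigger_than_char and density < max_density
```

For a 400 by 300 frame drawn with a 2-pixel border, the block holds 2,784 pixels in a 120,000-pixel box, a density of roughly 0.02, so it qualifies; a solid rectangle of the same size does not.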

In S705, the layout analysis unit 202 determines whether each pixel block selected as a line drawing or table frame candidate in S704 is a table frame and identifies the extent of each pixel block judged to be a table frame as a table region. The layout analysis unit 202 then stores association information to the corresponding table frame pixel block and region information including the coordinates of its circumscribed rectangle in the region data 207 as table region information.

Whether a pixel block is a table frame may be determined, for example, by computing vertical and horizontal pixel histograms over the extent of the pixel block from its run information and examining their shape. If the pixel block is a table frame, several sharp peaks appear in the histograms where the vertical and horizontal outer borders and ruled lines lie. Detecting these peaks makes it possible to determine whether the block is a table frame. Alternatively, the determination can be based on the set of pixel blocks that are children of the table frame pixel block. The children of a table frame correspond to the cell regions inside the table, so the fact that all child regions are rectangular and aligned without overlapping is an effective indication of a table frame.
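The histogram test can be sketched as follows. The `ratio` threshold and the rule that a table frame needs at least two horizontal and two vertical peaks are assumptions introduced for the example; the source only states that sharp peaks appear where borders and ruled lines lie.

```python
def ruled_line_peaks(runs, width, height, ratio=0.8):
    """runs: (x_start, x_end, y) in block-local coordinates.
    Returns (rows, cols) whose pixel count reaches `ratio` of the block extent."""
    row_hist = [0] * height
    col_hist = [0] * width
    for xs, xe, y in runs:
        row_hist[y] += xe - xs + 1
        for x in range(xs, xe + 1):
            col_hist[x] += 1
    rows = [y for y, n in enumerate(row_hist) if n >= ratio * width]
    cols = [x for x, n in enumerate(col_hist) if n >= ratio * height]
    return rows, cols

def looks_like_table_frame(runs, width, height):
    """Assumed rule: at least two horizontal and two vertical ruled lines."""
    rows, cols = ruled_line_peaks(runs, width, height)
    return len(rows) >= 2 and len(cols) >= 2
```

For a 9 by 5 frame with one vertical rule in the middle, the peaks fall on rows 0 and 4 and on columns 0, 4, and 8. Thick lines would produce runs of adjacent peak rows or columns, which a fuller implementation would merge.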

In S706, the layout analysis unit 202 identifies, as a line drawing region, the extent of each pixel block that was selected as a line drawing or table frame candidate in S704 and was not judged to be a table frame in S705. The layout analysis unit 202 then stores association information to the corresponding line drawing pixel block and region information including the coordinates of its circumscribed rectangle in the region data 207 as line drawing region information. At this point, the range obtained by grouping pixel blocks near the pixel block judged to be a line drawing may be used as the line drawing region.

In S707, the layout analysis unit 202 selects, from the pixel blocks that do not correspond to any region stored so far, a pixel block or set of pixel blocks judged to be a natural image region such as a photograph, and identifies its extent as a natural image region. The layout analysis unit 202 then stores association information to the corresponding set of pixel blocks and region information including the coordinates of their extent in the region data 207 as natural image region information.

A natural image region is determined as follows: where pixel blocks of multiple colors overlap or contain one another, and the set of those pixel blocks forms a rectangle within a certain size, the set is judged to be a natural image region corresponding to a rectangular photograph. This determination is only an example, and pixel block sets of arbitrary shape may also be targeted.

In S708, the layout analysis unit 202 stores, as flat regions, those pixel blocks with at least a certain density and area among the pixel blocks that do not correspond to any region stored so far. A region occupying an entire plain page, a color background region colored to give semantic grouping to the background of text or figures, and the background of table cells are examples of flat regions.

In S709, the layout analysis unit 202 builds a region tree that takes each region stored in the region data 207 up to S708 as a node and expresses the relationships among them. A special root node corresponding to the entire input image is placed at the origin of the region tree. The parent-child relationships between nodes of the region tree are made to match the parent-child relationships, in the pixel block tree, of the pixel block nodes corresponding to each region. Concretely, the tree structure is built by giving each item of region data link information to its parent region and a list of link information to its child regions.
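The parent link and child list described for S709 can be sketched as follows. The `Region` class and its fields are hypothetical, and each region is tied to a single pixel block for simplicity, although a character region in the text is associated with a group of blocks. Climbing past pixel blocks that own no region is also an assumption.

```python
from dataclasses import dataclass, field
from typing import List, Optional

@dataclass
class Region:
    kind: str                               # "text", "table", "line", "photo", "flat", "root"
    block_id: Optional[int] = None          # associated pixel block (simplified to one)
    parent: Optional["Region"] = field(default=None, repr=False, compare=False)
    children: List["Region"] = field(default_factory=list)

def build_region_tree(regions, block_parent):
    """regions: list of Region; block_parent: pixel block id -> parent block id."""
    root = Region("root")
    by_block = {r.block_id: r for r in regions}
    for r in regions:
        # Walk up the pixel block tree until an ancestor that owns a region is found.
        b = block_parent.get(r.block_id)
        while b is not None and b not in by_block:
            b = block_parent.get(b)
        r.parent = by_block[b] if b is not None else root
        r.parent.children.append(r)
    return root
```

For a chain such as page background 900, table frame 904, cell background 905, and text 902, the resulting region tree mirrors the pixel block tree one to one, with the page background hanging directly under the root node.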

(Example of processing results of the layout analysis unit)
An example of the result of the processing in the layout analysis unit 202, described with the flowchart of FIG. 7, is explained with reference to FIG. 9. FIG. 9(a) is an example of a document image that is color-reduced and decomposed into pixel blocks by the pixel block analysis unit 201. FIG. 9(b) shows the pixel blocks extracted from the document image of FIG. 9(a) as a pixel block tree structure. In FIGS. 9(a) and 9(b), pixel blocks 901, 902, and 903 are each a set of pixel blocks corresponding to characters in the original image. Pixel blocks 904 to 907 are the pixel blocks that make up a table. Pixel block 908 corresponds to a star-shaped line drawing. Pixel blocks 909 and 910 correspond to a photograph. Small pixel blocks inside characters are omitted from FIG. 9.

In FIG. 9(b), the arrows indicate parent-child relationships based on inclusion. For example, among pixel blocks 904 to 907 that make up the table, pixel block 905 is the single-color background region inside the left cell of the table. Pixel block 905 contains pixel block 902, a region of two characters, so the two are in a parent-child relationship. The background region inside the right cell, on the other hand, is split vertically into pixel blocks 906 and 907 of different colors. As a result, the character pixel block 903 is contained in neither of them and becomes a direct child of pixel block 904, the table frame that is their parent. Such cases arise when the original input image is in fact colored that way, and also when noise or color reduction unintentionally over-segments a single-color region. Either way, it should be borne in mind that this case occurs commonly in color pixel block structure extraction.

FIG. 9(c) is an example of the region tree generated from the pixel block tree of FIG. 9(b). Its generation is described below following S701 to S709 of the flowchart of FIG. 7. First, in S701 to S703, the layout analysis unit 202 selects the three pixel block groups 901, 902, and 903 as character candidate pixel blocks. Each group satisfies the character region determination condition, so the layout analysis unit 202 stores their respective extents as character regions, in child region nodes 921, 922, and 923.

In S704, the layout analysis unit 202 selects pixel blocks 904 and 908 as line drawing and table frame candidates. In S705, the layout analysis unit 202 judges pixel block 904 to be a table frame and stores child region node 924 as a table region. In S706, the layout analysis unit 202 judges pixel block 908 to be a line drawing and stores child region node 928 as a line drawing region.

In S707, the layout analysis unit 202 judges that pixel blocks 909 and 910 constitute a natural image region and stores child region node 929 as a natural image region. In S708, the layout analysis unit 202 stores each of the remaining pixel blocks 900, 905, 906, and 907 as a flat region. In S709, the layout analysis unit 202 generates a region tree structure in which each region is a node and which reflects the parent-child structure of the corresponding pixel block tree. In FIG. 9(c), regions joined by a line are in a parent-child relationship.

(Processing by the graphics data generation unit)
Next, the processing of the graphics data generation unit 203 is described. The graphics data generation unit 203 generates graphics data 208 for representing each region included in the region data 207 as a graphics object. The data generated here is used by the electronic document description generation unit 205, described later, when it describes the content of each region as an object. The processing of the graphics data generation unit 203 is described below with reference to the flowchart of FIG. 8.

In S801, the graphics data generation unit 203 generates vector data for representing line figure objects as graphics in the output electronic document 210. In this example, the regions for which vector data is generated are the line drawing regions and table regions present in the input region data 207. The generated vector data is associated with the corresponding region node in the region data 207 and saved in the memory 103 or on the hard disk 104. Vector data is generated from the contour information of the pixel blocks associated with the target region using a known vectorization method, that is, a method that approximates contours with straight-line and curved paths. The fill color of the path generated from each pixel block is set to the color information associated with that pixel block.

In S802, the graphics data generation unit 203 generates cut-out image data for representing, in the output electronic document 210, the regions that are not vectorized. In this example, the regions for which cut-out image data is generated are the natural image regions present in the region data 207. The cut-out image data is associated with the corresponding region node in the region data 207 and saved in the memory 103 or on the hard disk 104. Here, cutting out means referring to the input image 200 and generating image data of the same size as the target range, consisting only of the pixels in that range. The cut-out image data may be compressed with a known compression technique such as JPEG.

Ｓ８０３では、グラフィックスデータ生成部２０３は、出力電子文書２１０において、背景に用いられる背景画像データを生成する。生成された背景画像データは領域データ２０７のルートノードに関連づけられて、メモリ１０３もしくはハードディスク１０４に保存される。 In step S 803, the graphics data generation unit 203 generates background image data used for the background in the output electronic document 210. The generated background image data is associated with the root node of the area data 207 and stored in the memory 103 or the hard disk 104.

背景画像とは、Ｓ８０１、Ｓ８０２で生成されるベクトルデータや切り出し画像データを前景データとして、当該背景画像に重ねて描画することで、出力電子文書２１０が入力画像２００と同等の見た目を有するように用意されるものである。背景画像データに対しては、前景データが存在する領域の画素情報を入力画像２００から消去する処理を行う。 The background image is prepared so that the output electronic document 210 looks equivalent to the input image 200 when the vector data and cutout image data generated in S801 and S802 are drawn over it as foreground data. The background image data is produced by erasing, from the input image 200, the pixel information of the regions in which foreground data exists.

画素情報の消去には、合成した出力電子文書２１０において、データが二重に見えるのを防ぐ効果がある。あるいは重畳により隠れてしまう領域に存在する無駄な画素情報を無くすことで圧縮効率を上げ、出力電子文書をコンパクトにする効果がある。画素情報の消去は、例えば、対象領域の矩形範囲をその周囲色で一様に塗り潰す方法がある。なお、対象領域が線図形領域の場合、線部分に相当する画素のみを、その近傍の画素色で塗り潰すようにすれば、線部分以外にあたる部分の色情報を背景情報に残すこともできる。 Erasing pixel information has the effect of preventing the data from appearing double in the synthesized output electronic document 210. Alternatively, there is an effect that compression efficiency is improved by eliminating useless pixel information existing in an area hidden by superposition, and the output electronic document is made compact. For example, the pixel information can be erased by uniformly painting a rectangular area of the target area with surrounding colors. When the target area is a line figure area, if only the pixels corresponding to the line portion are filled with the neighboring pixel color, the color information of the portion other than the line portion can be left in the background information.
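The rectangle-fill variant of the erasure described above can be sketched as follows. This is only an illustration: the function name `erase_rect`, the list-of-lists image representation, and the choice of "most frequent color on the one-pixel ring around the rectangle" as the surrounding color are assumptions, not details given in the description.

```python
from collections import Counter

def erase_rect(image, left, top, right, bottom):
    """Fill the inclusive rectangle with the most common color found on
    the one-pixel ring around it (an assumed notion of 'surrounding color')."""
    height, width = len(image), len(image[0])
    ring = []
    for y in range(max(top - 1, 0), min(bottom + 1, height - 1) + 1):
        for x in range(max(left - 1, 0), min(right + 1, width - 1) + 1):
            inside = left <= x <= right and top <= y <= bottom
            if not inside:
                ring.append(image[y][x])
    # Fall back to the existing color if the rectangle covers the whole image.
    fill = Counter(ring).most_common(1)[0][0] if ring else image[top][left]
    for y in range(top, bottom + 1):
        for x in range(left, right + 1):
            image[y][x] = fill
    return image

# A white 5x5 page with one red foreground pixel in the middle.
page = [["#FFF"] * 5 for _ in range(5)]
page[2][2] = "#F00"
erase_rect(page, 2, 2, 2, 2)
```

After the call, the foreground pixel has been replaced by the surrounding white, so drawing the foreground data over this background would not show the content twice.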

なお、Ｓ８０１およびＳ８０２において、どの種類の領域に対しベクトル化処理または画像切り出し処理を行うかは上述した例に限るものではない。例えば、線画、表領域に対し画像切り処理を行うようにしてもよい。また、ベクトル化対象領域に文字領域を加えてもよい。これらの対象選択は、電子文書生成処理の制御項目として、処理対象領域種類を外部指示により設定できるようにしてもよい。また、生成される電子文書の形式が複数あり、それぞれ別の用途がある場合、各用途に適したデータ形式を領域種別毎に変えられるようにしてもよい。 Which types of region undergo vectorization or image cutout in S801 and S802 is not limited to the example above. For example, the image cutout process may be applied to line drawing and table areas, and character areas may be added to the vectorization targets. The target region types may be made settable by an external instruction as a control item of the electronic document generation process. Furthermore, when the electronic document can be generated in several formats with different uses, the data format suited to each use may be made selectable per region type.

また、Ｓ８０３の背景画像データ生成時に、どの種類の領域に対して画素情報の消去処理を行うかを、電子文書生成処理の制御項目として設定するようにしてもよいし、生成電子文書の形式に合わせて変えられるようにしてもよい。また、文字領域がベクトル化対象ではない場合にも、後述の文字認識部２０４の処理において、文字データ２０９が出力される場合には、文字画素が除去されるように背景データが生成されるようにしてもよい。 Likewise, which types of region undergo pixel-information erasure when the background image data is generated in S803 may be set as a control item of the electronic document generation process, or may be varied according to the format of the generated electronic document. Even when character areas are not vectorization targets, the background data may be generated with the character pixels removed if character data 209 is output by the character recognition unit 204, described later.

図９に、図９（ａ）の入力画像の例に対し生成される背景データの例を示す。図９（ａ）中のすなわち文字領域である画素塊９０１〜９０３、線画領域である画素塊９０８、および表枠領域である画素塊９０４の線図形部分画素が周辺の画素色で塗りつぶされている。また、自然画領域である画素塊９０９の矩形範囲が周辺の画素色で塗りつぶされている。 FIG. 9 shows an example of the background data generated for the example input image of FIG. 9(a). The line-figure pixels of pixel blocks 901 to 903 (character areas), pixel block 908 (a line drawing area), and pixel block 904 (a table frame area) in FIG. 9(a) are filled with the surrounding pixel color. The rectangular range of pixel block 909, a natural image area, is likewise filled with the surrounding pixel color.

（文字認識部による処理） (Processing by the character recognition unit)
文字認識部２０４の処理を、図１０のフローチャートを用いて説明する。Ｓ１００１では、文字認識部２０４は、文字認識処理に入力する文字画像を生成する。本説明では、文字認識処理において、文字を含む二値画像を入力とすることを前提とし、各文字領域の２値画像を生成する。文字領域の二値画像とは、領域内の文字画素を１、それ以外を０とする、入力画像と同じ画素数の二値画像である。実際の処理では、レイアウト解析部２０２が生成した領域データ２０７中の各文字領域に対し、同領域内に存在する画素塊情報を画素塊解析部２０１が生成した画素塊データ２０６から読み出す。そして、各画素塊が持つラン部分が１、それ以外が０になるように、入力画像２００と等サイズの画像を生成する。 The processing of the character recognition unit 204 will be described with reference to the flowchart of FIG. 10. In S1001, the character recognition unit 204 generates the character images that are input to the character recognition process. This description assumes that the character recognition process takes a binary image containing characters as input, so a binary image is generated for each character area. The binary image of a character area has the same number of pixels as the input image, with character pixels in the area set to 1 and all other pixels set to 0. In practice, for each character area in the area data 207 generated by the layout analysis unit 202, the information on the pixel blocks within that area is read from the pixel block data 206 generated by the pixel block analysis unit 201. An image of the same size as the input image 200 is then generated in which the runs of each pixel block are 1 and everything else is 0.
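As an illustration of S1001, the sketch below builds the binary image of one character area from the runs of its pixel blocks. The run format `(y, x_start, x_end)` with an inclusive end, and the name `binary_image_from_runs`, are assumptions made for this example; the description only says that each pixel block holds runs.

```python
def binary_image_from_runs(width, height, blocks):
    """Return a height x width image with 1 on every run pixel and 0 elsewhere.

    `blocks` is a list of pixel blocks; each pixel block is a list of runs
    (y, x_start, x_end), with x_end inclusive (assumed layout).
    """
    image = [[0] * width for _ in range(height)]
    for runs in blocks:
        for y, x_start, x_end in runs:
            for x in range(x_start, x_end + 1):
                image[y][x] = 1
    return image

# Two pixel blocks belonging to one character area of a 4x3 input image.
blocks = [[(0, 1, 2), (1, 1, 1)], [(2, 3, 3)]]
binary = binary_image_from_runs(4, 3, blocks)
```

The result has the same pixel count as the input image, which is what allows the recognized character rectangles to be compared later with the pixel block rectangles in a common coordinate system.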

Ｓ１００２では、文字認識部２０４は、入力された文字入りの文書画像が正置されていない、すなわち入力画像２００内に書かれた文字の上方向が９０°、１８０°、２７０°である可能性を想定する。そして、文字認識部２０４は、それらを正しい向きに補正するために必要な回転角を判別する方向判別処理を行う。 In S1002, the character recognition unit 204 allows for the possibility that the input document image is not upright, that is, that the upward direction of the characters in the input image 200 is rotated by 90°, 180°, or 270°. It therefore performs a direction determination process that determines the rotation angle needed to correct the image to the proper orientation.

ここでの方向判別処理は、Ｓ１００１で生成した二値画像を利用して公知の手法で行う。方向判別処理の手法は本発明の本質とは異なるため詳細は省略する。なお、方向判別処理の一例として、画像中のいくつかの文字を０°のほか、９０°、１８０°、２７０°に回転した状態で計４方向に認識し、その際の認識スコアが最も高い方向を正しい方向と判断する方法が挙げられる。 The direction discrimination processing here is performed by a known method using the binary image generated in S1001. Since the method of direction discrimination processing is different from the essence of the present invention, the details are omitted. As an example of the direction discrimination processing, some characters in the image are recognized in four directions in a state rotated to 90 °, 180 °, and 270 ° in addition to 0 °, and the recognition score at that time is the highest. One way is to determine the direction as the correct direction.
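The four-direction scoring method mentioned as one example can be sketched as below. The recognizer is deliberately left abstract: `score` is a hypothetical callback returning a recognition score for an image, and the toy scorer used here is only a stand-in so that the example runs.

```python
def rotate90_cw(image):
    """Rotate a list-of-lists image by 90 degrees clockwise."""
    return [list(row) for row in zip(*image[::-1])]

def detect_rotation(image, score):
    """Try 0, 90, 180 and 270 degrees and return the clockwise angle whose
    rotated image gets the highest recognition score."""
    best_angle, best_score = 0, float("-inf")
    candidate = image
    for angle in (0, 90, 180, 270):
        current = score(candidate)
        if current > best_score:
            best_angle, best_score = angle, current
        candidate = rotate90_cw(candidate)
    return best_angle

# Toy stand-in for a recognizer: full score only for the upright glyph.
upright = [[1, 0], [1, 1]]
scanned = [[0, 1], [1, 1]]  # the upright glyph rotated by 270 degrees clockwise
needed = detect_rotation(scanned, lambda img: 1.0 if img == upright else 0.0)
```

Here `needed` is the angle by which the scanned image must still be rotated, which corresponds to the rotation angle checked in S1003.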

Ｓ１００３では、文字認識部２０４は、Ｓ１００２で得られた、必要な回転角が０°かどうかを調べる。必要な回転角が０°、すなわち回転の必要が無い場合には（Ｓ１００３にてＹＥＳ）、Ｓ１００６に進む。９０°、１８０°、２７０°いずれかの回転が必要な場合には（Ｓ１００３にてＮＯ）、Ｓ１００４に進む。 In S1003, the character recognition unit 204 checks whether or not the necessary rotation angle obtained in S1002 is 0 °. If the required rotation angle is 0 °, that is, if there is no need for rotation (YES in S1003), the process proceeds to S1006. If any rotation of 90 °, 180 °, or 270 ° is necessary (NO in S1003), the process proceeds to S1004.

Ｓ１００４では、文字認識部２０４は、Ｓ１００１で生成した二値画像を、Ｓ１００２で得られた必要回転角度分だけ回転する。Ｓ１００５では、文字認識部２０４は、Ｓ１００２で得られた回転角を領域データ２０７に回転情報として付加する。 In S1004, the character recognition unit 204 rotates the binary image generated in S1001 by the necessary rotation angle obtained in S1002. In S1005, the character recognition unit 204 adds the rotation angle obtained in S1002 to the area data 207 as rotation information.

Ｓ１００６では、文字認識部２０４は、回転された二値画像および文字領域情報を利用して、各文字領域内に公知の文字認識処理を実行し、文字コード列を含む文字認識結果を得る。なお、回転角が０°以外の場合は、文字認識処理に指定する領域情報も、回転された二値画像上での領域に一致するように回転されたものとする。文字認識結果は、文字領域情報、行情報、および認識文字情報で構成される。文字領域情報は、位置情報として文字が存在する範囲の座標と、認識された文字行数の情報を含む。行情報は、各行の行内文字数の情報を含む。認識文字情報は、各文字に対して認識された文字コードと、文字の外接矩形座標の情報とを含む。認識文字情報には、文字認識処理により付加的に得られた各文字の情報を追加してもよい。例えば、行内の文字平均高さやピッチから推定される文字サイズや、太字、斜体、下線といった文字修飾情報やフォント種類などを付加してもよい。 In step S1006, the character recognition unit 204 executes a known character recognition process in each character region using the rotated binary image and character region information, and obtains a character recognition result including a character code string. When the rotation angle is other than 0 °, it is assumed that the area information specified in the character recognition process is also rotated so as to match the area on the rotated binary image. The character recognition result includes character area information, line information, and recognized character information. The character area information includes information on the coordinates of a range where characters exist as position information and the number of recognized character lines. The line information includes information on the number of characters in each line. The recognized character information includes a character code recognized for each character and information of circumscribed rectangular coordinates of the character. Information of each character additionally obtained by the character recognition process may be added to the recognized character information. For example, character size estimated from the average character height and pitch within a line, character modification information such as bold, italic, and underline, font type, and the like may be added.

Ｓ１００７では、文字認識部２０４は、Ｓ１００６で出力された文字認識結果の各文字に色情報を付加する。本処理では、画素塊データ２０６に保持されている、各画素塊の外接矩形座標と、画素値すなわち画素塊の色情報とを利用する。ただし、文字認識処理は二値画像を文字領域単位で指定して行われており、文字認識処理の結果である文字単位は、画素塊データ２０６の画素塊と関連づけられていない。 In step S1007, the character recognition unit 204 adds color information to each character of the character recognition result output in step S1006. In this process, the circumscribed rectangular coordinates of each pixel block and the pixel value, that is, the color information of the pixel block, stored in the pixel block data 206 are used. However, the character recognition process is performed by designating a binary image in units of character areas, and the character unit as a result of the character recognition process is not associated with the pixel block of the pixel block data 206.

（色情報付加処理） (Color information addition processing)
文字認識部２０４によるＳ１００７の処理を図１２のフローチャートを用いて説明する。Ｓ１２０１では、文字認識部２０４は、領域データ２０７の文字領域に属する画素塊データ２０６から未処理画素塊一つを処理対象Ｃとして選択する。このとき、文字認識部２０４は、文字データ２０９の処理情報をクリアする。画素塊データに対する処理順序は、画素塊データの処理方法や原稿の入力方向などにより異なるため、不定、つまり文字の読み順であることを前提としない。なお、領域データ２０７に０°以外の回転角が付与されている場合は、処理対象の画素塊Ｃに対する以下のステップの処理において、回転角により回転された座標を用いる。これにより、入力画像の方向と文字認識処理時の正置方向が異なる場合に、両者から得られたデータの不一致を解消することができる。 The processing of S1007 by the character recognition unit 204 will be described with reference to the flowchart of FIG. 12. In S1201, the character recognition unit 204 selects one unprocessed pixel block, as processing target C, from the pixel block data 206 belonging to the character areas of the area data 207. At this time, it clears the processing information of the character data 209. The order in which pixel blocks are processed depends on how the pixel block data was produced, the input direction of the document, and so on; it is therefore indeterminate and is not assumed to be the reading order of the characters. When a rotation angle other than 0° has been added to the area data 207, the following steps use the coordinates of pixel block C rotated by that angle. This resolves the mismatch between the two sets of data when the orientation of the input image differs from the upright orientation used for character recognition.

Ｓ１２０２では、文字認識部２０４は、文字データ２０９から未処理文字の一つを処理対象Ｏとして選択する。Ｓ１２０３では、文字認識部２０４は、画素塊Ｃと文字データＯの領域の外接矩形の重なりを判定する。ＣとＯの領域の外接矩形が重なれば（Ｓ１２０３にてＹＥＳ）、Ｓ１２０４へ進む。ＣとＯの領域の外接矩形が重ならない場合は（Ｓ１２０３にてＮＯ）、Ｓ１２０２へ進み、文字認識部２０４は、次の未処理文字データを処理対象Ｏとする。 In S1202, the character recognition unit 204 selects one unprocessed character from the character data 209 as processing target O. In S1203, the character recognition unit 204 determines whether the circumscribed rectangle of pixel block C overlaps that of character O. If they overlap (YES in S1203), the process proceeds to S1204. If they do not (NO in S1203), the process returns to S1202, and the next unprocessed character becomes processing target O.

Ｓ１２０４では、文字認識部２０４は、既に文字Ｏに画素塊Ｃが関連付けられているかを判断する。関連付けられていなければ（Ｓ１２０４にてＮＯ）、Ｓ１２０６へ進む。関連付けられていれば（Ｓ１２０４にてＹＥＳ）、文字認識部２０４は、画素塊Ｃを文字Ｏの関連画素塊Ｏｃとして、Ｓ１２０５へ進む。ここで、文字と画素塊とが関連付けられているかの判定は、当該文字に色情報が付加されているかで判断することができる。 In step S1204, the character recognition unit 204 determines whether the pixel block C is already associated with the character O. If not associated (NO in S1204), the process proceeds to S1206. If it is associated (YES in S1204), the character recognition unit 204 sets the pixel block C as the related pixel block Oc of the character O, and proceeds to S1205. Here, it can be determined whether the character and the pixel block are associated with each other based on whether color information is added to the character.

Ｓ１２０５では、文字認識部２０４は、既に文字Ｏに関連づけられた画素塊Ｏｃと文字Ｏの重なりの面積が近似するかを判定する。ここでは、文字認識部２０４は、既に文字Ｏに関連づけられた画素Ｏｃと文字Ｏの重なりの面積と、画素塊Ｃと文字Ｏの重なりの面積の大きさを判定する。あるいは単に画素塊Ｃと文字Ｏの重なりの大きさと文字Ｏの面積の近さ、つまり画素塊Ｃが文字Ｏをカバーする面積の広さにより判定しても良い。 In S1205, the character recognition unit 204 judges closeness by overlap area: it compares the area of overlap between character O and the pixel block Oc already associated with it against the area of overlap between pixel block C and character O. Alternatively, it may simply judge by how close the overlap between pixel block C and character O is to the area of character O, that is, by how much of character O is covered by pixel block C.

本工程は、小領域を無視することで文字矩形内に存在するノイズの色の付加を防ぐ目的と、より正確な色情報を持つ画素塊と関連づける効果がある。先に説明した様に、文字画素塊と文字は一対一で対応するとは限らない。例えば偏と旁からなる漢字では偏と旁でそれぞれ一つの画素塊が形成される。偏と旁が同色であれば、どちらの色情報を用いても構わないが、インクの滲みやスキャナの光学解像度の問題から文字輪郭部の色味は一般的に異なる。そのため、小さい画素塊では輪郭部の占める面積が大きくなり不正確な色情報が抽出される可能性が高い。よって、面積を比較することで、より正確な色情報を持つと推定される画素塊を選択する。面積が近似していれば（Ｓ１２０５にてＹＥＳ）Ｓ１２０６へ進む。面積が近似していない場合は（Ｓ１２０５にてＮＯ）、Ｓ１２０２へ進み、文字認識部２０４は、次の未処理文字データを処理対象Ｏとする。 This step ignores small regions, which prevents the color of noise inside the character rectangle from being added, and it associates the character with the pixel block that carries the more accurate color information. As described earlier, character pixel blocks and characters do not necessarily correspond one to one. For example, in a kanji made up of a left-hand radical (hen, 偏) and a right-hand component (tsukuri, 旁), each part forms its own pixel block. If both parts had the same color, either could be used, but ink bleed and the limits of the scanner's optical resolution generally shift the color along character outlines. In a small pixel block the outline accounts for a larger share of the area, so inaccurate color information is more likely to be extracted. Comparing areas therefore selects the pixel block presumed to carry the more accurate color information. If the areas are judged close (YES in S1205), the process proceeds to S1206. If not (NO in S1205), the process returns to S1202, and the next unprocessed character becomes processing target O.

Ｓ１２０６では、文字認識部２０４は、文字Ｏに対して、関連付けられた画素塊Ｃの色情報を付加する。本実施形態では、文字Ｏの関連画素塊Ｏｃとして画素塊Ｃを関連付けることで、文字Ｏに対する関連画素塊Ｏｃの色情報を参照可能とする。その後、Ｓ１２０７へ進む。 In step S 1206, the character recognition unit 204 adds color information of the associated pixel block C to the character O. In the present embodiment, the color information of the related pixel block Oc with respect to the character O can be referred to by associating the pixel block C as the related pixel block Oc of the character O. Then, it progresses to S1207.

Ｓ１２０７では、文字認識部２０４は、文字データ２０９の全文字に対して処理が終了したかを判定する。処理が終了していれば（Ｓ１２０７にてＹＥＳ）Ｓ１２０８へ進む。未処理の文字データがあれば（Ｓ１２０７にてＮＯ）、Ｓ１２０２へ進み、文字認識部２０４は、次の未処理文字データを処理対象Ｏとする。 In step S 1207, the character recognition unit 204 determines whether processing has been completed for all characters of the character data 209. If processing has ended (YES in S1207), the process advances to S1208. If there is unprocessed character data (NO in S1207), the process proceeds to S1202, and the character recognition unit 204 sets the next unprocessed character data as the processing target O.

Ｓ１２０８では、文字認識部２０４は、画素塊データ２０６の全画素塊データに対して処理が終了したかを判定する。未処理画素塊データがあれば（Ｓ１２０８にてＮＯ）、Ｓ１２０１へ進み、文字認識部２０４は、次の未処理画素塊データを処理対象Ｃとする。全画素塊に対して処理が終了していれば（Ｓ１２０８にてＹＥＳ）、Ｓ１２０９へ進む。 In step S 1208, the character recognition unit 204 determines whether processing has been completed for all pixel block data of the pixel block data 206. If there is unprocessed pixel block data (NO in S1208), the process proceeds to S1201, and the character recognition unit 204 sets the next unprocessed pixel block data as a processing target C. If processing has been completed for all pixel blocks (YES in step S1208), the process advances to step S1209.

Ｓ１２０９では、文字認識部２０４は、色情報のない文字データに対する色付け処理を行う。文字認識部２０４の出力する文字データは画素塊を元にしているため、出力される文字データには一致する画素塊が存在するが、空白などの不可視文字を認識する場合は例外的に一致する画素塊が存在しない。前の文字が存在すれば、文字認識部２０４は、前の文字と同一の文字色を色情報に付加する。前の文字が存在しない場合は、文字認識部２０４は、後ろの文字の色情報を参照し、存在しない場合はさらに後方の文字を辿り、文字色が存在した時点での文字色を付加する。この処理を色情報のない文字データ全てに行うことで、全ての空白文字に、前、あるいは後方の文字列の色情報が付けられる。なお、本実施形態では、前の文字の色を優先的に付加したが、後ろの文字の色を優先的に付加してもよい。 In S1209, the character recognition unit 204 assigns colors to character data that has no color information. Because the character data output by the character recognition unit 204 is derived from pixel blocks, each output character normally has a matching pixel block; the exception is an invisible character such as a space, for which no pixel block exists. If a preceding character exists, the character recognition unit 204 adds that character's color as the color information. If there is no preceding character, it refers to the color information of the following character; if that has none, it continues through the subsequent characters and adds the first character color it finds. Applying this to every character without color information gives every space character the color of the preceding or following text. In this embodiment the color of the preceding character takes priority, but the following character's color may be given priority instead.
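The loop of S1201 to S1209 can be sketched as follows. The class and function names are hypothetical, rectangles are assumed to be inclusive `(left, top, right, bottom)` tuples, and the S1205 closeness test is interpreted here as "keep the pixel block with the larger overlap area", which is one reading of the description rather than its definitive rule.

```python
from dataclasses import dataclass
from typing import Optional, Tuple

Rect = Tuple[int, int, int, int]  # left, top, right, bottom (inclusive; assumed)

@dataclass
class PixelBlock:
    rect: Rect
    color: str

@dataclass
class Char:
    code: str
    rect: Rect
    color: Optional[str] = None
    best_overlap: int = 0  # overlap area of the currently associated block

def overlap_area(a: Rect, b: Rect) -> int:
    """Area shared by two circumscribed rectangles (0 if they do not overlap)."""
    w = min(a[2], b[2]) - max(a[0], b[0]) + 1
    h = min(a[3], b[3]) - max(a[1], b[1]) + 1
    return w * h if w > 0 and h > 0 else 0

def add_colors(chars, blocks):
    for block in blocks:                                  # S1201 / S1208
        for ch in chars:                                  # S1202 / S1207
            area = overlap_area(block.rect, ch.rect)      # S1203
            if area == 0:
                continue
            # S1204: no block yet; S1205: new block overlaps more (assumed test)
            if ch.color is None or area > ch.best_overlap:
                ch.color, ch.best_overlap = block.color, area  # S1206
    for i, ch in enumerate(chars):                        # S1209
        if ch.color is None:
            before = [c.color for c in chars[:i] if c.color is not None]
            after = [c.color for c in chars[i + 1:] if c.color is not None]
            ch.color = before[-1] if before else (after[0] if after else None)
    return chars

# "a", a space, and "i" whose dot (3x3) and stem (7x13) are separate blocks.
chars = [Char("a", (0, 0, 9, 9)), Char(" ", (10, 0, 19, 9)), Char("i", (20, 0, 26, 17))]
blocks = [
    PixelBlock((0, 0, 9, 9), "#000"),
    PixelBlock((22, 0, 24, 2), "#33F"),   # small dot with bleed-affected color
    PixelBlock((20, 5, 26, 17), "#00F"),  # larger stem with the truer color
]
add_colors(chars, blocks)
```

In this toy input the dot overlaps "i" by 9 pixels and the stem by 91, so the stem's color wins, and the space inherits the color of the character before it.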

（色情報付加処理の処理結果の例） (Example of the result of the color information addition processing)
以下、入力される文字領域の例として図１１を用い、図１２のフローチャートの処理を説明する。図１１（ａ）は文字領域の例であり、“ａｂｃ”、“１２３”、“ｉｆ”の３行からなる８つの文字とスペース（空白文字）とを含んでいる。ここで、“ａｂｃ”は黒色、“１２３”が赤色、“ｉｆ”が青色であるとする。なお、１行目の“ａｂ”と“ｃ”の間にはスペースが含まれているものとする。 Below, the processing of the flowchart of FIG. 12 is described using FIG. 11 as an example of an input character area. FIG. 11(a) shows an example character area consisting of three lines, “ab c”, “123”, and “if”, which contain eight characters and one space (blank character). Here “ab c” is black, “123” is red, and “if” is blue. The first line contains a space between “ab” and “c”.

図１１（ｂ）は、図１１（ａ）を文字認識した結果の例である。本例では、文字Ｏ０１から文字Ｏ０９まで９文字の文字コード情報と矩形座標情報、推定文字サイズ、そして行情報が認識されている。なお、文字認識は二値の情報を元に行われるため、色情報は付加されていない。 FIG. 11(b) shows an example of the result of character recognition on FIG. 11(a). In this example, the character code, rectangle coordinates, estimated character size, and line information are recognized for nine characters, O01 to O09. Because character recognition operates on binary information, no color information is attached.

図１１（ｃ）は、文字領域に対して画素塊解析部２０１より得られた画素塊データ２０６の例である。Ｃ０１からＣ０９で表した９個の画素塊と、それぞれの外接矩形座標、そして色情報が格納されている。本例では色情報の表記を赤成分４ｂｉｔ、緑成分４ｂｉｔ、青成分４ｂｉｔの３桁の１６進数ＲＧＢ表記を用いる。このとき、色情報の値は、黒色なら“＃０００”、赤色なら“＃Ｆ００”、青色なら“＃００Ｆ”となる。画素塊データは、先に説明した通り画素の連結からなるため、文字認識結果と領域が異なる。例えば、文字認識結果でスペースと認識された部分に対して画素塊は存在しない。 FIG. 11(c) shows an example of the pixel block data 206 obtained by the pixel block analysis unit 201 for the character area. It stores nine pixel blocks, C01 to C09, each with its circumscribed rectangle coordinates and color information. In this example, color is written in three-digit hexadecimal RGB notation with 4 bits each for the red, green, and blue components, so black is “#000”, red is “#F00”, and blue is “#00F”. Because pixel block data is built from connected pixels, as described earlier, its regions differ from those of the character recognition result. For example, no pixel block exists for a part that character recognition recognized as a space.
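The three-digit notation used in this example maps each 4-bit component to one hexadecimal digit. A small sketch of the conversion in both directions follows; the helper names `to_hex3` and `from_hex3` are made up for the illustration.

```python
def to_hex3(r, g, b):
    """Encode 4-bit red, green and blue components (0-15) as '#RGB'."""
    for value in (r, g, b):
        if not 0 <= value <= 15:
            raise ValueError("each component must fit in 4 bits")
    return f"#{r:X}{g:X}{b:X}"

def from_hex3(text):
    """Decode '#RGB' back into a tuple of three 4-bit components."""
    return tuple(int(digit, 16) for digit in text.lstrip("#"))

black = to_hex3(0, 0, 0)
red = to_hex3(15, 0, 0)
blue = to_hex3(0, 0, 15)
pale_blue = from_hex3("#33F")
```

Decoding “#33F” gives components (3, 3, 15), which makes it easy to see that it is blue with some red and green mixed in, that is, a paler blue than “#00F”.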

また、図１３（ａ）で示す通り、文字Ｏ０８の文字矩形が実線１３００で囲まれた一つの領域であるのに対し、画素塊は破線１３０１と破線１３０２で囲まれたＣ０７、Ｃ０８の二つの領域に分割される。また、Ｃ０７の画素塊の色情報が実際は青色を表す“＃００Ｆ”であるにも関わらず、文字の滲み、あるいはスキャン解像度によって、“＃３３Ｆ”として得られている。文字の滲み、スキャン解像度による色の誤差については図１４を用いて説明をする。 As shown in FIG. 13(a), the character rectangle of character O08 is the single region enclosed by solid line 1300, whereas the pixel blocks are split into two regions, C07 and C08, enclosed by broken lines 1301 and 1302. In addition, although the true color of pixel block C07 is blue, “#00F”, it is obtained as “#33F” because of character bleed or scan resolution. Color errors caused by bleed and scan resolution are explained with reference to FIG. 14.

図１４（ａ）は、図１１（ａ）の文字“ｉ”のスキャン画像を拡大した図である。３×３の小領域からなる領域１４０１は、色滲みとスキャン解像度の影響により、本来の色よりも淡色で得られている。領域１４０２は領域１４０１と比較してより大きい領域であるため、色の滲みやスキャン解像度の影響を受けている領域が少ない。 FIG. 14(a) is an enlarged view of the scanned image of the character “i” in FIG. 11(a). Region 1401, a small 3 × 3 region, is obtained in a paler color than the original because of color bleed and scan resolution. Region 1402 is larger than region 1401, so a smaller proportion of it is affected by color bleed and scan resolution.

図１４（ｂ）は、図１４（ａ）に対して減色処理（Ｓ３０２）を行った結果を図示したものである。減色処理により領域１４０１の画素塊は本来の色より淡色にまとめられている。 FIG. 14(b) illustrates the result of applying the color reduction process (S302) to FIG. 14(a). Color reduction consolidates the pixel block of region 1401 into a color paler than the original.

続いて、図１２のフローチャートに沿って、図１１に示す情報に対する色情報付加処理を具体的に説明する。 Next, the color information addition processing for the information shown in FIG. 11 will be specifically described along the flowchart of FIG.

Ｓ１２０１で、図１１（ｃ）に示す画素塊Ｃ０１が選択されたとする。Ｓ１２０２では図１１（ｂ）に示す未処理の文字Ｏ０１が選択され、Ｓ１２０３で外接矩形の重なりの判定が行われる。このとき、画素塊Ｃ０１の外接矩形は、文字Ｏ０１の外接矩形に内包されており、重なっていると判定され、Ｓ１２０４に進む。ここで文字Ｏ０１には色情報が未付加であるためＳ１２０６へ進み、文字Ｃ０１と関連付けられ、画素塊Ｃ０１の色情報“＃０００”が文字Ｏ０１の色情報として参照可能となる。同様の処理により、画素塊Ｃ０２と文字Ｏ０２、画素塊Ｃ０３と文字Ｏ０４、画素塊Ｃ０４と文字Ｏ０５、画素塊Ｃ０５と文字Ｏ０６、画素塊Ｃ０６と文字Ｏ０７、画素塊Ｃ０７と文字Ｏ０８がそれぞれ関連付けられる。 Assume that pixel block C01 shown in FIG. 11(c) is selected in S1201. In S1202, the unprocessed character O01 shown in FIG. 11(b) is selected, and in S1203 the overlap of the circumscribed rectangles is tested. The circumscribed rectangle of pixel block C01 lies inside that of character O01, so the two are judged to overlap and the process proceeds to S1204. Because character O01 has no color information yet, the process proceeds to S1206, where character O01 is associated with pixel block C01 and the color information “#000” of pixel block C01 becomes available as the color information of character O01. In the same way, pixel block C02 is associated with character O02, C03 with O04, C04 with O05, C05 with O06, C06 with O07, and C07 with O08.

次にＳ１２０１で画素塊Ｃ０８が選択され、Ｓ１２０２で文字Ｏ０８が選択され、Ｓ１２０３へ進んだとする。この時、画素塊Ｃ０８の外接矩形と文字Ｏ０８の外接矩形が重なるため、Ｓ１２０４へ進む。Ｓ１２０４では、文字Ｏ０８に既に画素塊Ｃ０７が関連付けられているため、Ｓ１２０５へ進む。Ｓ１２０５では、既に関連づけられている画素塊Ｃ０７と文字Ｏ０８の重なっている面積の差を比較する。 Next, it is assumed that the pixel block C08 is selected in S1201, the character O08 is selected in S1202, and the process proceeds to S1203. At this time, since the circumscribed rectangle of the pixel block C08 and the circumscribed rectangle of the character O08 overlap, the process proceeds to S1204. In S1204, since the pixel block C07 is already associated with the character O08, the process proceeds to S1205. In S1205, the difference in the area where the pixel block C07 already associated and the character O08 overlap is compared.

画素塊Ｃ０７は、全領域が文字Ｏ０８に内包されており、重なりの面積は、図１３（ａ）の領域１３１１で示す画素塊Ｃ０７の面積、つまり３×３＝９となる。一方、画素塊Ｃ０８と文字Ｏ０８においては、画素塊Ｃ０８は全領域が文字Ｏ０８に内包されており、図１３（ａ）の領域１３１２で示す矩形面積７×１３＝９１となる。この場合、画素塊Ｃ０８の方がより文字Ｏ０８に近いと判定され、Ｓ１２０６にて、画素塊Ｃ０８と文字Ｏ０８とが関連づけられる。これにより、文字Ｏ０８の色情報が、画素塊Ｃ０７の“＃３３Ｆ”から、Ｃ０８の“＃００Ｆ”に変更される。 Pixel block C07 lies entirely within character O08, so the overlap area equals the area of pixel block C07 shown as region 1311 in FIG. 13(a), namely 3 × 3 = 9. Pixel block C08 also lies entirely within character O08, and its overlap is the rectangular area shown as region 1312 in FIG. 13(a), namely 7 × 13 = 91. Pixel block C08 is therefore judged to be the closer match to character O08, and in S1206 pixel block C08 is associated with character O08. As a result, the color information of character O08 changes from “#33F” of pixel block C07 to “#00F” of pixel block C08.

続けてＳ１２０１へ進み、画素塊Ｃ０９と文字Ｏ０９が関連づけられる。全ての画素塊に対して処理が終わり、Ｓ１２０９へ進む。図１３（ｂ）は、この時点での文字データに対する色情報の値である。文字Ｏ０３は空白文字（スペース）であるため、対応する画素塊がなく、色が付加されていない。Ｓ１２０９では、文字認識部２０４は、色情報が未付加の文字Ｏ０３に対して前後の色情報を付加する。前の文字Ｏ０２に色情報＃０００が付加されているため、文字認識部２０４は、Ｏ０３に対しても“＃０００”を付加し、処理を終了する。文字認識部２０４の全処理を終了したあとの文字データの例を図１３（ｃ）に示す。 The process then returns to S1201, and pixel block C09 is associated with character O09. With all pixel blocks processed, the process proceeds to S1209. FIG. 13(b) shows the color information of the character data at this point. Character O03 is a space, so it has no corresponding pixel block and no color has been added. In S1209, the character recognition unit 204 gives the uncolored character O03 the color of an adjacent character. Because the preceding character O02 has the color information “#000”, the character recognition unit 204 also adds “#000” to O03 and ends the process. FIG. 13(c) shows an example of the character data after all processing by the character recognition unit 204 has finished.

Ｓ１２０５の処理を行わない場合は、Ｃ０７とＣ０８のどちらの色がＯ０８に付加されるかは不定であるため、本来の色ではないＣ０７の“＃３３Ｆ”が付加されることがある。 If the processing of S1205 were omitted, which of the colors of C07 and C08 is added to O08 would be indeterminate, so “#33F” of C07, which is not the true color, might be added.

（電子文書記述生成部による処理） (Processing by the electronic document description generation unit)
電子文書記述生成部２０５の処理を図１５のフローチャートを用いて説明する。Ｓ１５０１では、電子文書記述生成部２０５は、出力電子文書２１０の開始部分を記述するデータを出力する。なお、本説明では、出力先はメモリ１０３あるいはハードディスク１０４に確保された出力バッファである。以降、処理でデータが出力される毎に、その内容は出力バッファ内に出力済のデータの末尾へと追加されるよう記憶されていくものとする。 The processing of the electronic document description generation unit 205 will be described with reference to the flowchart of FIG. 15. In S1501, the electronic document description generation unit 205 outputs data describing the start of the output electronic document 210. In this description, the output destination is an output buffer allocated in the memory 103 or on the hard disk 104. From here on, each time data is output, it is appended to the end of the data already in the output buffer.

Ｓ１５０２では、電子文書記述生成部２０５は、出力電子文書２１０において、ページの開始部分を記述するデータを出力する。なお、本実施形態に係る電子文書生成処理では、ひとつの入力画像の内容を出力電子文書２１０の１ページに対応させるものとする。Ｓ１５０３では、電子文書記述生成部２０５は、領域データ２０７内におけるルートノードを最初の処理対象となる注目ノードに設定する。 In step S 1502, the electronic document description generation unit 205 outputs data describing the start part of the page in the output electronic document 210. In the electronic document generation process according to the present embodiment, the content of one input image is assumed to correspond to one page of the output electronic document 210. In step S 1503, the electronic document description generation unit 205 sets the root node in the area data 207 as the attention node that is the first processing target.

Ｓ１５０４では、電子文書記述生成部２０５は、注目ノードが出力対象領域であるかどうかを調べる。出力対象領域である場合は（Ｓ１５０４にてＹＥＳ）、Ｓ１５０５に進む。出力対象領域では無い場合は（Ｓ１５０４にてＮＯ）、Ｓ１５０６に進む。ここで注目ノードが出力対象領域か否かは、電子文書記述生成部２０５に設定された定義テーブルに基づいて判断される。この定義テーブルには、前背景、文字、線画、自然画、表の領域種別毎に、出力の有無および方式（データ形式）が定義される。 In step S1504, the electronic document description generation unit 205 checks whether the target node is an output target area. If it is an output target area (YES in S1504), the process proceeds to S1505. If it is not an output target area (NO in S1504), the process proceeds to S1506. Here, whether or not the node of interest is an output target area is determined based on a definition table set in the electronic document description generation unit 205. In this definition table, the presence / absence of output and the method (data format) are defined for each of the foreground / background, character, line drawing, natural image, and table area types.

図１６に出力対象領域の定義テーブルの一例を示す。図１６では、定義１６０１と定義１６０２の２種類の定義テーブルが定められている。どちらのテーブルを用いるかは、本電子文書生成処理に予め指示されていてもよいし、入力内容によって電子文書生成処理内で自動選択するようになっていてもよい。 FIG. 16 shows an example of an output target area definition table. In FIG. 16, two types of definition tables, a definition 1601 and a definition 1602, are defined. Which table is used may be instructed in advance in the electronic document generation process, or may be automatically selected in the electronic document generation process depending on the input content.

Ｓ１５０５では、電子文書記述生成部２０５は、注目ノードに対し、領域に対応づけられているグラフィックスデータ２０８もしくは文字データ２０９を出力する。なお、領域データ２０７に対し、図１０のＳ１００５にて、０°以外の回転角情報が付与されている場合がある。これは、入力画像２００の向きが、文字が読める正置方向と異なる場合に相当する。図１０で説明したように、このとき領域に対応づけられている文字データは、正置方向に回転された二値画像から得られたものであり、正置方向の座標を有する。一方、グラフィックスデータ２０８は、入力画像２００から得た、正置とは異なる方向の座標を有する。これらの不一致を解消する為に、グラフィックスデータ２０８に対しては、座標を正置方向へと回転したデータを出力するものとする。 In S1505, the electronic document description generation unit 205 outputs, for the node of interest, the graphics data 208 or character data 209 associated with the region. Note that rotation angle information other than 0° may have been added to the area data 207 in S1005 of FIG. 10. This corresponds to the case where the orientation of the input image 200 differs from the upright orientation in which the characters can be read. As described for FIG. 10, the character data associated with the region in this case was obtained from the binary image rotated to the upright orientation and has coordinates in that orientation. The graphics data 208, in contrast, was obtained from the input image 200 and has coordinates in a different orientation. To resolve this mismatch, the graphics data 208 is output with its coordinates rotated to the upright orientation.

Ｓ１５０６では、電子文書記述生成部２０５は、注目ノードの次に出力処理が行われるべきノードである、次ノードを取得する。Ｓ１５０７では、電子文書記述生成部２０５は、Ｓ１５０６で次ノードが取得できたか否かを判定する。次ノードが取得できた場合は（Ｓ１５０７にてＹＥＳ）、Ｓ１５０８に進み、取得できなかった場合は（Ｓ１５０７にてＮＯ）、Ｓ１５０９に進む。Ｓ１５０８では、電子文書記述生成部２０５は、Ｓ１５０６で取得した次ノードを新たな注目ノードとしてＳ１５０４に戻り、以降の処理を繰り返す。Ｓ１５０９では、電子文書記述生成部２０５は、領域データにさらに出力すべき領域のノードが無い、すなわち出力１ページぶんのデータ記述が終了したとして、ページの終端データを出力する。 In step S 1506, the electronic document description generation unit 205 acquires a next node that is a node on which output processing should be performed next to the node of interest. In step S1507, the electronic document description generation unit 205 determines whether the next node has been acquired in step S1506. If the next node can be acquired (YES in S1507), the process proceeds to S1508, and if it cannot be acquired (NO in S1507), the process proceeds to S1509. In step S1508, the electronic document description generation unit 205 returns to step S1504 with the next node acquired in step S1506 as a new node of interest, and repeats the subsequent processing. In step S 1509, the electronic document description generation unit 205 outputs page end data on the assumption that there is no region node to be output in the region data, that is, the data description for one output page has been completed.
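The node loop of S1502 to S1509 can be sketched as a tree walk driven by a definition table. Everything concrete here is an assumption made for illustration: the `RegionNode` class, the depth-first pre-order used for "next node", the placeholder `<page>` markers, and the table contents, which are not the actual definitions 1601 or 1602 of FIG. 16.

```python
from dataclasses import dataclass, field

@dataclass
class RegionNode:
    kind: str                 # region type, e.g. "text" or "photo"
    payload: str              # stand-in for graphics data or character data
    children: list = field(default_factory=list)

# Hypothetical definition table: output format per region type, None = no output.
DEFINITION = {
    "background": "image",
    "text": "text",
    "line_drawing": "vector",
    "photo": "image",
    "table": None,
}

def describe_page(root, definition):
    out = ["<page>"]                        # S1502: page start (placeholder)
    stack = [root]                          # S1503: root is the first node of interest
    while stack:                            # S1506-S1508: fetch the next node
        node = stack.pop()
        fmt = definition.get(node.kind)     # S1504: is this an output target?
        if fmt is not None:
            out.append(f"{fmt}:{node.payload}")   # S1505: output its data
        stack.extend(reversed(node.children))     # keep children in document order
    out.append("</page>")                   # S1509: page end
    return out

root = RegionNode("background", "bg", [
    RegionNode("text", "hello"),
    RegionNode("table", "grid"),
    RegionNode("photo", "photo1"),
])
description = describe_page(root, DEFINITION)
```

Swapping in a different table changes which regions are emitted and in what form without touching the traversal, which mirrors the role the definition tables play in S1504.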

In step S1510, the electronic document description generation unit 205 checks whether there is an additional page. An additional page arises when the electronic document generation process is operating to output a multi-page electronic document and an additional image is input. If there is an additional page (YES in S1510), the electronic document description generation unit 205 returns to S1502 and repeats the subsequent processing. If there is no additional page, that is, if no further image is input (NO in S1510), the process proceeds to S1511.

In step S1511, the electronic document description generation unit 205 outputs the end data of the electronic document data. Outputting the end data completes the electronic document data in the output buffer. In step S1512, the electronic document description generation unit 205 transmits the electronic document data in the output buffer, as the output electronic document 210, to a PC or other destination designated in advance by the user, and ends the electronic document generation process.

In this description, the entire electronic document data is written to the output buffer, but the processing may be arranged to work with a smaller output buffer. For example, the content of a page may be transmitted to the designated destination as soon as the end data of that page has been output, with the content of the next page written again from the start of the output buffer. Alternatively, writing to the output buffer, transmission, and clearing may be repeated in even smaller units.
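The two buffering strategies can be compared with a minimal sketch. Everything named here is an assumption made for illustration: the function `generate_document`, the `send` callback standing in for transmission to the designated destination, and the `<document>` and `<page>` markers standing in for the start and end data, whose actual form the description does not specify.

```python
from typing import Callable, Iterable, List


def generate_document(pages: Iterable[List[str]],
                      send: Callable[[str], None],
                      flush_per_page: bool = False) -> None:
    """Write node descriptions to an output buffer and transmit them.

    `pages` yields, for each page, the already serialized node descriptions
    (the output of S1505). The markers below are hypothetical placeholders.
    """
    buffer: List[str] = ["<document>"]
    for nodes in pages:
        buffer.append("<page>")
        buffer.extend(nodes)          # one entry per region node
        buffer.append("</page>")      # page end data (S1509)
        if flush_per_page:
            # Variant: transmit the finished page, then reuse the buffer.
            send("".join(buffer))
            buffer.clear()
    buffer.append("</document>")      # document end data (S1511)
    send("".join(buffer))             # transmission (S1512)
```

With `flush_per_page=False` the whole document is sent in one piece. With `flush_per_page=True` the peak buffer size is bounded by the largest page, and the destination receives the same bytes in several chunks.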

(Example of electronic file conversion)
FIG. 17 shows part of an example in which the character recognition result of FIG. 13 is converted into electronic document data by the electronic document description generation unit 205. This example is expressed in a hypothetical format based on XML (eXtensible Markup Language). The description format is not limited to this, and other formats may be used. A textLine element holds one line of text, and a run element groups the characters within that line that share the same modification information. The character string enclosed in a text element is the string that is actually output, and the element carries a color attribute indicating color information and a size attribute indicating character size. FIG. 17A shows part of an example in which the state without the processing of S1205 and S1209 in FIG. 12 is converted into an electronic file.

FIG. 17B shows part of an example obtained after the processing of S1209, that is, an example in which FIG. 13C is converted into an electronic file. In the example of FIG. 13B, the space between "ab" and "c" on the first line has no color information, so three run elements are created. Here, "&nbsp;" is the symbol representing a space. In addition, because the color information of "i" on the third line became "#33F", two run elements are created on the third line, giving a total of six run elements for the three lines.

In FIG. 13C, by contrast, the processing of S1209 has added the color information of the preceding character to the space, so "ab c" on the first line is represented by a single run element. Because the processing of S1205 has set the color information of "i" on the third line to "#00F", the third line is also represented by a single run element, and the three lines are described with a total of three run elements. Since a space is an invisible character code, this does not affect how the electronic document is displayed.
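The effect of giving the space a color can be sketched as follows. This is an illustration rather than the processing of S1209 itself: the `Char` record, the function names, and the color value "#F00" are hypothetical, and the fallback to the following character for a leading space is an added assumption.

```python
from dataclasses import dataclass
from itertools import groupby
from typing import List, Optional, Tuple


@dataclass
class Char:
    code: str                    # the recognized character
    color: Optional[str] = None  # None means "color undefined"


def fill_space_colors(chars: List[Char]) -> List[Char]:
    """Give each uncolored space the color of the preceding character."""
    filled: List[Char] = []
    for index, ch in enumerate(chars):
        color = ch.color
        if ch.code == " " and color is None:
            if filled:
                color = filled[-1].color
            if color is None:
                # Assumption: fall back to the next colored character.
                color = next((c.color for c in chars[index + 1:]
                              if c.color is not None), None)
        filled.append(Char(ch.code, color))
    return filled


def to_runs(chars: List[Char]) -> List[Tuple[Optional[str], str]]:
    """Group consecutive characters that share the same color into runs."""
    return [(color, "".join(c.code for c in group))
            for color, group in groupby(chars, key=lambda c: c.color)]


line = [Char("a", "#F00"), Char("b", "#F00"), Char(" "), Char("c", "#F00")]
runs_before = to_runs(line)                    # three runs: "ab", " ", "c"
runs_after = to_runs(fill_space_colors(line))  # one run: "ab c"
```

Grouping by color alone is enough to show why the uncolored space splits the line into three runs, and why filling it in merges them into one.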

In this way, applying the present invention makes it possible to add color information to text while omitting the separate description of characters that represent blanks, such as spaces, thereby reducing the amount of description.

<Second embodiment>
In the first embodiment, color information is added in S1209 of FIG. 12. However, the information added to an undefined character is not limited to color, and other character modification information may be added as well. The processing is described below using FIG. 18 as an example of an input character region. Examples of modification information other than color information include character size, font shape, and character decoration.

FIG. 18A shows an example of a character region consisting of one line of text, "ab c". Here, the characters a, b, and c are assumed to be drawn in green, in a Gothic font, with italic and bold decoration.

FIG. 18B is an example in which color information has been added to the character recognition result by the processing of S1206. Four characters with character codes 0x61, 0x62, 0x20, and 0x63 are recognized. Character codes 0x61, 0x62, and 0x63 are further assumed to have size information of character size "14", Gothic font shape information, italic and bold decoration information, and color information "#0F0" representing green. For "0x20", which represents the space, the font shape information, decoration information, and color information are undefined.

FIG. 18C is an example in which the processing of S1209 has added the font shape information, decoration information, and color information of the neighboring characters to FIG. 18B. As shown, this information has been added to the space.

FIG. 19 shows examples in which the recognition results of FIG. 18 are converted into electronic document data. FIG. 19A is the conversion result of FIG. 18B, and FIG. 19B is the conversion result of FIG. 18C. In addition to the attributes of the text element in the XML format described with FIG. 17, a font attribute representing font shape, a b attribute representing bold, and an i attribute representing italic have been added. Whereas FIG. 19A consists of three run elements ("ab", the space, and "c"), FIG. 19B consists of a single run element, "ab c". Since a space is an invisible character code, this does not affect how the electronic document is displayed.
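The difference between FIG. 19A and FIG. 19B can be approximated with a minimal sketch that copies undefined modification information onto the space and then serializes the line. The element and attribute names (textLine, run, text, size, font, b, i, color) follow the description, but the attribute values ("1" for bold and italic, "Gothic" for the font), the exact nesting, and the function names are assumptions, and the output is not claimed to match the figures.

```python
from itertools import groupby
from typing import Dict, List, Optional, Tuple

Style = Dict[str, Optional[str]]
Glyph = Tuple[int, Style]  # (character code, modification information)

KEYS = ("size", "font", "b", "i", "color")


def inherit_style(glyphs: List[Glyph]) -> List[Glyph]:
    """Copy each undefined attribute of a space from the preceding character."""
    result: List[Glyph] = []
    for code, style in glyphs:
        merged = dict(style)
        if code == 0x20 and result:
            previous = result[-1][1]
            for key in KEYS:
                if merged.get(key) is None:
                    merged[key] = previous.get(key)
        result.append((code, merged))
    return result


def to_xml(glyphs: List[Glyph]) -> str:
    """Serialize one line, starting a new run whenever the style changes."""
    lines = ["<textLine>"]
    style_of = lambda g: tuple(g[1].get(key) for key in KEYS)
    for values, group in groupby(glyphs, key=style_of):
        attrs = "".join(f' {k}="{v}"' for k, v in zip(KEYS, values)
                        if v is not None)
        text = "".join("&nbsp;" if code == 0x20 else chr(code)
                       for code, _ in group)
        lines.append(f"  <run><text{attrs}>{text}</text></run>")
    lines.append("</textLine>")
    return "\n".join(lines)


styled: Style = {"size": "14", "font": "Gothic", "b": "1", "i": "1",
                 "color": "#0F0"}
space: Style = {"size": "14", "font": None, "b": None, "i": None,
                "color": None}
recognized = [(0x61, styled), (0x62, styled), (0x20, space), (0x63, styled)]

xml_without = to_xml(recognized)               # three run elements
xml_with = to_xml(inherit_style(recognized))   # one run element
```

A production serializer would also need to escape characters such as "<" and "&" in the text; that is left out here to keep the sketch short.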

In this way, applying this embodiment makes it possible, even when font shape and decoration information are recognized, to omit the separate description of characters that represent blanks, such as spaces, thereby reducing the amount of description.

<Other embodiments>
The present invention can also be realized by executing the following processing: software (a program) that realizes the functions of the embodiments described above is supplied to a system or apparatus via a network or various storage media, and a computer (or a CPU, MPU, or the like) of that system or apparatus reads out and executes the program.

In the present invention, part or all of the processing units in FIG. 2 may also be implemented using hardware such as electronic circuits.

Claims

An image processing apparatus that generates editable electronic data from an input image, comprising:
input means for inputting an image including a character string as the input image;
extraction means for extracting, from the pixels constituting the input image, a plurality of pixel blocks whose pixel values are similar;
identification means for identifying a region formed by the plurality of pixel blocks as at least one of a character region and a region other than a character region;
analysis means for analyzing a character from the pixel blocks identified as the character region and acquiring character information including at least a character code and position information of the character;
specifying means for specifying a character string including a blank character from the arrangement of the characters indicated by the character information;
color information adding means for acquiring color information from a pixel block of the character region at the position indicated by the character information and adding the color information to the character information; and
generation means for generating a description defining the electronic data from the specified character string and the character information of the characters included in the character string,
wherein the color information adding means adds, to the character information of the blank character included in the character string, the color information of the characters before and after the blank character in the character string.

The image processing apparatus according to claim 1, wherein the analysis means further acquires, as modification information for the characters of the pixel blocks identified as the character region, at least one of character size, font shape, bold, and italic, and
the color information adding means further adds, to the character information of the blank character included in the character string, the modification information of the characters before and after the blank character in the character string.

The image processing apparatus according to claim 1, wherein, when two or more pixel blocks exist at the position indicated by one piece of character information, the color information adding means acquires the color information from the pixel block whose area of overlap with the character range indicated by that character information is largest.

The image processing apparatus according to claim 1, wherein the position information included in the character information is indicated by coordinates of a rectangular area including the character.

An image processing method for generating editable electronic data from an input image, comprising:
an input step in which input means inputs an image including a character string as the input image;
an extraction step in which extraction means extracts, from the pixels constituting the input image, a plurality of pixel blocks whose pixel values are similar;
an identification step of identifying a region formed by the plurality of pixel blocks as at least one of a character region and a region other than a character region;
an analysis step of analyzing a character from the pixel blocks identified as the character region and acquiring character information including at least a character code and position information of the character;
a specifying step of specifying a character string including a blank character from the arrangement of the characters indicated by the character information;
a color information adding step of acquiring color information from a pixel block of the character region at the position indicated by the character information and adding the color information to the character information; and
a generation step of generating a description defining the electronic data from the specified character string and the character information of the characters included in the character string,
wherein, in the color information adding step, the color information of the characters before and after the blank character in the character string is added to the character information of the blank character included in the character string.

A program for causing a computer to function as:
input means for inputting an image including a character string as an input image;
extraction means for extracting, from the pixels constituting the input image, a plurality of pixel blocks whose pixel values are similar;
identification means for identifying a region formed by the plurality of pixel blocks as at least one of a character region and a region other than a character region;
analysis means for analyzing a character from the pixel blocks identified as the character region and acquiring character information including at least a character code and position information of the character;
specifying means for specifying a character string including a blank character from the arrangement of the characters indicated by the character information;
color information adding means for acquiring color information from a pixel block of the character region at the position indicated by the character information and adding the color information to the character information; and
generation means for generating a description defining electronic data from the specified character string and the character information of the characters included in the character string,
wherein the color information adding means adds, to the character information of the blank character included in the character string, the color information of the characters before and after the blank character in the character string.