JP6607490B2 - CONVERSION PROCESSING DEVICE, INFORMATION PROCESSING DEVICE EQUIPPED WITH THE SAME, PROGRAM, AND RECORDING MEDIUM

Info

Publication number: JP6607490B2
Application number: JP2015210168A
Authority: JP
Inventors: 輝彦松岡; 真彦高島; 和之濱田
Original assignee: Sharp Corp
Current assignee: Sharp Corp
Priority date: 2015-10-26
Filing date: 2015-10-26
Publication date: 2019-11-20
Anticipated expiration: 2035-10-26
Also published as: JP2017084012A

Description

The present invention relates to a conversion processing device, an information processing device equipped with the same, a program, and a recording medium.

Image forming apparatuses such as copiers and multifunction peripherals, which form images by electrophotography, inkjet printing, or similar methods, are in widespread use. Multifunction peripherals have grown more capable in recent years, and they are now expected to store a document read by a scanner as document image data (hereinafter also called document image information or a document image) and to manage the stored document images. Because a scanned document is generally stored as image data, it cannot be re-edited as is. Users nevertheless want to obtain the text, figures, and tables contained in a scanned document image and edit them again. Beyond that, demand is growing for a function that recognizes the structure of the characters, figures, tables, and other elements in a document image and reconstructs the document as an office document file of the kind created and used by word-processing, spreadsheet, or presentation software.

Converting a document image into such an office document file requires techniques for extracting and analyzing the elements it contains, such as characters, figures, photographs, and tables. Many scanners already use optical character recognition (OCR) to analyze the character images in a document image and convert them into information such as character codes, so obtaining the set of character images in a document image as editable text data is common practice. Various functions have also been proposed for separating and extracting the figure, photograph, and table regions in a document image. Considering use cases such as recreating forms that survive only on paper or entering large volumes of experimental data into spreadsheet software, tables in particular are likely to be reconstructed often. When a document with the same layout is recreated by hand from a document image, however, rebuilding the tables is especially laborious, so automating table reconstruction is increasingly important.

Techniques have been developed for reconstructing a table accurately from such a table image. For example, Patent Document 1 proposes an image extraction device that accurately extracts a frame from an image containing characters that touch the frame, so that the characters can be restored with high quality. The device extracts, from the patterns making up the image, partial patterns in which pixels are connected to one another; extracts the frame based on the extracted partial patterns; and calculates the intersections between characters and the frame based on the extracted partial patterns and the frame. It adaptively changes the criteria for judging the continuity of distance and slope between character line segments on either side of the frame according to the line width of the frame, associates the calculated intersections with one another based on that continuity, and extracts the character line segments inside the frame based on the associated intersections. As a result, even when there are multiple character frames, separated as rectangles, whose size and position are unknown, and handwritten characters touch or extend beyond those frames, the character portions can be cut out reliably, one character at a time, from the combined pattern of characters and frames.

Patent Document 1: Japanese Patent Laid-Open No. 11-353415

However, when the technique of Patent Document 1 is used to extract a table region from the image data of a digitized document and convert it into reusable table data, it has no means of recognizing an image object in the table, such as a photograph or illustration, as an image object rather than as characters. The image object may therefore be misrecognized as characters and reconstructed in the table as incorrect text.

An object of the present invention is therefore to provide a conversion processing device, an information processing device equipped with the same, a program, and a storage medium that, even when a table cell contains a non-character object such as an image, extract and convert that object correctly without mistakenly extracting it as characters, and place it correctly in the table.

To solve the above problem, one aspect of the present invention is a conversion processing device comprising: a character extraction processing unit that extracts character regions present in document image information; a line extraction processing unit that extracts line segments present in the document image information; a table region extraction processing unit that extracts a table region using the line segment information extracted by the line extraction processing unit; a figure region extraction processing unit that sets predetermined local regions in the document image information, creates a luminance histogram of each local region to obtain luminance change information for that local region, and uses the luminance change information, the character region information extracted by the character extraction processing unit, the line segment information extracted by the line extraction processing unit, and the table region information extracted by the table region extraction processing unit to extract image object regions, containing figures or photographs, that lie outside or inside the table region; and a table structuring processing unit that analyzes the table structure based on the character region information, the line segment information, and the image object region information in the table region, and acquires table structure information for reconstructing the table.

In another aspect of the present invention, in the invention described above, the character regions that the character extraction processing unit extracts from the document image information may include character string regions, each containing a character string. In this aspect, the figure region extraction processing unit sets predetermined local regions in the document image information, creates a luminance histogram of each local region to obtain luminance change information for that local region, and uses the luminance change information, the character string region information extracted by the character extraction processing unit, the line segment information extracted by the line extraction processing unit, and the table region information extracted by the table region extraction processing unit to extract image object regions, containing figures or photographs, that lie outside or inside the table region. The table structuring processing unit analyzes the table structure based on the character string region information, the line segment information, and the image object region information in the table region, and acquires table structure information for reconstructing the table.

In another aspect of the present invention, the invention described above may further comprise a file description unit that writes the objects in a file format specified so that the objects are arranged either in the order of character string region objects, table region objects, line segment region objects, and image objects, or in the order of character string region objects, table region objects, image objects, and line segment region objects.
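The two permitted arrangement orders can be illustrated with a minimal sketch. The dictionary layout, the `kind` labels, and the names `ORDER_LINES_BEFORE_IMAGES`, `ORDER_IMAGES_BEFORE_LINES`, and `arrange_objects` are assumptions made for illustration only; the source does not specify how the file description unit represents objects internally.

```python
# Hypothetical object kinds; the source names only the four object categories.
ORDER_LINES_BEFORE_IMAGES = ("text", "table", "line", "image")
ORDER_IMAGES_BEFORE_LINES = ("text", "table", "image", "line")


def arrange_objects(objects, order=ORDER_LINES_BEFORE_IMAGES):
    """Return the objects sorted into one of the two arrangement orders.

    sorted() is stable, so objects of the same kind keep their relative order.
    """
    rank = {kind: index for index, kind in enumerate(order)}
    return sorted(objects, key=lambda obj: rank[obj["kind"]])


if __name__ == "__main__":
    page = [
        {"kind": "image", "id": 1},
        {"kind": "text", "id": 2},
        {"kind": "line", "id": 3},
        {"kind": "table", "id": 4},
    ]
    for obj in arrange_objects(page):
        print(obj["kind"], obj["id"])
```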

In another aspect of the present invention, in the invention described above, the figure region extraction processing unit may comprise: a non-character-string map generation processing unit that performs edge detection on the document image information and, by excluding from the edge detection result the character string regions (one type of document component) extracted by the character extraction processing unit, generates a non-character-string map of candidate image object regions; a non-character-string area addition processing unit that calculates, as the luminance change information, the entropy of the histogram in each local region of the document image information and adds regions with a high calculated entropy to the non-character-string map as candidate image object regions; and an object map generation processing unit that deletes the line segments of the table region extracted by the table region extraction processing unit from the non-character-string map to which the candidates have been added, labels the image object regions by performing a labeling process on the resulting map, and generates an object map by obtaining the rectangular region of each labeled image object region.
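The entropy criterion used by the non-character-string area addition processing unit can be sketched as follows. This is a minimal illustration, not the patented implementation: it assumes 8-bit luminance values stored as nested lists, square non-overlapping blocks as the local regions, and a caller-supplied `threshold`; the function names `histogram_entropy` and `high_entropy_blocks` are hypothetical.

```python
import math


def histogram_entropy(block, bins=256):
    """Shannon entropy (in bits) of the luminance histogram of one local region."""
    hist = [0] * bins
    count = 0
    for row in block:
        for value in row:
            hist[value] += 1
            count += 1
    entropy = 0.0
    for frequency in hist:
        if frequency:
            p = frequency / count
            entropy -= p * math.log2(p)
    return entropy


def high_entropy_blocks(image, block_size, threshold):
    """Return the (top, left) corner of every block whose entropy exceeds threshold.

    Flat areas such as blank paper yield entropy near 0, while photographs
    spread their luminance over many histogram bins and yield high entropy.
    """
    height, width = len(image), len(image[0])
    candidates = []
    for top in range(0, height, block_size):
        for left in range(0, width, block_size):
            block = [row[left:left + block_size]
                     for row in image[top:top + block_size]]
            if histogram_entropy(block) > threshold:
                candidates.append((top, left))
    return candidates
```

A block containing a single luminance value has entropy 0, and a block split evenly between two values has entropy 1 bit, so the threshold controls how varied a region must be before it is treated as a candidate image object region.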

In another aspect of the present invention, in the invention described above, the figure region extraction processing unit may comprise an effective object area determination processing unit that performs rectangle merging or rectangle splitting on each of the rectangular regions of the image object regions.

In another aspect of the present invention, in the invention described above, when the rectangular regions of multiple image object regions overlap, the effective object area determination processing unit may calculate the maximum and minimum coordinates of the rectangular regions of those image objects and merge the overlapping image object regions into a single rectangular region.

In another aspect of the present invention, in the invention described above, when a character string region overlaps the rectangular region of an image object region, the effective object area determination processing unit may calculate the maximum and minimum coordinates of the area formed by the rectangular region of the image object and the character string region, and merge the image object region and the overlapping character string region into a single rectangular region.
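Both merging rules reduce to taking the minimum and maximum coordinates of the overlapping rectangles. The sketch below illustrates that idea under stated assumptions: rectangles are `(left, top, right, bottom)` tuples with inclusive integer coordinates, and the names `overlaps`, `union`, and `merge_overlapping` are hypothetical, not taken from the source.

```python
from typing import List, Tuple

# (left, top, right, bottom) with inclusive coordinates; an assumed convention.
Rect = Tuple[int, int, int, int]


def overlaps(a: Rect, b: Rect) -> bool:
    """True when the two rectangles share at least one pixel."""
    return a[0] <= b[2] and b[0] <= a[2] and a[1] <= b[3] and b[1] <= a[3]


def union(a: Rect, b: Rect) -> Rect:
    """Bounding rectangle built from the min and max coordinates of a and b."""
    return (min(a[0], b[0]), min(a[1], b[1]), max(a[2], b[2]), max(a[3], b[3]))


def merge_overlapping(rects: List[Rect]) -> List[Rect]:
    """Repeatedly merge overlapping rectangles until none overlap.

    A merged rectangle can grow to overlap a third one, so the scan
    restarts after every merge.
    """
    merged = list(rects)
    changed = True
    while changed:
        changed = False
        for i in range(len(merged)):
            for j in range(i + 1, len(merged)):
                if overlaps(merged[i], merged[j]):
                    merged[i] = union(merged[i], merged[j])
                    del merged[j]
                    changed = True
                    break
            if changed:
                break
    return merged
```

The same `union` step applies whether the second rectangle is another image object region or a character string region that overlaps the image object.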

In another aspect of the present invention, in the invention described above, when the table region overlaps the rectangular region of an image object region, the effective object area determination processing unit may remove the overlapping part of the table region from the rectangular region of the image object region and split that rectangular region along the extension of a horizontal or vertical frame line of the overlapping table region.
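One way to picture the splitting rule is the sketch below, which removes the table's bounding rectangle from an image object rectangle and cuts the remainder along the extensions of the table's top and bottom frame lines. It is an illustration under assumptions, not the method of the embodiment: rectangles are `(left, top, right, bottom)` tuples with half-open coordinates, the table is treated as a single bounding rectangle, only the horizontal-line variant is shown, and `split_around_table` is a hypothetical name.

```python
from typing import List, Tuple

# (left, top, right, bottom) with half-open coordinates; an assumed convention.
Rect = Tuple[int, int, int, int]


def split_around_table(obj: Rect, table: Rect) -> List[Rect]:
    """Remove the table area from obj, splitting along horizontal table borders.

    Returns the remaining pieces of obj: full-width strips above and below the
    table, plus the parts left and right of the table within the table's band.
    """
    left, top, right, bottom = obj
    t_left, t_top, t_right, t_bottom = table

    # No overlap: the image object rectangle is kept unchanged.
    if t_left >= right or t_right <= left or t_top >= bottom or t_bottom <= top:
        return [obj]

    pieces = []
    if t_top > top:        # strip above the extended top border of the table
        pieces.append((left, top, right, t_top))
    if t_bottom < bottom:  # strip below the extended bottom border
        pieces.append((left, t_bottom, right, bottom))

    band_top, band_bottom = max(top, t_top), min(bottom, t_bottom)
    if t_left > left:      # part to the left of the table
        pieces.append((left, band_top, t_left, band_bottom))
    if t_right < right:    # part to the right of the table
        pieces.append((t_right, band_top, right, band_bottom))
    return pieces
```

If the table covers the image object rectangle entirely, the function returns an empty list, meaning nothing of that rectangle remains outside the table.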

Another aspect of the present invention is an information processing device comprising the conversion processing device described above.

Another aspect of the present invention is a program for causing a computer to function as: character extraction processing means for extracting character regions present in document image information; line extraction processing means for extracting line segments present in the document image information; table region extraction processing means for extracting a table region using the line segment information extracted by the line extraction processing means; figure region extraction processing means for setting predetermined local regions in the document image information, creating a luminance histogram of each local region to obtain luminance change information for that local region, and using the luminance change information, the character region information extracted by the character extraction processing means, the line segment information extracted by the line extraction processing means, and the table region information extracted by the table region extraction processing means to extract image object regions, containing figures or photographs, that lie outside or inside the table region; and table structuring processing means for analyzing the table structure based on the character region information, the line segment information, and the image object region information in the table region, and acquiring table structure information for reconstructing the table.

Another aspect of the present invention is a computer-readable recording medium on which the program described above is recorded.

According to the present invention, even when a table cell contains a non-character object such as an image, the object can be extracted and converted correctly, without being mistakenly extracted as characters, and placed correctly in the table.

A block diagram showing the configuration of the image forming apparatus according to Embodiment 1 of the present invention.
A block diagram showing the configuration of the conversion processing unit according to the embodiment.
A diagram (part 1) showing an example of table structure information according to the embodiment.
A diagram (part 2) showing an example of table structure information according to the embodiment.
A diagram showing an example of the file structure of a Word file.
A diagram showing an example of a table described in a markup language.
A block diagram showing the configuration of the figure region extraction processing unit according to the embodiment.
A diagram showing an example of a table image according to the embodiment.
A flowchart showing the flow of processing by the non-character-string map generation processing unit according to the embodiment.
A diagram showing a Laplacian filter applied in the embodiment and an example of the result of applying it.
A diagram showing an example of a table image to which the processing by the non-character-string map generation processing unit according to the embodiment has been applied.
A flowchart showing the flow of processing by the non-character-string area addition processing unit according to the embodiment.
A diagram (part 1) showing an example of a table image to which the processing by the non-character-string area addition processing unit according to the embodiment has been applied.
A diagram (part 2) showing an example of a table image to which the processing by the non-character-string area addition processing unit according to the embodiment has been applied.
A flowchart showing the flow of processing by the object map generation processing unit according to the embodiment.
A diagram (part 1) showing an example of a table image to which the processing by the object map generation processing unit according to the embodiment has been applied.
A diagram (part 2) showing an example of a table image to which the processing by the object map generation processing unit according to the embodiment has been applied.
A diagram (part 3) showing an example of a table image to which the processing by the object map generation processing unit according to the embodiment has been applied.
A flowchart showing the flow of processing by the effective object area determination processing unit according to the embodiment.
A diagram (part 1) showing an example of an image object to which the processing by the effective object area determination processing unit according to the embodiment has been applied.
A diagram (part 2) showing an example of an image object to which the processing by the effective object area determination processing unit according to the embodiment has been applied.
A diagram (part 3) showing an example of an image object to which the processing by the effective object area determination processing unit according to the embodiment has been applied.
A block diagram showing the configuration of the image reading apparatus according to Embodiment 2 of the present invention.
A diagram for explaining a modification of the luminance change information.

Embodiments of the present invention are described in detail below with reference to the drawings. In these embodiments, when a table region is extracted from the image data of a digitized document and converted into reusable table data, an image object other than characters in a table cell, such as a photograph, figure, graph, or illustration, is extracted and converted correctly, without being mistakenly extracted as characters, and is placed correctly in the table. In addition, even when image object regions overlap one another, or an image object region overlaps a character string region or a table region, the document is converted into the specified file format with a clean appearance.

[Embodiment 1]
The following description takes as an example a configuration in which the conversion processing device according to the present invention forms part of an image processing apparatus 1 as a conversion processing unit 30, and the image processing apparatus 1 in turn forms part of an image forming apparatus 100. FIG. 1 is a block diagram showing the functional configuration of the image forming apparatus 100 (information processing device) according to Embodiment 1. The image forming apparatus 100 is, for example, a digital multifunction peripheral with copy and scanner functions. The image forming apparatus 100 includes the image processing apparatus 1, an image input device 2, an image output device 3, a transmission device 4, an operation panel 5, and a storage unit 6.

The operation panel 5 is connected to the image input device 2, the image processing apparatus 1, the image output device 3, and the transmission device 4. The operation panel 5 includes an operation unit (not shown), such as setting buttons and a numeric keypad, with which the user sets the operation mode of the image forming apparatus 100, and a display unit (not shown) such as a liquid crystal display.

The various processes executed in the image forming apparatus 100 are controlled by a control unit (not shown), which is a computer including a processor such as a CPU (Central Processing Unit) or DSP (Digital Signal Processor). The control unit of the image forming apparatus 100 exchanges data with computers, other digital multifunction peripherals, and similar equipment connected to the network via a network card and LAN cable (not shown).

Each part of the image forming apparatus 100 is described in detail below. The image input device 2 optically reads an image from a document. It is, for example, a color scanner with a CCD (Charge Coupled Device); it uses the CCD to read the light reflected from the document as RGB (R: red, G: green, B: blue) analog signals and outputs them to the image processing apparatus 1. The image input device 2 need not be a scanner; it may be, for example, a digital camera.

The image processing apparatus 1 processes the image data read by the image input device 2 and generates a compressed file for storing or transmitting the processed image data. For the RGB analog signals input from the image input device 2, the image processing apparatus 1 generates image data consisting of RGB digital signals (hereinafter, RGB signals) by having an A/D conversion unit 10, a shading correction unit 11, a document type determination unit 12, an input tone correction unit 13, and a region separation processing unit 14 each execute the image processing described later.

For the RGB signals output by the region separation processing unit 14, the image processing apparatus 1 also generates image data consisting of CMYK (C: cyan, M: magenta, Y: yellow, K: black) digital signals by having a color correction unit 15, a black generation and background removal unit 16, a spatial filter processing unit 17, an output tone correction unit 18, and a tone reproduction processing unit 19 each execute the image processing described later, and outputs the image data as a stream to the image output device 3. The image data may be stored temporarily in the storage unit 6 before it is output to the image output device 3. The storage unit 6 is, for example, a nonvolatile storage device such as a hard disk.

The image output device 3 outputs an image based on the image data generated by the image processing apparatus 1. Based on the image data input from the image processing apparatus 1, it forms (prints) a color image on a recording sheet, such as recording paper, by a method such as thermal transfer, electrophotography, or inkjet printing, and outputs the sheet. The image output device 3 is not limited to color output; it may, for example, form and output a monochrome (black-and-white) image on the recording sheet. In that case, the image processing apparatus 1 converts the color image data into monochrome image data before outputting it to the image output device 3.

Furthermore, in the image processing apparatus 1, a compression processing unit 20 executes image compression on the RGB signals output by the region separation processing unit 14, generates a compressed file containing the compressed color image data, and outputs it to the transmission device 4. The compressed file may be stored temporarily in the storage unit 6 before it is output to the transmission device 4.

When the format conversion mode is selected on the operation panel 5, the conversion processing unit 30 of the image processing apparatus 1 executes format conversion processing on the RGB signals output by the region separation processing unit 14. In this processing, as described later, the conversion processing unit 30 analyzes the document layout of the color image to generate a document structure tree, converts the document structure tree into the format the user selected on the operation panel 5, and outputs the result to the transmission device 4. The conversion processing unit 30 is the functional unit that serves as the conversion processing device according to the present invention. The converted file may be stored temporarily in the storage unit 6 before it is output to the transmission device 4.

The transmission device 4 transmits the compressed file generated by the image processing apparatus 1 to the outside. The transmission device 4 can connect to a communication network (not shown) such as a public line network, a LAN (Local Area Network), or the Internet, and transmits the compressed file to the outside over the communication network by a communication method such as facsimile or e-mail. For example, when the "scan to e-mail" mode is selected on the operation panel 5, the transmission device 4, which uses a network card, modem, or the like, attaches the compressed file to an e-mail and sends it to the configured destination.

When performing facsimile transmission, the control unit of the image forming apparatus 100 carries out a communication procedure with the destination through the transmission device 4, which is implemented with a modem or the like. Once a transmittable state is secured, the control unit applies any necessary processing to the compressed file, such as changing the compression format, and then transmits it sequentially to the destination over the communication line.
When receiving a facsimile, the control unit of the image forming apparatus 100 receives the compressed file sent from the other party through the transmission device 4 while carrying out the communication procedure, and inputs the file to the image processing apparatus 1.

In the image processing apparatus 1, the received compressed file is decompressed by a compression/decompression processing unit (not shown). The image data obtained by decompressing the compressed file is subjected, as necessary, to rotation processing and/or resolution conversion processing by a processing unit (not shown), then to output tone correction by the output tone correction unit 18 and to gradation reproduction processing by the gradation reproduction processing unit 19. The image data that has undergone these processes is output to the image output device 3, which forms an image on a recording sheet.

The configuration of the image processing apparatus 1 is described below, together with the details of its image processing and format conversion processing. The A/D conversion unit 10 receives the RGB analog signals input from the image input device 2 to the image processing apparatus 1, converts them into RGB digital signals (that is, RGB signals), and outputs the converted RGB signals to the shading correction unit 11.

The shading correction unit 11 removes, from the RGB signal input from the A/D conversion unit 10, the various distortions that arise in the illumination system, the image-forming (optical) system, and the image-capturing system of the image input device 2. The shading correction unit 11 then outputs the corrected RGB signal to the document type determination unit 12. The document type determination unit 12 converts the RGB reflectance signal input from the shading correction unit 11 into a density signal indicating the density of each of the R, G, and B colors, and executes document type determination processing, which determines the document mode, such as text, printed photograph, or photograph (continuous tone photograph). When the user sets the document type manually on the operation panel 5, the document type determination unit 12 passes the RGB signal input from the shading correction unit 11 unchanged to the input tone correction unit 13 at the subsequent stage. The result of the document type determination processing is reflected in the subsequent image processing.

The input tone correction unit 13 performs image quality adjustment on the RGB signal, such as color balance adjustment, background density removal, and contrast adjustment, and outputs the processed RGB signal to the region separation processing unit 14. The region separation processing unit 14 classifies each pixel in the image represented by the RGB signal input from the input tone correction unit 13 into a character region, a halftone dot region, or a photograph region. Based on the classification result, the region separation processing unit 14 outputs a region identification signal, which indicates the region to which each pixel belongs, to the black generation background removal unit 16, the spatial filter processing unit 17, the gradation reproduction processing unit 19, and the compression processing unit 20. The region separation processing unit 14 also passes the RGB signal input from the input tone correction unit 13 unchanged to the color correction unit 15 and the compression processing unit 20 at the subsequent stage.

The color correction unit 15 converts the RGB signal input from the region separation processing unit 14 into a CMY digital signal (hereinafter, CMY signal) and, to achieve faithful color reproduction, removes from the CMY signal the color impurity that arises from the spectral characteristics of CMY color materials containing unnecessary absorption components. The color correction unit 15 then outputs the color-corrected CMY signal to the black generation background removal unit 16. Based on the CMY signal input from the color correction unit 15, the black generation background removal unit 16 performs black generation processing, which generates a black (K) signal from the CMY signal, and processing that generates a new CMY signal by subtracting the K signal obtained in the black generation processing from the CMY signal. As a result, the three-color CMY digital signal is converted into a four-color CMYK digital signal (hereinafter, CMYK signal). The black generation background removal unit 16 then outputs the CMYK signal to the spatial filter processing unit 17.

A commonly used example of black generation processing is black generation by skeleton black. In this method, let the input/output characteristic of the skeleton curve be y = f(x), the input data be C, M, Y, the output data be C′, M′, Y′, K′, and the UCR (Under Color Removal) rate be α (0 < α < 1). The black generation and background (under color) removal processing is then expressed by equations (1) to (4).

Here, the UCR rate α (0 < α < 1) indicates how much CMY is reduced when the portion where C, M, and Y overlap is replaced with K. Equation (1) shows that the K signal is generated according to the smallest of the C, M, and Y signal intensities.
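Equations (1) to (4) are not reproduced in this text. The following is a minimal sketch of skeleton-black generation with under color removal, assuming the conventional form K′ = f(min(C, M, Y)), C′ = C − αK′, M′ = M − αK′, Y′ = Y − αK′, which is consistent with the description above (K is derived from the smallest of C, M, and Y and then subtracted from each). The function name, the 0.0 to 1.0 value range, and the identity skeleton curve are illustrative assumptions, not taken from the source.

```python
def black_generation_ucr(c, m, y, alpha=0.5, skeleton=lambda x: x):
    """Convert CMY values (0.0-1.0) to CMYK with skeleton black and UCR.

    alpha is the UCR rate (0 < alpha < 1); skeleton is the curve y = f(x).
    The identity curve used by default is an illustrative assumption.
    """
    if not 0.0 < alpha < 1.0:
        raise ValueError("alpha must satisfy 0 < alpha < 1")
    k = skeleton(min(c, m, y))  # K is generated from the weakest of C, M, Y
    # Remove the fraction alpha of the generated black from each color.
    return (c - alpha * k, m - alpha * k, y - alpha * k, k)
```

With a larger α, more of the overlapping CMY component is replaced by K, which matches the role of the UCR rate described above.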

The spatial filter processing unit 17 applies spatial filter processing with a digital filter to the CMYK image data input from the black generation background removal unit 16, based on the region identification signal input from the region separation processing unit 14. By correcting the spatial frequency characteristics, it reduces image blur and graininess degradation. For example, for a region classified as characters by the region separation processing unit 14, the spatial filter processing unit 17 uses a filter with strong high-frequency emphasis to improve character reproducibility. For a region classified as halftone dots, the spatial filter processing unit 17 performs low-pass filter processing to remove the input halftone dot component.

The spatial filter processing unit 17 then outputs the processed CMYK signal to the output tone correction unit 18. The output tone correction unit 18 performs output tone correction processing on the CMYK signal input from the spatial filter processing unit 17, based on the characteristics of the image output device 3, and outputs the corrected CMYK signal to the gradation reproduction processing unit 19. The gradation reproduction processing unit 19 performs halftone processing on the CMYK signal input from the output tone correction unit 18, based on the region identification signal input from the region separation processing unit 14. For example, for a region classified as characters, the gradation reproduction processing unit 19 performs binarization or multi-level processing with a high-resolution screen suited to reproducing high-frequency components. For a region classified as halftone dots, it performs binarization or multi-level processing with a screen that prioritizes gradation reproducibility. The gradation reproduction processing unit 19 then outputs the processed image data to the image output device 3.

The compression processing unit 20 generates a compressed file based on the region identification signal input from the region separation processing unit 14 and the image data composed of RGB signals. The image data input to the compression processing unit 20 consists of a plurality of pixels arranged in a matrix. This image data is separated into a foreground layer and a background layer. The foreground layer is further converted into binary images, each binary image is losslessly compressed by, for example, MMR (Modified Modified READ), and the background layer is lossily compressed by, for example, JPEG (Joint Photographic Experts Group). Finally, the losslessly compressed binary images, the lossily compressed background layer, and the decompression information needed to decompress them into color image data are combined into a single file, which is the compressed file. The decompression information includes information indicating the compression format, an index color table, and the like. The region identification signal generated for each pixel is compressed by a lossless method such as MMR or MR (Modified READ). The compressed image data (compressed image) is temporarily stored in the storage unit 6. For example, when the “scan to e-mail” mode is selected on the operation panel 5, it is attached to an e-mail by the transmission device 4 and sent to the set destination.

(Configuration of the conversion processing unit)
The conversion processing unit 30 executes format conversion processing on the information of the input document image (hereinafter also referred to as input image information). The details of the conversion processing unit 30 are described below. FIG. 2 is a block diagram showing the configuration of the conversion processing unit 30. The conversion processing unit 30 includes a character extraction processing unit 31, a line extraction processing unit 32, a table region extraction processing unit 33, a figure region extraction processing unit 34, a table structuring processing unit 35, and a file description unit 36.

The character extraction processing unit 31 extracts the character images contained in the input image and analyzes the characters they represent by OCR or a similar technique. From the arrangement of the characters, the character extraction processing unit 31 also defines character strings, each consisting of one or more characters (a single character is treated as a character string here). In addition, the character extraction processing unit 31 acquires attributes of the extracted characters, such as size and color.

The line extraction processing unit 32 extracts the lines (line segments) contained in the input image and acquires information on each line. The line information includes at least the position (coordinates) at which the line was extracted, and its direction, length, width, and color. From the line information extracted by the line extraction processing unit 32, the table region extraction processing unit 33 extracts, as a table region, a set of horizontal and vertical lines that intersect one another. The figure region extraction processing unit 34 extracts regions such as figures and photographs contained in the input image and acquires information on each region. The figure region information includes at least the upper-left position and the size (width, height) of the rectangle enclosing each figure region. For each extracted table region, the table structuring processing unit 35 analyzes the table structure using the lines contained in the table region and elements such as characters and figures extracted at positions overlapping the table region, and acquires the information needed to structure the table (table structure information). From the table structure information, elements such as characters and figures, and information such as lines not contained in any table, the file description unit 36 writes a file according to the description method of the specified file format so that the resulting document structure has the same layout as the input image.

The character extraction processing unit 31, the line extraction processing unit 32, the table region extraction processing unit 33, and the table structuring processing unit 35 may use known techniques. The character extraction processing unit 31 can extract characters and character strings with a known OCR technique. As a character string extraction method, for example, a method for extracting character strings from a table (the technique described in Japanese Patent Application No. 2014-174348) can be used. In this method, the character extraction processing unit 31 first obtains the character rectangle (circumscribed rectangle) of each character extracted by OCR. For each rectangle, it calculates the distance to neighboring rectangles and merges rectangles that are close to one another into the same character string group. At this point, a rectangle whose size differs greatly from that of its neighbors, and which has no similar rectangle nearby, is removed as a non-character rectangle. The character extraction processing unit 31 then measures the horizontal and vertical size of each character string group and sets the longer dimension as the direction of the character string. If the horizontal and vertical sizes are about the same, the direction cannot be determined, so the character string direction is set to undetermined. Finally, the character extraction processing unit 31 groups neighboring character strings of the same direction that are similar in size, or whose start or end positions are close, into character string regions.
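The grouping and direction steps described above can be sketched as follows. This is only an illustration under stated assumptions: rectangles are `(x, y, w, h)` tuples, and the gap threshold `max_gap`, the `tolerance` used to decide that width and height are "about the same", and both function names are hypothetical. Removal of non-character rectangles is omitted.

```python
def group_rects(rects, max_gap):
    """Group character rectangles (x, y, w, h) whose gap is <= max_gap."""
    def gap(a, b):
        # Horizontal and vertical gaps between two rectangles (0 if overlapping).
        dx = max(a[0] - (b[0] + b[2]), b[0] - (a[0] + a[2]), 0)
        dy = max(a[1] - (b[1] + b[3]), b[1] - (a[1] + a[3]), 0)
        return max(dx, dy)

    parent = list(range(len(rects)))  # union-find forest

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path halving
            i = parent[i]
        return i

    for i in range(len(rects)):
        for j in range(i + 1, len(rects)):
            if gap(rects[i], rects[j]) <= max_gap:
                parent[find(i)] = find(j)

    groups = {}
    for i, rect in enumerate(rects):
        groups.setdefault(find(i), []).append(rect)
    return list(groups.values())


def string_direction(group, tolerance=0.2):
    """Return 'horizontal', 'vertical', or 'undetermined' for one group."""
    width = max(r[0] + r[2] for r in group) - min(r[0] for r in group)
    height = max(r[1] + r[3] for r in group) - min(r[1] for r in group)
    if abs(width - height) <= tolerance * max(width, height):
        return "undetermined"  # sizes are about the same
    return "horizontal" if width > height else "vertical"
```

A single isolated character yields a roughly square group, which is why it falls into the undetermined case.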

The extraction of characters and character strings by the character extraction processing unit 31 may be performed before or after the processing of the line extraction processing unit 32. However, because the processing by the figure region extraction processing unit 34, described later, uses the character, line, and table information, the processing by the character extraction processing unit 31, the line extraction processing unit 32, and the table region extraction processing unit 33 must be completed before the processing by the figure region extraction processing unit 34.
When the character extraction processing unit 31 obtains a circumscribed rectangle, the size of each table cell containing the character or character string is known, and a circumscribed rectangle should not exceed the size of its cell. Therefore, if a circumscribed rectangle larger than the cell is obtained, its size may be corrected so that it falls within the bounds of the cell.
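The correction of an oversized circumscribed rectangle can be sketched as a simple clipping operation. The `(x, y, w, h)` representation and the function name are assumptions made for illustration.

```python
def clamp_rect_to_cell(rect, cell):
    """Clip rect (x, y, w, h) so that it lies within cell (x, y, w, h)."""
    left = max(rect[0], cell[0])
    top = max(rect[1], cell[1])
    right = min(rect[0] + rect[2], cell[0] + cell[2])
    bottom = min(rect[1] + rect[3], cell[1] + cell[3])
    # A rectangle entirely outside the cell collapses to zero size.
    return (left, top, max(right - left, 0), max(bottom - top, 0))
```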

The line extraction processing unit 32 and the table region extraction processing unit 33 can extract lines and table regions by, for example, the method given in Reference 1 (Japanese Patent No. 5153857). In the method described in Reference 1, candidate pixels that may belong to a line are extracted from the document image data, and when the candidate pixels continue for at least a predetermined number of pixels in the horizontal or vertical direction, the set of continuous candidate pixels is extracted as a line. From the positional relationship of the extracted horizontal and vertical lines, it is determined whether each line is a ruled line of a table or an isolated line, and for each set of lines forming the same table, the smallest circumscribed rectangle enclosing all of them is extracted as a table region. In the method described in Reference 2, the pixels constituting a line (line pixels) are extracted from the document image, so the average of the line pixel values can also be calculated as the line color. The line thickness can be calculated from the number of line pixels aligned in the direction orthogonal to the line (the vertical direction for a horizontal line); for example, the average number of vertically consecutive line pixels can be used as the thickness of a horizontal line.
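As a rough illustration of the run-based line extraction and the table region rectangle described above, the sketch below scans a binary mask of candidate pixels for horizontal runs and computes the smallest rectangle enclosing a set of line segments. It is not the method of the cited patent itself; the mask representation, `min_length`, and the function names are assumptions.

```python
def horizontal_runs(mask, min_length):
    """Return (row, start_col, length) for each run of 1s >= min_length."""
    runs = []
    for row_index, row in enumerate(mask):
        start = None
        # A trailing 0 acts as a sentinel that closes a run at the row end.
        for col, value in enumerate(list(row) + [0]):
            if value and start is None:
                start = col
            elif not value and start is not None:
                if col - start >= min_length:
                    runs.append((row_index, start, col - start))
                start = None
    return runs


def bounding_rect(segments):
    """Smallest rectangle (x0, y0, x1, y1) enclosing (x0, y0, x1, y1) segments."""
    return (min(s[0] for s in segments), min(s[1] for s in segments),
            max(s[2] for s in segments), max(s[3] for s in segments))
```

Vertical lines can be found the same way by scanning the transposed mask, for example `horizontal_runs(list(zip(*mask)), min_length)`.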

Dotted and broken lines can also be extracted by well-known methods, and combining such a method with the method of Reference 1 makes it possible to extract lines of types other than solid lines. For example, in the method of Reference 2 (Japanese Patent Laid-Open No. 7-230525), black pixels connected in the ruled-line direction of interest (horizontal or vertical) are extracted from a binarized document image, and a rectangle enclosing each set of connected pixels is obtained. A rectangle whose size is at or below a predetermined threshold is treated as a dotted-line element, and when the spacing between dotted-line elements is within a predetermined threshold, the rectangle that merges those elements is extracted as a dotted ruled line.
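The two thresholds in the dotted-line procedure above can be illustrated in one dimension. This sketch assumes the connected elements on a single scan line are already available as `(start, end)` intervals. The names `max_size` and `max_gap`, and the rule that a dotted line needs at least two elements, are assumptions made for illustration.

```python
def merge_dotted(elements, max_size, max_gap):
    """Merge small (start, end) elements on one scan line into dotted lines."""
    # Only elements no longer than max_size count as dotted-line elements.
    dots = sorted(e for e in elements if e[1] - e[0] <= max_size)
    merged = []  # entries are (start, end, element_count)
    for start, end in dots:
        if merged and start - merged[-1][1] <= max_gap:
            merged[-1] = (merged[-1][0], end, merged[-1][2] + 1)
        else:
            merged.append((start, end, 1))
    # Assumption: a span built from a single element is not a dotted line.
    return [(s, e) for s, e, count in merged if count >= 2]
```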

In Reference 3 (Japanese Patent Laid-Open No. 4-68477), contour vectors are extracted by tracing the connectivity of black pixels in a binarized document image. Contour vectors that form solid straight lines or curves are excluded, and from the relationship between the major-axis and minor-axis lengths of the regions enclosed by the remaining contour vectors, regions are extracted as broken-line candidates. By examining the mutual relationship of the candidates, the broken-line regions that make up the same broken line are extracted.

The figure region extraction processing unit 34 is described later. The table structuring processing unit 35 operates, for example, as follows. From the input document image and the lines extracted from it, horizontal and vertical reference lines are determined, and each quadrangle bounded by two adjacent horizontal reference lines and two adjacent vertical reference lines is defined as a temporary cell. Based on the information of the elements extracted from the input image, the temporary cells undergo a first classification into combined cell candidates, each of which is a quadrangular set of one or more temporary cells. For each combined cell candidate obtained in the first classification, the temporary cells it contains undergo a second classification, again based on the element information and according to a determination criterion, into the most appropriate cell blocks (quadrangular sets of one or more temporary cells). The table structure is then analyzed from the attributes of the cell blocks obtained in the second classification and the previously extracted element information, and the information to be referenced when reconstructing the table is described as table structure information.
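The definition of temporary cells from adjacent reference lines can be sketched as below. It assumes the horizontal reference lines are given by their y coordinates and the vertical ones by their x coordinates, and the function name is hypothetical. The first and second classifications are not covered.

```python
def temporary_cells(h_lines, v_lines):
    """Return (left, top, right, bottom) for each pair of adjacent reference lines.

    h_lines holds the y coordinates of horizontal reference lines and
    v_lines the x coordinates of vertical reference lines.
    """
    ys = sorted(set(h_lines))
    xs = sorted(set(v_lines))
    # Row-major order, starting from the upper-left temporary cell.
    return [(xs[j], ys[i], xs[j + 1], ys[i + 1])
            for i in range(len(ys) - 1)
            for j in range(len(xs) - 1)]
```

Three horizontal and three vertical reference lines therefore yield a 2 × 2 grid of temporary cells, which later stages may merge into larger cell blocks.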

There is no fixed rule for how the table structure information is described. For example, describing it so that it can be referenced as a tree, as shown in FIG. 3, makes conversion to file formats based on XML (Extensible Markup Language), described later, easier. In addition, by describing a separate list for each type of attribute and acquiring information by referring to the ID (identification) assigned in that list, repeated description of the same attribute can be avoided. For example, the extracted line information is described as a list as shown in FIG. 4, with an ID (line ID) assigned to each line. In the tree-shaped table structure information of FIG. 3, the line ID is then described as the ruled-line information of each cell block, so the same information need not be repeated for multiple cell blocks that share a line. The order of description is not fixed, but it is desirable that the cell blocks be described in ascending order of parent ID, so that the information is written in order from the upper left of the table. Items other than those shown in FIGS. 3 and 4 may of course be added.
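The idea of sharing line attributes through line IDs can be sketched with plain dictionaries. FIGS. 3 and 4 are not reproduced in this text, so every field name below (`parent_id`, `top`, `bottom`, `width`, `color`, and so on) is a hypothetical stand-in rather than the layout used in the figures.

```python
# Line list: each line is described once and identified by its line ID.
lines = {
    1: {"x0": 0, "y0": 0, "x1": 100, "y1": 0, "width": 1, "color": "000000"},
    2: {"x0": 0, "y0": 20, "x1": 100, "y1": 20, "width": 1, "color": "000000"},
    3: {"x0": 0, "y0": 40, "x1": 100, "y1": 40, "width": 2, "color": "FF0000"},
}

# Cell blocks refer to lines by ID instead of repeating their attributes.
cell_blocks = [
    {"parent_id": 1, "top": 2, "bottom": 3},  # shares line 2 with the block above
    {"parent_id": 0, "top": 1, "bottom": 2},
]


def border(block, side):
    """Look up the shared line record for one side of a cell block."""
    return lines[block[side]]


# Describe cell blocks in ascending order of parent ID.
ordered = sorted(cell_blocks, key=lambda block: block["parent_id"])
```

Because both blocks resolve line ID 2 to the same record, a change to that line's width or color is made in one place only.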

The file description method (conversion method) executed by the file description unit 36, described later, differs depending on the type of office document file to be produced. For example, an office suite (a set of software for office work) that complies with the Office Open XML (OOXML) standard uses, as an office document file, a group of documents written in XML together with binary data such as photographs, illustrations, and figures, combined into a single file by compression. However, the file description method differs between different types of software (for example, word processing software and presentation software), and between OOXML and a document format of a different standard (for example, OpenDocument Format) it differs even for the same type of software.

Therefore, so that any file format can be supported easily, the table structuring processing unit 35 acquires table structure information that is not tied to a specific file format, and the file description unit 36, described later, uses that table structure information to structure the table according to the specified file format. With this arrangement, when a user of a conversion processing device having the functions of the conversion processing unit 30 views the result of conversion to one file format and then wants to convert to a different file format, the table structure information already obtained can be reused, and the conversion can be done easily without running the table structuring processing unit 35 again from the beginning.

The file description unit 36 converts to the specified file format using the information on elements such as characters and figures, the line information, and the table structure information extracted in the preceding stages. The description method differs depending on the specified file format, but for file formats whose structure is published, the file can be written by known methods using the table structure information, element information, and other data already obtained. For example, OOXML is the file format adopted by the file type (docx) of “Microsoft Word 2010”, the word processing software provided by Microsoft (registered trademark), and its successor versions (hereinafter simply referred to as Word). OOXML is standardized as ECMA-376 and ISO/IEC 29500, and a file can be structured as a Word file by writing it according to the format set out in those specifications. The format standardized as ECMA-376 is published as Reference 4 below, and the file is written according to the format described there. In the following, a processing example is given for the part concerned with structuring tables in the document image, and the detailed description method is omitted.
“Reference 4: ECMA-376, 4th Edition, Office Open XML File Formats [retrieved September 10, 2015], Internet (URL http://www.ecma-international.org/publications/standards/Ecma-376.htm)”

FIG. 5 is a tree diagram showing an example of (part of) the file structure of a Word file. A Word file is a series of folders and files such as those shown in FIG. 5, ZIP-compressed into a single file with the file extension changed to docx. The /word/ folder on the first line of FIG. 5 stores files describing the data for structuring the document, image files in which graphics extracted from the document image are saved as images, and so on. For example, the document.xml file describes the objects that make up the document body, such as characters (strings), graphics, and tables, according to their information, using a markup language called WordprocessingML. A table, for example, is structured as shown in FIG. 6 according to the information written between the start tag <w:tbl> and the end tag </w:tbl> (hereinafter, description 1). Description 1 is divided into information such as properties that apply to the whole table (hereinafter, description 2) and information for structuring each row (hereinafter, description 3).

Description 3 is written between <w:tr> and </w:tr> and is repeated once per row, starting from the first row. Description 3 is further divided into information such as row-wide properties (hereinafter, description 4) and information for structuring each cell (hereinafter, description 5). Description 5 is written between <w:tc> and </w:tc> and is repeated once per column, starting from the leftmost cell. Description 5 is further divided into information such as cell properties (hereinafter, description 6) and information for structuring the paragraphs, each a set of elements stored in the cell (hereinafter, description 7). The content between <w:p> and </w:p> in description 7 represents one paragraph. Description 7 is further divided into information such as paragraph properties (hereinafter, description 8) and information for structuring the stored elements (hereinafter, description 9).

The content between <w:r> and </w:r> in description 9 represents one run. Each run carries information such as properties (hereinafter, description 10) and the data of the element itself (hereinafter, description 11), so elements are grouped into runs by shared properties. In other words, even within a single character string, the text is split into separate runs wherever properties such as character color or size differ. In FIG. 6, the information between <w:t> and </w:t>, which corresponds to description 11, holds character (string) data; when an image is stored, the information about that image is written between <w:drawing> and </w:drawing> instead. Image files are stored in the /word/media folder of FIG. 5, and a graphic is placed in the document by referencing, from document.xml, the ID that /word/_rels/document.xml.rels associates with its file name. For details of each file in FIG. 5 and how each is written, see a format specification such as Reference 4.
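The nesting of descriptions 1 to 11 can be sketched with Python's standard XML library. This is an illustrative sketch only: the mapping of descriptions 2, 4, 6, 8, and 10 to the elements `w:tblPr`, `w:trPr`, `w:tcPr`, `w:pPr`, and `w:rPr` is an assumption (FIG. 6 is not reproduced here), the helper names `w` and `build_table` are hypothetical, and the output omits elements such as `w:tblGrid` that a schema-valid table would need.

```python
import xml.etree.ElementTree as ET

W_NS = "http://schemas.openxmlformats.org/wordprocessingml/2006/main"
ET.register_namespace("w", W_NS)


def w(tag):
    """Return the namespace-qualified name for a WordprocessingML tag."""
    return f"{{{W_NS}}}{tag}"


def build_table(rows):
    """Build a <w:tbl> element from a list of rows of cell strings."""
    tbl = ET.Element(w("tbl"))                # description 1
    ET.SubElement(tbl, w("tblPr"))            # description 2 (assumed mapping)
    for row in rows:
        tr = ET.SubElement(tbl, w("tr"))      # description 3, once per row
        ET.SubElement(tr, w("trPr"))          # description 4 (assumed mapping)
        for text in row:
            tc = ET.SubElement(tr, w("tc"))   # description 5, once per column
            ET.SubElement(tc, w("tcPr"))      # description 6 (assumed mapping)
            p = ET.SubElement(tc, w("p"))     # description 7, one paragraph
            ET.SubElement(p, w("pPr"))        # description 8 (assumed mapping)
            r = ET.SubElement(p, w("r"))      # description 9, one run
            ET.SubElement(r, w("rPr"))        # description 10 (assumed mapping)
            t = ET.SubElement(r, w("t"))      # description 11, character data
            t.text = text
    return tbl


xml_text = ET.tostring(build_table([["A", "B"], ["C", "D"]]), encoding="unicode")
```

Text with differing color or size inside one cell would be emitted as several `w:r` elements under the same `w:p`, each with its own run properties.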

If an attribute has no appropriate value, for example because its calculation was skipped, the property description for that attribute may be omitted so that the default value set by each application is used. For example, if calculation of the character color is omitted, a predetermined color such as black can be used as the default. If a ruled line on any of the four sides of a cell block is given as "undefined" as described above, no ruled-line information is set for that side of the cell. Also, when the top or bottom ruled lines of all temporary cells in a row are formed by the same line, the top or bottom ruled-line information may be set collectively for the row (description 4 in FIG. 6) rather than cell by cell.

In FIG. 3 described above, the stored elements (first element, second element, ...) associated with a cell block include, in addition to characters and character strings made up of multiple characters, image objects such as figures and photographs. How this information is used depends on how the file description unit 36 writes it and on the target file format. For example, Microsoft (registered trademark) Excel cannot place an image inside a cell the way it places text, so the image object is simply overlaid. Word, by contrast, can insert an image object into a table cell. In this embodiment, therefore, conversion is performed in accordance with the destination file format.

FIG. 7 is a block diagram showing the configuration of the figure region extraction processing unit 34 (figure region extraction device). The figure region extraction processing unit 34 includes a non-character-string map generation processing unit 341, a non-character-string area addition processing unit 342, an object map generation processing unit 343, and a valid object area determination processing unit 344.

The non-character-string map generation processing unit 341 performs edge detection on the input image and removes from the result the character string regions made up of the characters extracted by the character extraction processing unit 31; the remaining edge regions form the non-character-string map. The non-character-string area addition processing unit 342 calculates the histogram entropy of the image over the regions outside those character string regions and adds regions with high entropy to the non-character-string map as non-character-string areas. The object map generation processing unit 343 applies table line removal, labeling, and rectangularization to the non-character-string map with the added areas, producing an object map. The valid object area determination processing unit 344 merges or splits the rectangular regions of the objects on the object map as needed, and finally determines whether each is an image object that should be converted during format conversion. If so, the object area is kept on the map; if not, it is deleted from the map.

Each processing unit of the figure region extraction processing unit 34 is described in detail below with reference to the table image 400 of FIG. 8 and the flowcharts of FIGS. 9, 12, 15, and 19. In the table of the table image 400 in FIG. 8, the square and the heart are drawn hatched for convenience, but they are assumed to have moderate density variation, like a photograph, and weak edge strength along their outlines. As shown in the flowchart of FIG. 9, the non-character-string map generation processing unit 341 first performs edge detection on the input image (step Sa1). Edge detection uses, for example, a first-derivative filter such as a Sobel or Prewitt filter, or a second-derivative filter such as a Laplacian filter. As an example, a method that applies a Laplacian filter to the G value of the RGB values is described here. FIG. 10(a) shows a 3×3 Laplacian filter. With x the horizontal coordinate and y the vertical coordinate of the pixel of interest, and p_g(x, y) its G value, the Laplacian filter result p'_g(x, y) is given by equation (5).

FIG. 10(c) shows the result of applying the edge strength detection of the Laplacian filter of FIG. 10(a), according to equation (5), to the G value of each pixel in the region enclosed by the thick line in FIG. 10(b). The Laplacian filter produces both positive and negative values; here the absolute value is taken so that every result is non-negative.

Next, the non-character-string map generation processing unit 341 compares this edge strength with a predetermined threshold and detects as edge pixels only the pixels whose edge strength exceeds it (step Sa2). For example, FIG. 10(d) shows the result with a threshold of 50: pixels with edge strength greater than 50 are set to 1 as edge pixels, and all other pixels are set to 0. From this edge detection result, the unit 341 then removes the edges that lie within the character string regions defined by the character extraction processing unit 31 (step Sa3). For example, if the edge detection result for the table image 400 of FIG. 8 is the table image 401 of FIG. 11(a), and the character string regions defined by the character extraction processing unit 31 are the black-filled portions of the table image 402 of FIG. 11(b), then removing the edges inside those regions yields FIG. 11(c). The result of this processing over the entire table image is the non-character-string map 403.
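Steps Sa1 to Sa3 can be sketched in plain Python as follows. The kernel coefficients are an assumption: a common 4-neighbour 3×3 Laplacian is used because FIG. 10(a) and equation (5) are not reproduced here, and the embodiment's kernel may differ. The function names and the choice to leave border pixels at 0 are likewise illustrative.

```python
# Assumed 4-neighbour Laplacian; the kernel of FIG. 10(a) may differ.
KERNEL = [[0, 1, 0],
          [1, -4, 1],
          [0, 1, 0]]


def edge_strength(g):
    """Step Sa1: absolute Laplacian response of the G channel.

    g is a list of rows of integer G values. Border pixels stay at 0
    (an assumption; the source does not state the border handling).
    """
    h, w = len(g), len(g[0])
    out = [[0] * w for _ in range(h)]
    for y in range(1, h - 1):
        for x in range(1, w - 1):
            acc = 0
            for j in (-1, 0, 1):
                for i in (-1, 0, 1):
                    acc += KERNEL[j + 1][i + 1] * g[y + j][x + i]
            out[y][x] = abs(acc)  # keep only non-negative strengths
    return out


def non_text_map(g, text_mask, threshold=50):
    """Steps Sa2 and Sa3: threshold the strength, then drop text pixels."""
    strength = edge_strength(g)
    return [
        [1 if strength[y][x] > threshold and not text_mask[y][x] else 0
         for x in range(len(g[0]))]
        for y in range(len(g))
    ]


# A single bright pixel in a dark 5x5 patch.
patch = [[0] * 5 for _ in range(5)]
patch[2][2] = 100
no_text = [[0] * 5 for _ in range(5)]
edges = non_text_map(patch, no_text)
```

Here `text_mask` stands in for the character string regions supplied by the character extraction processing unit 31, with 1 marking a text pixel.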

Next, following the flowchart of FIG. 12, the non-character-string area addition processing unit 342 extracts objects such as photographs, which cannot be extracted by the edge detection of the non-character-string map generation processing unit 341, as non-character-string areas and adds them to the non-character-string map. When the frequency of occurrence of pixel values (that is, a histogram) is computed for each local region of the document image, pixels in a photograph region, one kind of figure region, yield a histogram whose luminance values spread over a wide range. Exploiting this, the entropy (average information content) of the histogram is calculated as luminance variation information. For example, the unit 342 treats an 11×11-pixel area as one local region and computes the histogram within it (step Sb1), then computes the entropy of that histogram (step Sb2). When building the histogram, only pixels outside the character string regions formed from the characters extracted by the character extraction processing unit 31 are counted, which keeps characters from contributing to the entropy as far as possible. The histogram entropy is given by equation (6).

In equation (6), L is the number of gradation levels of the histogram (L = 256 for 8 bits), h(i) is the frequency of level i, N is the number of pixels counted in the histogram, and p(i) is h(i) normalized by N. Extracting the regions where this entropy is high makes it possible to extract photograph regions accurately. To do so, the non-character-string area addition processing unit 342 sets pixel regions whose entropy exceeds a predetermined threshold (for example, a value of about 35) to 1 and all others to 0 (step Sb3). FIG. 13 shows an example, the non-character-string area 404, obtained by applying this histogram-entropy detection to the table image 400 of FIG. 8. When a non-character-string area 404 is detected, the unit 342 adds it to the non-character-string map 403 (step Sb4). Adding the non-character-string area 404 of FIG. 13 to the non-character-string map 403 of FIG. 11(c) gives the non-character-string map 405 shown in FIG. 14.
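Steps Sb1 to Sb4 can be sketched as below. Because equation (6) is not reproduced here, the sketch assumes the standard Shannon entropy in bits, -Σ p(i) log2 p(i), over the symbols defined above; under that assumption the value cannot exceed 8 for L = 256, so the threshold of about 35 mentioned in the text presumably applies to a differently scaled formula. The threshold is therefore left as a required parameter, and all function names are hypothetical.

```python
import math


def window_entropy(gray, text_mask, cx, cy, half=5, levels=256):
    """Steps Sb1 and Sb2: entropy of the (2*half+1)-square window at (cx, cy).

    Pixels flagged in text_mask are excluded from the histogram.
    Assumes Shannon entropy in bits, which may differ from equation (6).
    """
    h, w = len(gray), len(gray[0])
    hist = [0] * levels
    n = 0
    for y in range(max(0, cy - half), min(h, cy + half + 1)):
        for x in range(max(0, cx - half), min(w, cx + half + 1)):
            if not text_mask[y][x]:
                hist[gray[y][x]] += 1
                n += 1
    if n == 0:
        return 0.0
    return -sum((c / n) * math.log2(c / n) for c in hist if c)


def add_high_entropy_areas(non_text, gray, text_mask, threshold):
    """Steps Sb3 and Sb4: OR the high-entropy pixels into the map."""
    h, w = len(gray), len(gray[0])
    return [
        [1 if non_text[y][x]
         or window_entropy(gray, text_mask, x, y) > threshold else 0
         for x in range(w)]
        for y in range(h)
    ]


# Two equally frequent levels give exactly one bit of entropy.
checker = [[0, 255], [255, 0]]
clear = [[0, 0], [0, 0]]
```

The default `half=5` gives the 11×11 local region used in the text; windows are clipped at the image border, which is an implementation choice of this sketch.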

Next, following the flowchart of FIG. 15, the object map generation processing unit 343 first removes the lines of the table region determined by the line extraction processing unit 32 (step Sc1). Assume that the table image 400 of FIG. 8 has been determined to be a table region and that each ruled line has been extracted as a line. As shown in FIG. 16, the unit 343 deletes the table-region lines from the updated non-character-string map 405 of FIG. 14. In the non-character-string map 406 after line deletion shown in FIG. 16, the deleted lines are drawn as dotted lines so that their traces are easy to see, but no such dotted lines actually exist. Removing the table's ruled lines from the non-character-string map 405 in this way separates image objects from the ruled lines even when they touch or overlap them, so that only the image object regions are extracted.

Next, the object map generation processing unit 343 performs labeling (step Sc2). Pixels connected vertically, horizontally, or diagonally (8-connectivity) receive the same label. Any common labeling method may be used. FIG. 17 shows an example of labeling the non-character-string map 406 of FIG. 16. The unit 343 assigns label 201 to the square, label 202 to the outer line of the circle, label 203 to the inner line of the circle, label 204 to the triangle, label 205 to the heart, label 206 to the outer line of the star, and label 207 to the inner line of the star.
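Since the text allows any common labeling method, one possible choice is a breadth-first flood fill with 8-connectivity, sketched below. The function name `label_8` and the use of consecutive integers from 1 as labels are assumptions for illustration.

```python
from collections import deque


def label_8(mask):
    """Label 8-connected components of a binary map.

    Returns (labels, count), where labels has the same shape as mask,
    0 marks background, and components are numbered from 1.
    """
    h, w = len(mask), len(mask[0])
    labels = [[0] * w for _ in range(h)]
    count = 0
    for y in range(h):
        for x in range(w):
            if mask[y][x] and not labels[y][x]:
                count += 1
                labels[y][x] = count
                queue = deque([(x, y)])
                while queue:
                    cx, cy = queue.popleft()
                    # Visit all eight neighbours (and the pixel itself,
                    # which is already labeled and therefore skipped).
                    for dy in (-1, 0, 1):
                        for dx in (-1, 0, 1):
                            nx, ny = cx + dx, cy + dy
                            if (0 <= nx < w and 0 <= ny < h
                                    and mask[ny][nx] and not labels[ny][nx]):
                                labels[ny][nx] = count
                                queue.append((nx, ny))
    return labels, count


demo = [[1, 0, 0],
        [0, 1, 0],
        [0, 0, 0],
        [0, 0, 1]]
demo_labels, demo_count = label_8(demo)
```

In `demo`, the two upper pixels touch only diagonally yet share a label, which is the behavior the 8-connectivity condition requires.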

For each labeled object, the object map generation processing unit 343 compares the coordinates of its pixels and finds the minimum and maximum coordinates in the horizontal and vertical directions for each label. It then rectangularizes each labeled object region, as shown in FIG. 18, by setting to 1 the rectangle whose corners are those minima and maxima (the minima give the upper-left corner and the maxima the lower-right corner) (step Sc3). In doing so, the unit 343 compares the maximum and minimum coordinates of the labels. This shows that the rectangle of label 203 is contained in the rectangle of label 202, so the unit 343 merges label 203 into label 202. Likewise, the rectangle of label 207 is contained in that of label 206, so label 207 is merged into label 206. The unit 343 thus creates five rectangular regions: labels 201, 202, 204, 205, and 206.
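The rectangularization and containment merge of step Sc3 can be sketched as follows. Rectangles are written as inclusive `(x0, y0, x1, y1)` tuples, and the function names are hypothetical. As a simplification, a contained label is simply dropped, which for rectangles has the same effect as merging it into the enclosing label.

```python
def bounding_boxes(labels):
    """Map each non-zero label to its inclusive box (x0, y0, x1, y1)."""
    boxes = {}
    for y, row in enumerate(labels):
        for x, lab in enumerate(row):
            if lab:
                x0, y0, x1, y1 = boxes.get(lab, (x, y, x, y))
                boxes[lab] = (min(x0, x), min(y0, y), max(x1, x), max(y1, y))
    return boxes


def contains(outer, inner):
    """True when inner lies entirely within outer."""
    return (outer[0] <= inner[0] and outer[1] <= inner[1]
            and outer[2] >= inner[2] and outer[3] >= inner[3])


def drop_contained(boxes):
    """Keep only boxes not strictly contained in another label's box."""
    kept = {}
    for lab, box in boxes.items():
        swallowed = any(
            other != lab and b != box and contains(b, box)
            for other, b in boxes.items()
        )
        if not swallowed:
            kept[lab] = box
    return kept


# Label 1 is an outer ring, label 2 an isolated pixel inside it,
# loosely analogous to labels 202 and 203 in the text.
ring = [[1, 1, 1, 1, 1],
        [1, 0, 0, 0, 1],
        [1, 0, 2, 0, 1],
        [1, 0, 0, 0, 1],
        [1, 1, 1, 1, 1]]
ring_boxes = bounding_boxes(ring)
merged_boxes = drop_contained(ring_boxes)
```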

If ruled lines other than those of the table extracted by the line extraction processing are to be handled as image objects, they may be added as image objects to the object map made up of the rectangularized image objects. If such ruled lines are instead handled as vector information, the object map contains only the rectangularized image objects.

As described above, the non-character-string map generation processing unit 341 extracts line drawings such as illustrations, and figures such as graphs, which have strong luminance changes, on an edge basis. Image regions such as photographs, whose edges are not very strong but whose luminance varies, are extracted by the non-character-string area addition processing unit 342 based on histogram entropy. The object map generation processing unit 343 can therefore extract a wide variety of image object types. Moreover, because figures and photographs are often rectangular, labeling and rectangularization prevent parts of a figure or photograph from being left out even when some of its pixels were not extracted.

Next, as shown in the flowchart of FIG. 19, the valid object area determination processing unit 344 first merges or splits the rectangular regions of the objects on the object map generated by the object map generation processing unit 343 as needed (step Sd1). In the example of FIG. 18, the rectangles are delimited by the image objects in the table alone, so no merging or splitting is needed. In practice, however, image objects may also exist outside the table, and depending on the shapes and arrangement of the image objects in the input image, the rectangles of two image objects may partly overlap each other, an image object's rectangle may overlap a character string, or part of an image object's rectangle may overlap a table. In such cases the rectangular regions are merged or split.

For example, when rectangular regions partly overlap, as with the image objects 501 and 502 shown in FIG. 20, the valid object area determination processing unit 344 performs labeling again, computes the maximum and minimum coordinates, and rectangularizes. The rectangles of the two overlapping image objects 501 and 502 are thereby merged into one rectangular region 504. This prevents the file format conversion from stacking one image object on another, which would look poor and would increase the file size by the duplicated area.
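The merge of overlapping rectangles can be sketched directly on coordinates. Note that this swaps in a different technique from the text: the embodiment relabels the map and rectangularizes again, whereas the sketch below takes the union bounding box of any two overlapping rectangles and repeats until none overlap. For filled rectangles the two approaches should agree when rectangles overlap, but relabeling with 8-connectivity would also join rectangles that merely touch. Function names are hypothetical; rectangles are inclusive `(x0, y0, x1, y1)` tuples.

```python
def overlaps(a, b):
    """True when the inclusive rectangles a and b share at least one pixel."""
    return a[0] <= b[2] and b[0] <= a[2] and a[1] <= b[3] and b[1] <= a[3]


def union(a, b):
    """Smallest rectangle enclosing both a and b."""
    return (min(a[0], b[0]), min(a[1], b[1]),
            max(a[2], b[2]), max(a[3], b[3]))


def merge_overlapping(rects):
    """Repeatedly merge overlapping rectangles until all are disjoint."""
    rects = list(rects)
    merged = True
    while merged:
        merged = False
        for i in range(len(rects)):
            for j in range(i + 1, len(rects)):
                if overlaps(rects[i], rects[j]):
                    rects[i] = union(rects[i], rects[j])
                    del rects[j]
                    merged = True
                    break
            if merged:
                break  # restart the scan; the union may overlap others
    return rects


result = merge_overlapping([(0, 0, 10, 10), (5, 5, 20, 15), (30, 30, 40, 40)])
```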

When a character string 301 overlaps the rectangle of the image object 501, as in FIG. 21, and only part of the string becomes part of the image object, the layout after format conversion ends up partly font text and partly image. In the example of FIG. 21, objects are stacked from the bottom layer up in the order character objects, table objects, line objects, image objects. Suppose the part of the character string 301 that overlaps the image object 501 and the part that does not are stored separately. The overlapping part is then stored merely as a character image, that is, as pixels within the image object 501. When information stored this way is reconstructed, the part of the character string 301 that does not overlap the image object 501 may be rendered in a substitute font. Its typeface and size can then differ from those of the character image shown in the image object 501, and the combined result looks poor.

In a case such as FIG. 21, the valid object area determination processing unit 344 compares the rectangle coordinates of the character string 301 with those of the image object 501, computes the maximum and minimum coordinates in the same way as for the overlap between the image objects 501 and 502, and rectangularizes (step Sd2). As a result, the part of the character string 301 that did not overlap the image object 501 is merged into the rectangle of an image object 503 that contains the whole string. This merge does not eliminate the character string 301 itself: the string is placed beneath the image object 501 during format conversion, so the text merged into the image object 503 remains usable as text data, for example for text search. This prevents the file format conversion from laying the character image inside the image object 501 over a character string 301 rendered in a font, where mismatched typeface and size would spoil the appearance.

The rectangle of an image object 505 may also overlap a table region 410, as in FIG. 22. In that case, the valid object area determination processing unit 344 compares the coordinates of the rectangle of the image object 505 with those of the table region 410, extracts the overlapping range, and removes the part overlapping the table region 410 from the rectangle of the image object 505. It then splits the remaining area of the image object 505 along the extension of the outer frame lines of the table region 410 (step Sd3). Thus, even when the rectangularized image object 505 overlaps the table region 410, the overlapping part is excluded by splitting the image object. This prevents the image object 505 from being laid over the table region 410 during file format conversion and hiding part of the table.

The example of FIG. 22 splits the remaining area of the image object 505 into image objects 506 and 507 along the extension of the horizontal outer frame line of the table region 410. The remaining area may instead be split along the extension of the vertical outer frame line, or along both. When splitting, the parts are separated by a region of, for example, two pixels; later, when the image object 505 is cut out using this object area determination result, a region dilated by one pixel on every side of each rectangle is cut out. The split image objects 506 and 507 then do not appear to be split after format conversion.
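The removal of the table overlap and the split along the horizontal frame lines can be sketched as below. This is an illustration under assumptions: rectangles are inclusive `(x0, y0, x1, y1)` tuples, the function name `split_around_table` is hypothetical, only the horizontal-split variant is shown, and the two-pixel separation and one-pixel dilation mentioned in the text are not modeled.

```python
def split_around_table(rect, table):
    """Remove the table overlap from rect and split the remainder.

    The remainder is cut along the extensions of the table's top and
    bottom frame lines, giving up to four pieces: a band above, a band
    below, and left and right pieces beside the table.
    """
    x0, y0, x1, y1 = rect
    tx0, ty0, tx1, ty1 = table
    if x1 < tx0 or tx1 < x0 or y1 < ty0 or ty1 < y0:
        return [rect]  # no overlap, nothing to do
    parts = []
    if y0 < ty0:
        parts.append((x0, y0, x1, ty0 - 1))        # band above the table
    if y1 > ty1:
        parts.append((x0, ty1 + 1, x1, y1))        # band below the table
    my0, my1 = max(y0, ty0), min(y1, ty1)          # rows shared with the table
    if x0 < tx0:
        parts.append((x0, my0, tx0 - 1, my1))      # piece left of the table
    if x1 > tx1:
        parts.append((tx1 + 1, my0, x1, my1))      # piece right of the table
    return parts


# An image rectangle covering the upper-left corner of a table.
pieces = split_around_table((0, 0, 10, 10), (5, 5, 20, 20))
```

In this example the image rectangle yields two pieces, one above the table and one to its left, loosely analogous to the image objects 506 and 507 in the text.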

The valid object area determination processing unit 344 relabels and rectangularizes the rectangular regions of the image objects 504, 503, 506, and 507 obtained in this way (step Sd4). At this point, the unit 344 may cancel rectangular regions smaller than a predetermined size as too small to be image objects. For example, when a 300 dpi input image is resolution-converted to 75 dpi and rectangularized at the 75 dpi image size, a rectangle need not be treated as an image object if its width is less than 30 pixels, its height is less than 30 pixels, or its area is less than 900 pixels. Likewise, a rectangle in which the actually labeled object pixels make up less than 25% of the rectangle's area need not be treated as an image object. For image objects produced by splitting, however, the size before splitting is also taken into account when deciding whether to keep them.
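The size and fill-ratio checks of step Sd4 can be sketched as a single predicate. The function name and parameter names are hypothetical, the defaults are the example values quoted above for a 75 dpi map, and the special handling of split image objects (considering the size before splitting) is not modeled.

```python
def is_image_object(rect, labeled_pixels,
                    min_side=30, min_area=900, min_fill=0.25):
    """Return True when a rectangle is large and dense enough to keep.

    rect is an inclusive (x0, y0, x1, y1) box on the 75 dpi map and
    labeled_pixels is the number of labeled object pixels inside it.
    """
    x0, y0, x1, y1 = rect
    width = x1 - x0 + 1
    height = y1 - y0 + 1
    area = width * height
    if width < min_side or height < min_side or area < min_area:
        return False  # too small to be an image object
    return labeled_pixels / area >= min_fill  # reject sparse rectangles
```

With the defaults, a 30×30 rectangle containing 225 labeled pixels sits exactly on both limits and is kept, while one labeled pixel fewer, or one pixel less in width, causes it to be canceled.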

After the above processing, the valid object area determination processing unit 344 determines the image object regions that finally remain to be valid object areas (step Sd5). Cutting image objects out of the input image according to the valid object areas makes it possible to cut out accurately not only the image objects inside the table but other image objects as well, so the result also looks good after format conversion.

With the configuration of Embodiment 1 described above, using the character string region information, the line segment information, the table region information, and the luminance change information based on histogram entropy makes it possible to extract not only image object regions outside a table but also image object regions inside a table. In addition, even when a rectangularized image object region overlaps a table region, the overlapping part of the image object is excluded, which prevents the table from being covered and becoming invisible. Therefore, even when an object other than characters, such as an image, exists in a table cell, the object is not mistakenly extracted as characters; it is correctly extracted, converted, and placed in the table.

As described above, the file description unit 36 stacks the objects in the following order from the bottom layer: character region objects, table region objects, line segment region objects, and image objects. Because a line segment object is sometimes handled as part of an image object, the order of line segment region objects and image objects may be swapped. Placing the character region objects at the bottom means that, where a character region overlaps an image object, the overlapped characters are displayed as the character image integrated into the image object, which improves the appearance, while the OCR character information remains available for searching. Furthermore, because an image object that overlaps a table region has been divided, the table region is not hidden even though it lies below the image object.
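The stacking order can be illustrated with a short Python sketch. It assumes, purely for illustration, that each object is a dictionary with a `kind` key taking the values `"text"`, `"table"`, `"line"`, or `"image"`; these names and the function are hypothetical and are not taken from the specification.

```python
# Bottom-to-top layer order described in the text.
LAYER_ORDER = ["text", "table", "line", "image"]


def stacking_order(objects, swap_line_and_image=False):
    """Return the objects sorted from the bottom layer to the top layer.

    swap_line_and_image=True models the permitted variation in which
    image objects are placed below line segment objects.
    """
    order = list(LAYER_ORDER)
    if swap_line_and_image:
        order[2], order[3] = order[3], order[2]
    rank = {kind: index for index, kind in enumerate(order)}
    # sorted() is stable, so objects of the same kind keep their input order.
    return sorted(objects, key=lambda obj: rank[obj["kind"]])
```

A writer for a concrete file format would then emit the objects in the returned order so that later objects are drawn on top of earlier ones.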

In Embodiment 1 described above, the non-character string map generation processing unit 341, the non-character string area addition processing unit 342, the object map generation processing unit 343, and the valid object area determination processing unit 344 of the figure region extraction processing unit 34 handle the character strings extracted by the character extraction processing unit 31, but the configuration of the present invention is not limited to that embodiment. The processing may be performed not only on character strings but also on individual characters extracted by the character extraction processing unit 31. Compared with extraction as character strings, extraction as individual characters may, depending on the extraction accuracy, fail to extract part of a character or fail to extract punctuation marks and the like; nevertheless, the processing in the figure region extraction processing unit 34 provides the same effects whether it operates on character strings or on characters.
Also, in Embodiment 1 described above, the character extraction processing unit 31 obtains a circumscribed rectangle as the region when extracting a character string region or a character region, but the configuration of the present invention is not limited to that embodiment. The character string region or character region is not limited to a circumscribed rectangle and may be a shape that partly includes a curve, such as a circumscribed circle or a circumscribed ellipse.

Also, in the non-character string area addition processing unit 342 of Embodiment 1 described above, instead of obtaining histograms and calculating entropy for all regions, the calculation may be performed only for regions whose content is unknown. For example, histograms may be obtained and entropy calculated only for the table regions that remain after excluding the table regions detected by edge detection.
Also, in Embodiment 1 described above, when the non-character string area addition processing unit 342 obtains the histogram entropy, adjacent image objects that are close to each other may be regarded as a single image object. In many cases each of these image objects is contained in a single cell, so the non-character string area addition processing unit 342 may refer to information indicating the cell boundaries, clearly separate the objects, and then obtain the histogram entropy for each.
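The two variations above (computing entropy only for an unknown region, and separating the computation at cell boundaries) can be sketched as follows. This is a minimal Python sketch, not the method of the specification: it assumes a grayscale image stored as a list of rows, boxes given as `(left, top, right, bottom)` with exclusive right and bottom edges, and Shannon entropy in bits as the entropy measure. All function names are hypothetical.

```python
import math
from collections import Counter


def histogram_entropy(pixels):
    """Shannon entropy (in bits) of the luminance histogram of the pixels."""
    total = len(pixels)
    if total == 0:
        return 0.0
    counts = Counter(pixels)
    return sum(-(c / total) * math.log2(c / total) for c in counts.values())


def crop(image, box):
    """Flatten the pixels of image that lie inside box."""
    left, top, right, bottom = box
    return [value for row in image[top:bottom] for value in row[left:right]]


def intersect(a, b):
    """Intersection of two boxes, or None if they do not overlap."""
    left, top = max(a[0], b[0]), max(a[1], b[1])
    right, bottom = min(a[2], b[2]), min(a[3], b[3])
    if left < right and top < bottom:
        return (left, top, right, bottom)
    return None


def entropy_per_cell(image, unknown_box, cells):
    """Compute entropy only inside unknown_box, separately for each cell.

    Clipping to the cell boundaries keeps neighbouring objects in
    different cells from being measured as one object.
    """
    result = {}
    for cell in cells:
        part = intersect(unknown_box, cell)
        if part is not None:
            result[cell] = histogram_entropy(crop(image, part))
    return result
```

A region with a single luminance value yields an entropy of 0, while a region split evenly between two values yields 1 bit; a threshold on this value would then decide which cells are added as image object candidates.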

Also, in Embodiment 1 described above, the luminance change information obtained by the non-character string area addition processing unit 342 is not limited to entropy and may be another index.
A modification of the luminance change information is described here.
FIG. 24 is a diagram for explaining a modification of the luminance change information.
In the example shown in FIG. 24, an image P includes character strings and an image object. The non-character string area addition processing unit 342 obtains, for example, a histogram of a local region corresponding to a pixel of interest (for example, a region of 11 pixels × 11 pixels) and, from among the gradation values (bins) whose frequency is equal to or greater than a predetermined threshold T (for example, a frequency of 5), extracts the gradation value at which the frequency peaks. Next, the non-character string area addition processing unit 342 extracts the gradation values around the peak whose frequencies are continuously at or above the threshold T, and uses the width of the extracted series of gradation values (the gradation width) as the luminance change information. When this gradation width is equal to or greater than the width of a predetermined number of gradations (for example, 48 gradations), referred to as the reference width SW, the non-character string area addition processing unit 342 may add the region to the non-character string area as a photograph region.

Specifically, in FIG. 24, for a local region A1 of the image P that includes a photograph, the gradation width WB of the peak B is larger than the reference width SW, so the local region A1 is determined to be a region of a photograph object. In contrast, for a local region A2 of the image P that includes neither a photograph nor a character string, and for a local region A3 that includes a character string, the gradation widths WC, WD, and WE of the peaks C, D, and E appearing in those local regions are all smaller than the reference width SW, so the local regions A2 and A3 are determined not to be regions of a photograph object. In this way, the gradation width of each peak, and not only entropy, may be used as the luminance change information.
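The gradation-width test can be sketched in Python as follows. The sketch makes a simplifying assumption: instead of locating each peak and expanding around it, it measures the longest run of consecutive gradation values whose frequency is at least T, which equals the largest gradation width over all peaks. The function names and the 256-level grayscale input are assumptions; the default threshold and reference width are the example values from the text.

```python
def max_gradation_width(pixels, threshold=5, levels=256):
    """Longest run of consecutive gradation values with frequency >= threshold."""
    hist = [0] * levels
    for value in pixels:
        hist[value] += 1
    best = run = 0
    for count in hist:
        run = run + 1 if count >= threshold else 0  # extend or reset the run
        best = max(best, run)
    return best


def is_photo_region(pixels, threshold=5, reference_width=48):
    """True if the widest run reaches the reference width SW."""
    return max_gradation_width(pixels, threshold) >= reference_width
```

Note that an 11 × 11 window holds 121 pixels and can therefore fill at most 24 bins at a frequency of 5, so the example values in the text should be read as independent illustrations. The sample below uses more pixels so that a run of 48 bins is reachable.

```python
photo_like = [g for g in range(100, 150) for _ in range(5)]  # 50 adjacent bins
text_like = [255] * 100 + [0] * 21                           # two isolated peaks
print(is_photo_region(photo_like), is_photo_region(text_like))
```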

Also, in Embodiment 1 described above, when non-overlapping rectangular regions exist in a single cell, the object map generation processing unit 343 processes them as two rectangular regions, but the configuration of the present invention is not limited to that embodiment, and the rectangular regions in a single cell may be integrated into one.
Also, in Embodiment 1 described above, when a character string region overlaps a table region, the valid object area determination processing unit 344 may apply to the character string region the same processing as when an image object overlaps a table region.

In Embodiment 1 and in Embodiments 2 and 3 described below, determining whether a value is equal to or greater than a predetermined value such as a threshold is only an example; depending on the magnitude of the predetermined value, the determination may instead be whether the value exceeds the predetermined value. Likewise, a determination of whether a value is equal to or less than a predetermined value may, depending on the magnitude of the predetermined value, instead be a determination of whether the value is less than the predetermined value.

[Embodiment 2]
Embodiment 1 described a configuration in which the conversion processing apparatus according to the present invention is applied, as the conversion processing unit 30, to the image processing apparatus 1 of the image forming apparatus 100, but the configuration of the present invention is not limited to this. Embodiment 2 describes an example in which the conversion processing apparatus according to the present invention is applied, as the conversion processing unit 30, to an image processing apparatus 1a of an image reading apparatus 100a such as a flatbed scanner.
Members having the same functions as members shown in the drawings used to describe Embodiment 1 are given the same reference signs in the following description, and their detailed description is not repeated.

FIG. 23 is a block diagram showing the configuration of an image reading apparatus 100a (information processing apparatus) that includes the image processing apparatus 1a according to Embodiment 2. As shown in FIG. 23, the image reading apparatus 100a includes the image processing apparatus 1a, an image input apparatus 2, a transmission apparatus 4, a storage unit 6, and an operation panel 5. The image processing apparatus 1a includes an A/D conversion unit 10, a shading correction unit 11, a document type determination unit 12, an input tone correction unit 13, a region separation processing unit 14, a compression processing unit 20, and the conversion processing unit 30 (conversion processing apparatus). As described in Embodiment 1, the conversion processing unit 30 outputs a file converted into the designated file format. The various processes executed by the image reading apparatus 100a are controlled by a control unit (not shown) provided in the image reading apparatus 100a, namely a computer including a processor such as a CPU (Central Processing Unit) or a DSP (Digital Signal Processor). In Embodiment 2, the image reading apparatus 100a is not limited to a scanner and may be, for example, a digital still camera, a document camera, or an electronic device equipped with a camera (for example, a mobile phone, a smartphone, or a tablet terminal).

[Embodiment 3]
Embodiments 1 and 2 showed examples in which the conversion processing apparatus according to the present invention is applied, as the conversion processing unit 30, to the image processing apparatuses 1 and 1a of the image forming apparatus 100 or the image reading apparatus 100a, but the configuration of the present invention is not limited to this. The conversion processing apparatus according to the present invention may be applied to, for example, a server apparatus. One example of such a server apparatus (information processing apparatus) includes: a receiving apparatus that receives, via a network, a document image that has undergone image reading and various kinds of image processing in the image forming apparatus 100 or the image reading apparatus 100a; a conversion processing apparatus that executes the processing of the conversion processing unit described in Embodiment 1; and a transmission apparatus that transmits, via the network, the document file output from the file description unit of the conversion processing apparatus. Configuring the server apparatus in this way enables the following usage: a document image that has undergone image reading and various kinds of image processing in the image forming apparatus 100 or the image reading apparatus 100a is received via the network, the conversion processing apparatus creates a document file, and the output file is transmitted to the user's terminal device (for example, a personal computer or a tablet terminal). This server apparatus also makes it possible to use the format conversion function without replacing an already installed image forming apparatus or image reading apparatus.

The conversion processing apparatus may also be configured without the file description unit 36, so that it creates the various kinds of information for structuring the document and transmits that information via the network. In this case, the terminal device that receives the information executes the processing of the file description unit 36. Conversion to the desired file can then be performed smoothly without repeatedly acquiring the information required for structuring the document, which makes it easy to convert again to a different file format when, for example, the wrong file format was specified by mistake or the conversion result in the specified file format is unsatisfactory.

The image processing apparatuses 1 and 1a (in particular, the conversion processing unit 30) and the server apparatus (in particular, the conversion processing apparatus) in Embodiments 1, 2, and 3 may be realized by logic circuits (hardware) formed in an integrated circuit (IC chip) or the like, or may be realized by software using a CPU (Central Processing Unit). In the latter case, the image processing apparatuses 1 and 1a and the server apparatus include a CPU that executes the instructions of a program, which is software that realizes each function; a ROM (Read Only Memory) or storage device (referred to as a "recording medium") on which the program and various data are recorded so as to be readable by a computer (or CPU); a RAM (Random Access Memory) into which the program is loaded; and the like. The object of the present invention is achieved when the computer (or CPU) reads the program from the recording medium and executes it. As the recording medium, a "non-transitory tangible medium" can be used, such as a tape, a disk, a card, a semiconductor memory, or a programmable logic circuit. The program may also be supplied to the computer via any transmission medium capable of transmitting it (such as a communication network or a broadcast wave). The present invention can also be realized in the form of a data signal embedded in a carrier wave, in which the program is embodied by electronic transmission. Although the image processing apparatuses 1 and 1a and the image processing method described above are configured to handle color image data, they are not limited to this and may be configured to handle monochrome image data.

The present invention is not limited to the embodiments described above, and various modifications are possible. That is, embodiments obtained by combining technical means modified as appropriate without departing from the gist of the present invention are also included in the technical scope of the present invention.

30 Conversion processing unit
31 Character extraction processing unit
32 Line extraction processing unit
33 Table region extraction processing unit
34 Figure region extraction processing unit
35 Table structuring processing unit
36 File description unit
341 Non-character string map generation processing unit
342 Non-character string area addition processing unit
343 Object map generation processing unit
344 Valid object area determination processing unit

Claims

A conversion processing apparatus comprising:
a character extraction processing unit that extracts a character region present in document image information;
a line extraction processing unit that extracts line segments present in the document image information;
a table region extraction processing unit that extracts a table region using information on the line segments extracted by the line extraction processing unit;
a figure region extraction processing unit that sets a predetermined local region in the document image information, creates a luminance histogram of the local region to obtain luminance change information of the local region, and extracts an image object region containing a figure or photograph present outside or inside the table region, using the luminance change information, information on the character region extracted by the character extraction processing unit, the information on the line segments extracted by the line extraction processing unit, and information on the table region extracted by the table region extraction processing unit; and
a table structuring processing unit that analyzes the table structure on the basis of the information on the character region, the information on the line segments, and information on the image object region within the table region, and acquires table structure information for reconstructing the table.

The conversion processing apparatus according to claim 1, wherein
the character region extracted from the document image information by the character extraction processing unit includes a character string region containing a character string,
the figure region extraction processing unit sets a predetermined local region in the document image information, creates a luminance histogram of the local region to obtain luminance change information of the local region, and extracts an image object region containing a figure or photograph present outside or inside the table region, using the luminance change information, information on the character string region extracted by the character extraction processing unit, the information on the line segments extracted by the line extraction processing unit, and the information on the table region extracted by the table region extraction processing unit, and
the table structuring processing unit analyzes the table structure on the basis of the information on the character string region, the information on the line segments, and the information on the image object region within the table region, and acquires table structure information for reconstructing the table.

The conversion processing apparatus according to claim 2, further comprising a file description unit that describes the objects in a designated file format such that they are arranged in the order of the character string region object, the table region object, the line segment object, and the image object, or in the order of the character string region object, the table region object, the image object, and the line segment object.

The conversion processing apparatus according to claim 2, wherein the figure region extraction processing unit comprises:
a non-character string map generation processing unit that performs edge detection on the document image information and generates a non-character string map of image object region candidates by excluding, from the result of the edge detection, the character string region, which is one of the document components extracted by the character extraction processing unit;
a non-character string area addition processing unit that calculates, as the luminance change information, the entropy of the histogram in the local region of the document image information and adds regions having a high calculated entropy to the non-character string map as image object region candidates; and
an object map generation processing unit that deletes the line segments of the table region extracted by the table region extraction processing unit from the non-character string map to which the image object region candidates have been added, labels the image object regions by performing labeling processing on the non-character string map from which the line segments of the table region have been deleted, and generates an object map by obtaining the rectangular regions of the labeled image object regions.

The conversion processing apparatus according to claim 4, wherein the figure region extraction processing unit further comprises a valid object area determination processing unit that performs integration processing or division processing on the rectangular regions of the image object regions.

The conversion processing apparatus according to claim 5, wherein, when the rectangular regions of a plurality of image object regions overlap, the valid object area determination processing unit calculates the maximum and minimum values of the coordinates of the rectangular regions of the plurality of image objects and integrates the overlapping image object regions into a single rectangular region.

The conversion processing apparatus according to claim 5, wherein, when the character string region overlaps the rectangular region of an image object region, the valid object area determination processing unit calculates the maximum and minimum values of the coordinates of the rectangular region of the image object and of the character string region, and integrates the character string region overlapping the image object region with that region into a single rectangular region.

The conversion processing apparatus according to claim 5, wherein, when the table region overlaps the rectangular region of an image object region, the valid object area determination processing unit excludes the part overlapping the table region from the rectangular region of the image object region and divides the rectangular region of the image object region along an extension of a horizontal frame line or a vertical frame line of the overlapping table region.

An information processing apparatus comprising the conversion processing apparatus according to claim 1.

A program for causing a computer to function as:
character extraction processing means for extracting a character region present in document image information;
line extraction processing means for extracting line segments present in the document image information;
table region extraction processing means for extracting a table region using information on the line segments extracted by the line extraction processing means;
figure region extraction processing means for setting a predetermined local region in the document image information, creating a luminance histogram of the local region to obtain luminance change information of the local region, and extracting an image object region containing a figure or photograph present outside or inside the table region, using the luminance change information, information on the character region extracted by the character extraction processing means, the information on the line segments extracted by the line extraction processing means, and information on the table region extracted by the table region extraction processing means; and
table structuring processing means for analyzing the table structure on the basis of the information on the character region, the information on the line segments, and information on the image object region within the table region, and acquiring table structure information for reconstructing the table.

A computer-readable recording medium on which the program according to claim 10 is recorded.