JP2014048851A

JP2014048851A - Layout holding device of spread sheet software

Info

Publication number: JP2014048851A
Application number: JP2012190683A
Authority: JP
Inventors: Takuya Kawano; 卓也川野; Takahiro Tsutsumi; 隆弘堤; Akira Oki; 亮大木
Original assignee: Konica Minolta Inc
Current assignee: Konica Minolta Inc
Priority date: 2012-08-30
Filing date: 2012-08-30
Publication date: 2014-03-17
Anticipated expiration: 2032-08-30
Also published as: JP6065467B2

Abstract

PROBLEM TO BE SOLVED: To provide a layout holding device for spreadsheet software that preserves the arrangement of a layout even when a sheet is copied to another sheet whose column width is fixed, and that allows easy handling.
SOLUTION: The layout holding device performs character recognition on an input image obtained by reading a document image and creates a spreadsheet file by inserting the recognition results into the corresponding cells of the spreadsheet software. When a character recognition result R1 (R2) straddles a plurality of cells C1 and C2, or when a first character recognition result R1 and a second character recognition result R2 are both contained in a single cell of the spreadsheet software, the device joins or separates the two results R1 and R2 according to the relationship between the text box width D of the recognition results and the cell width d of the spreadsheet software.

Description

The present invention relates to a layout holding device for spreadsheet software, applied to an MFP (Multi-Function Peripheral) or similar device that has a function for creating files with spreadsheet software.

Conventionally, some MFPs and similar devices take the image data of a document image read by a scanner unit as an input image, perform character recognition on desired text areas of that input image using an optical character recognition device (OCR), and create a file in the format of spreadsheet software such as Excel provided by Microsoft.

When a sheet in a file created with such software (for example, Excel) is copied to another sheet, it is important that the layout be properly maintained.

Techniques have also been proposed that, when the character recognition results (OCR results) for the text areas of a document are flowed into the corresponding text areas (cell areas), avoid shrinking the characters as far as possible while preserving the arrangement of the text areas relative to one another.

Specifically, in one such technique, a layout analysis unit identifies each area in the input image, OCR is performed, and the result is translated. An arrangement structure analysis unit sets boundary lines along the outer periphery of the page and in the margins between areas, a layout processing unit writes the OCR result and the translation result into each area, and an arrangement structure adjustment unit moves the boundary lines on the page periphery outward and moves boundary lines adjacent to blank areas toward those blank areas, moving or enlarging the areas accordingly. The boundary lines are moved and the areas moved or enlarged until all areas fit within the page (see, for example, Patent Document 1).

Patent Document 1: JP 2009-230605 A

In general, however, when a sheet of spreadsheet software such as Excel is copied to another sheet, the layout is often broken because the destination sheet has different column widths, which makes the sheet hard to reuse.

Moreover, in the prior art, the heights and widths of the text areas are made variable so that their relative arrangement is not broken. Within that file the arrangement is preserved, but when the content is copied to another sheet, the layout arrangement breaks down. This is because, when a spreadsheet sheet is copied, the column widths of the destination sheet are fixed at their default values. Consequently, in the prior art, once the column widths of the text areas change, the content shifts in the column (width) direction and the arrangement can no longer be maintained.
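The column-shift effect described above can be illustrated numerically. The sketch below is not from the patent: the column widths and the default width of 64 units are hypothetical values, chosen only to show that the left edge of each column, and therefore the horizontal position of any text placed in it, moves once every column is forced to a single default width.

```python
from itertools import accumulate

def column_left_edges(widths):
    """Return the left x-coordinate of each column, given column widths in order."""
    return [0, *accumulate(widths)][:-1]

# Hypothetical source sheet: columns resized so each OCR text area keeps
# its original horizontal position on the page.
source_widths = [30, 120, 45, 90]

# Hypothetical destination sheet: every column reset to one default width.
DEFAULT_WIDTH = 64
dest_widths = [DEFAULT_WIDTH] * len(source_widths)

src_edges = column_left_edges(source_widths)
dst_edges = column_left_edges(dest_widths)
for col, (before, after) in enumerate(zip(src_edges, dst_edges)):
    print(f"column {col}: left edge {before} -> {after} (shift {after - before:+d})")
```

In this example only column 0 keeps its position; the others shift by the accumulated width differences (+34, -22 and -3 units).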

The present invention has been made in view of these circumstances, and its object is to provide a layout holding device for spreadsheet software that preserves the layout arrangement, and so allows easy handling, even when a sheet is copied to another sheet with fixed column widths.

The above problem is solved by the following means.
(1) A layout holding device for spreadsheet software, comprising: reading means for reading a document image, converting it into image data, and outputting the image data; character recognition means for taking the image data from the reading means as an input image and performing character recognition on character areas of the input image; file creation means for creating a spreadsheet file by inserting a plurality of character recognition results obtained by the character recognition means into the corresponding cells of the spreadsheet software; and control means for, when a character recognition result straddles a plurality of cells, or when a first character recognition result and a second character recognition result are contained in one cell of the spreadsheet software, joining or separating the two character recognition results according to the relationship between the text box width of the character recognition results and the cell width of the spreadsheet software.
(2) The layout holding device for spreadsheet software according to item (1), wherein, when the first and second character recognition results together form one word, the control means joins the two results while preserving the layout by inserting a blank between the first character recognition result and the second character recognition result.
(3) The layout holding device for spreadsheet software according to item (1), wherein, when the first and second character recognition results together do not form one word or phrase, the control means inserts the first and second character recognition results into separate cells of the spreadsheet software.
(4) The layout holding device for spreadsheet software according to item (1), wherein the control means has a function of performing syntax analysis on a character recognition result when the text box width of that result is larger than the cell width of the spreadsheet software, and, when the syntax analysis shows that the character recognition result cannot be separated into words or phrases, the control means inserts the character recognition result as it is into a cell of the spreadsheet software.
(5) The layout holding device for spreadsheet software according to item (4), wherein, when the text box width of a character recognition result is larger than the cell width of the spreadsheet software and syntax analysis of that result shows that it can be separated into words or phrases at a cell boundary line of the spreadsheet software, the control means inserts the resulting separable character recognition results into separate cells.
(6) The layout holding device for spreadsheet software according to any one of items (1) to (5), wherein, when the first character recognition result is offset from the second character recognition result by a certain distance in the row direction, the control means adjusts the position of the first or second character recognition result so that the two can be inserted into separate cells in the same row of the spreadsheet software.

According to the invention of item (1), character recognition is performed on the input image obtained when the reading means reads a document image, and the file creation means creates a spreadsheet file by inserting the resulting character recognition results into the corresponding cells of the spreadsheet software.

In this case, when a character recognition result straddles a plurality of cells of the spreadsheet software, or when the first and second character recognition results are contained in one cell, the control means joins or separates the two results according to the relationship between the text box width of the character recognition results and the cell width of the spreadsheet software. Therefore, even when the column widths of sheets in the spreadsheet software are fixed at their defaults, a sheet can be created that is easy to handle and whose layout does not break when it is copied to another sheet.

According to the invention of item (2), when the first and second character recognition results together form one word, the two results are joined with a blank inserted between them so that the layout is preserved, which makes the content easy to handle when copied to a text file.

According to the invention of item (3), when the first and second character recognition results together do not form one word or phrase, they are inserted into separate cells of the spreadsheet software, so a file can be created that is easy to handle and search when copied to another sheet of the spreadsheet software.

According to the invention of item (4), when the text box width of a character recognition result is larger than the cell width of the spreadsheet software, syntax analysis is performed on that result, and if the result cannot be separated into words or phrases, it is inserted as it is into a cell of the spreadsheet software. A file can therefore be created that is easy to handle and search when copied to another sheet of the spreadsheet software.

According to the invention of item (5), when the text box width of a character recognition result is larger than the cell width of the spreadsheet software and syntax analysis shows that the result can be separated into words or phrases at a cell boundary line of the spreadsheet software, the separable pieces are inserted into separate cells of the spreadsheet software. A file can therefore be created that is easy to handle and search when copied to another sheet.

According to the invention of item (6), when the first character recognition result is offset from the second by a certain distance in the row direction, the position of the first or second result is adjusted so that the two can be inserted into separate cells in the same row of the spreadsheet software. A file can therefore be created that is easy to handle and search when copied to another sheet.

FIG. 1 is a block diagram showing the electrical configuration of an image forming apparatus to which a layout holding device for spreadsheet software according to an embodiment of the present invention is applied.
FIG. 2 is an explanatory diagram of creating a spreadsheet file by inserting a plurality of OCR results, obtained by OCR recognition of an input image, into the cells of Excel.
FIG. 3 is a flowchart showing the flow of processing when a plurality of OCR results are inserted into one cell of the spreadsheet software.
FIG. 4 is a flowchart showing the flow of processing when an OCR result is inserted into a plurality of cells of the spreadsheet software.
FIG. 5 is a flowchart showing the flow of processing when, focusing on a cell into which one OCR result is inserted, an OCR result is inserted into a cell adjacent to that cell.
FIG. 6 is an explanatory diagram of an example of a file created with the spreadsheet software.
FIG. 7 is an explanatory diagram of another example of a file created with the spreadsheet software.
FIG. 8 is an explanatory diagram of still another example of a file created with the spreadsheet software.
FIG. 9 is an explanatory diagram of yet another example of a file created with the spreadsheet software.

Embodiments of the present invention are described below with reference to the drawings.

FIG. 1 is a block diagram showing the electrical configuration of a multifunction digital peripheral (hereinafter, MFP) to which a layout holding device for spreadsheet software according to an embodiment of the present invention is applied.

In FIG. 1, the MFP includes, for example, a CPU 1, a ROM 2, a RAM 3, a scanner unit 4, an operation panel unit 5, a storage unit 6, a printer unit 7, and an external interface unit 8.

In addition to controlling the overall operation of the MFP, the CPU 1 provides functions such as a spreadsheet file format creation unit 11 and a syntax analysis unit 12 that analyzes the syntax of OCR results.

The ROM 2 is a memory that stores the operation program of the CPU 1 and other data, and the RAM 3 is a memory that provides a work area when the CPU 1 operates according to the operation program.

The scanner unit 4 constitutes a reading unit that reads a document image, converts it into image data (electronic data), and outputs the image data. The scanner unit 4 also provides the function of an OCR (optical character recognition device) 41 that takes the obtained image data as an input image and performs character recognition on a plurality of required partial areas.

The operation panel unit 5 includes a key unit 51 with hard keys such as a start key, a stop key, and a numeric keypad, and a display unit 52 such as a liquid crystal display (LCD).

The storage unit 6 stores various data and application software and consists of, for example, a hard disk drive (HDD). Here it stores Excel as the spreadsheet software, OCR software, and the like.

The printer unit 7 constitutes an engine unit that prints image data on paper.

The external interface unit 8 handles communication with user terminals and other devices connected via a network.

FIGS. 2(A) to 2(C) are explanatory diagrams of the case where image data obtained by reading a document with the scanner unit 4 is used as an input image, OCR recognition is performed on a plurality of required partial areas of the input image, and a spreadsheet file is created by inserting the resulting OCR results into the cells of Excel.

In this example, OCR recognition is performed on the required partial areas of the input image. When the resulting OCR results R1 and R2 straddle a plurality of cells (two mutually adjacent cells C1 and C2), or when OCR results R1 and R2 are both contained in one Excel cell C1, OCR results R1 and R2 are joined or separated according to the relationship between the text box width D of the character recognition results and the Excel cell width d.
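A minimal sketch of how the two triggering conditions might be detected, assuming every column has the same fixed width d (as on a freshly copied sheet), that both results lie on the same row, and that each OCR result carries the left x-coordinate and width D of its text box in the same units as d. The names `OcrBox`, `columns_spanned`, and `classify` are hypothetical and do not come from the patent.

```python
from dataclasses import dataclass

@dataclass
class OcrBox:
    text: str
    left: float   # x-coordinate of the text box's left edge
    width: float  # text box width D

def columns_spanned(box: OcrBox, cell_width: float) -> range:
    """Indices of the fixed-width columns (width d) that the text box overlaps."""
    first = int(box.left // cell_width)
    # A tiny epsilon keeps a box that ends exactly on a boundary in one column.
    last = int((box.left + box.width - 1e-9) // cell_width)
    return range(first, max(first, last) + 1)

def classify(r1: OcrBox, r2: OcrBox, cell_width: float) -> str:
    """Report which of the two triggering cases, if any, applies to R1 and R2."""
    c1 = columns_spanned(r1, cell_width)
    c2 = columns_spanned(r2, cell_width)
    if len(c1) > 1 or len(c2) > 1:
        return "spans_multiple_cells"   # a result straddles cells such as C1 and C2
    if c1[0] == c2[0]:
        return "shared_cell"            # R1 and R2 both fall inside one cell
    return "separate_cells"             # already one result per cell
```

With d = 100, a box running from x = 80 to x = 130 crosses the boundary at 100 and is reported as spanning two cells, while two boxes that both end before x = 100 share cell 0.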

In FIG. 2(A), as shown on the left side of the figure, OCR result R2 straddles the two cells C1 and C2, and syntax analysis shows that OCR results R1 and R2 together do not form one word or phrase. In this case, OCR results R1 and R2 are inserted into separate cells C1 and C2 (circled number 1 at the right end of the figure).

This makes it possible to create a file that is easy to handle and search when copied to another Excel sheet.

When OCR results R1 and R2 together form one word or phrase, a blank G (shown hatched) is inserted between OCR result R1 and OCR result R2, and the two results are inserted into cells C1 and C2 in this joined state (circled number 2 at the right end of the figure).

Because OCR results R1 and R2 are joined with the blank G inserted between them so that the layout is preserved, the content is easy to handle when copied to a text file.

FIG. 2(B) shows the case where OCR recognition of a required partial area of the input image yields a text box whose width D is larger than the cell width d.

In FIG. 2(B), as shown on the left side of the figure, the width D of the text box of OCR result R3 is larger than the cell width d, so OCR result R3 is subjected to syntax analysis. If the words or phrases can be separated at the Excel cell boundary line L, so that OCR result R3 can be split into OCR results R3A and R3B, then R3A and R3B are inserted into separate cells C1 and C2 (circled number 3 at the right end of the figure).

Because OCR results R3A and R3B, obtained by splitting OCR result R3, are inserted into separate cells C1 and C2, a file can be created that is easy to handle and search when copied to another Excel sheet.

On the other hand, if syntax analysis shows that OCR result R3 cannot be separated into words or phrases, OCR result R3 is inserted as it is into Excel cell C1 (circled number 4 at the right end of the figure).

Because OCR result R3 is inserted as it is into Excel cell C1, a file can be created that is easy to handle and search when copied to another Excel sheet.

FIG. 2(C) focuses on a particular Excel cell C1 (C2). When OCR result R1 (R2) is offset from OCR result R2 (R1) by a certain distance in the row direction, the position of OCR result R1 (R2) or OCR result R2 (R1) is adjusted so that the two fit into separate cells C1 and C2 in the same row.

In FIG. 2(C), as shown on the left side of the figure, OCR result R1 is offset from OCR result R2 by a certain distance in the row direction, so the position of OCR result R1 is adjusted and OCR results R1 and R2 are inserted into separate cells C1 and C2 (circled number 5 at the right end of the figure).
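The position adjustment of FIG. 2(C) could be sketched as follows. The patent only speaks of an offset of "a certain distance"; this sketch assumes a uniform row height, interprets the offset as a vertical difference between the top edges of the two text boxes, and uses a hypothetical `tolerance` parameter. R1 is moved onto R2's row when the two fall into different rows but their top edges differ by no more than the tolerance.

```python
from dataclasses import dataclass, replace

@dataclass(frozen=True)
class Placed:
    text: str
    left: float  # x-coordinate of the text box's left edge
    top: float   # y-coordinate of the text box's top edge

def row_of(top: float, row_height: float) -> int:
    """Spreadsheet row index for a given y-coordinate (uniform row height assumed)."""
    return int(top // row_height)

def align_rows(r1: Placed, r2: Placed, row_height: float, tolerance: float) -> Placed:
    """Return R1, moved onto R2's row if the two are only slightly offset."""
    different_rows = row_of(r1.top, row_height) != row_of(r2.top, row_height)
    small_offset = abs(r1.top - r2.top) <= tolerance
    if different_rows and small_offset:
        return replace(r1, top=r2.top)  # snap R1 to R2's vertical position
    return r1                           # already aligned, or genuinely on another row
```

Only R1 is moved here, matching the FIG. 2(C) example; the patent allows either result to be adjusted.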

Adjusting the row-direction position of OCR result R1 (R2) or OCR result R2 (R1) in this way makes it possible to create a file that is easy to handle and search when copied to another Excel sheet.

FIG. 3 is a flowchart showing the flow of processing when, as in FIG. 2(A), a plurality of OCR results R1 and R2 fall within one Excel cell C1 (C2).

In FIG. 3, in step S1, the scanner unit 4 reads a document image, and the image data is used as the input image (image input).

In step S2, OCR recognition is performed on the entire input image. OCR recognition may instead be performed only on the required portions.

In step S3, OCR processing of one partial area outputs a character or character string in the input image as OCR result R1. In step S4, OCR processing of another partial area outputs a character or character string in the input image as OCR result R2.

In step S5, when a plurality of (for example, two) OCR results R1 and R2 fall within one Excel cell C1, it is determined whether OCR results R1 and R2 together, that is, (R1 + R2), form one word or phrase.

If OCR results R1 and R2 form one word or phrase (YES in step S5), in step S6 they are joined and inserted into the single cell C1, as shown by circled number 2 in FIG. 2(A). In this case, a blank G (shown hatched) is inserted between OCR results R1 and R2 to preserve the layout.

If OCR results R1 and R2 do not form one word or phrase (NO in step S5), in step S7 they are not joined; instead, as shown by circled number 1 in FIG. 2(A), OCR result R1 is inserted into cell C1 and OCR result R2 into the cell C2 to its right.
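Steps S5 to S7 can be sketched as follows, assuming R1 and R2 have already been found to lie in the same column `col`. In the patent the word-or-phrase test is performed by the syntax analysis unit 12; here a tiny hypothetical lexicon (`KNOWN_PHRASES`) stands in for it, and the blank G is modelled as a single space.

```python
from __future__ import annotations

# Hypothetical stand-in for the syntax analysis unit 12: a tiny phrase lexicon.
KNOWN_PHRASES = {"purchase order", "unit price", "grand total"}

def forms_single_phrase(a: str, b: str) -> bool:
    """Step S5: do R1 and R2 together form one word or phrase?"""
    return f"{a} {b}".lower() in KNOWN_PHRASES

def place_shared_cell(r1: str, r2: str, col: int) -> dict[int, str]:
    """Return a column -> text mapping for R1 and R2 that share cell `col`."""
    if forms_single_phrase(r1, r2):      # S5: YES
        return {col: f"{r1} {r2}"}       # S6: join, with the blank G as one space
    return {col: r1, col + 1: r2}        # S7: R2 goes to the cell on the right
```

A real implementation would also have to decide what happens when the right-hand cell is already occupied, which this sketch ignores.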

In this way, for an Excel sheet with fixed column widths, when OCR results R1 and R2 are contained in one Excel cell C1 (C2), they are separated and assigned to cells C1 and C2 respectively, and when OCR results R1 and R2 straddle the two cells C1 and C2, they are joined and inserted into cell C1. The layout of the Excel sheet is therefore preserved and the sheet is easier to reuse (handle).

In particular, when OCR results R1 and R2 together form one word or phrase, a blank G is inserted between OCR results R1 and R2, so the layout is well preserved.

FIG. 4 is a flowchart showing the flow of processing when, as in FIG. 2(B), OCR recognition of a required partial area of the input image yields a text box whose width D is larger than the cell width d.

In FIG. 4, in step S11, the scanner unit 4 reads a document image, and the image data is used as the input image (image input).

In step S12, OCR recognition is performed on the entire input image.

In step S13, OCR processing of one partial area outputs a character or character string in the input image as OCR result (R1). In step S14, OCR processing of another partial area outputs a character or character string in the input image as OCR result (R2).

In step S15, when OCR result R3 extends over a plurality of Excel cells C1 (C2), R3 is parsed to determine whether it can be separated into OCR results R3A and R3B, which are distinct words or phrases.

If the syntax analysis shows that R3 can be separated into OCR results R3A and R3B (YES in step S15), in step S16 OCR result R3 is split into R3A and R3B, which are inserted into the cells C1 and C2 corresponding to their respective upper-left coordinates, as shown by circled number 3 in FIG. 2(B), and the process ends.

If the syntax analysis shows that R3 cannot be separated into OCR results R3A and R3B (NO in step S15), in step S17 OCR result R3 is inserted, without splitting, into the cell corresponding to its upper-left coordinates, as shown by circled number 4 in FIG. 2(B).
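A rough sketch of steps S15 to S17 for a single result R3. Two simplifications are assumptions, not part of the patent: character widths are taken as uniform across the text box, and the syntax analysis is replaced by a check for a space at the character position nearest the cell boundary line L. Japanese text has no spaces between words, so a morphological analyser would be needed there instead. Only the first boundary is considered, so a result crossing three or more cells is split at most once.

```python
def split_at_cell_boundary(text: str, left: float, width: float, cell_width: float):
    """Steps S15-S17 for one OCR result R3 with text box (left, width D).

    Returns (first_column, pieces): one piece if R3 stays whole,
    two pieces (R3A, R3B) if it is split at the cell boundary line.
    """
    first_col = int(left // cell_width)
    boundary = (first_col + 1) * cell_width
    if not text or left + width <= boundary:
        return first_col, [text]             # fits in one cell; nothing to decide
    char_w = width / len(text)               # assumption: uniform character width
    k = round((boundary - left) / char_w)    # character index nearest the boundary
    for cut in (k, k - 1):                   # a space at or just before the boundary
        if 0 < cut < len(text) and text[cut] == " ":
            a, b = text[:cut].rstrip(), text[cut:].lstrip()
            if a and b:
                return first_col, [a, b]     # S16: R3A -> first_col, R3B -> next column
    return first_col, [text]                 # S17: insert R3 unsplit
```

For example, "Invoice Total" drawn 130 units wide from x = 0 splits into "Invoice" and "Total" when d = 80, because the boundary falls next to the space; with d = 100 the boundary falls inside "Total" and R3 is kept whole.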

FIG. 5 is a flowchart showing the flow of processing when, as shown by circled number 5 in FIG. 2(C), attention is paid to a cell C1 (C2) into which one OCR result is inserted, and OCR result R2 (R1) has been inserted into the adjacent cell C2 (C1).

In FIG. 5, in step S21, the scanner unit 4 reads a document image, and the image data is used as the input image (image input).

In step S22, OCR recognition is performed on the entire input image.

In step S23, OCR processing of one partial area outputs a character or character string in the input image as OCR result R1. In step S24, OCR processing of another partial area outputs a character or character string in the input image as OCR result R2.

In step S25, attention is focused on cell C1, into which an OCR result is inserted. If OCR result R2 has been inserted into the adjacent cell C2, the combination "OCR result R1 of cell C1 and OCR result R2 of cell C2" is parsed to determine whether it forms a single word or phrase.

If it is determined to form a single word or phrase (YES in step S25), OCR results R1 and R2 are placed in one cell in step S26.

If parsing "OCR result R1 of cell C1 and OCR result R2 of cell C2" does not yield a single word or phrase (NO in step S25), OCR results R1 and R2 are inserted into their original corresponding cells C1 and C2 in step S27.
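Steps S25 to S27 amount to a merge test on two adjacent cells. In the hedged sketch below, the syntax analysis of the parsing unit 12 is replaced by a lookup in a caller-supplied vocabulary of lower-case words and phrases, and both direct concatenation and concatenation with a space are tried (the space-separated form echoes claim 2). The function name `merge_adjacent`, the vocabulary, and the choice of C1 as the receiving cell are hypothetical.

```python
def merge_adjacent(r1: str, r2: str, vocabulary: set[str]) -> tuple[str, str]:
    """Sketch of steps S25-S27: return the contents of cells C1 and C2.

    `vocabulary` stands in for the parsing unit's syntax analysis;
    it is a hypothetical input, not part of the source.
    """
    for candidate in (r1 + r2, f"{r1} {r2}"):
        if candidate.lower() in vocabulary:
            return candidate, ""  # S26: both results in one cell (C1 assumed)
    return r1, r2                 # S27: keep the original cells C1 and C2
```

With the vocabulary {"quantity", "new york"}, the pairs ("Quan", "tity") and ("New", "York") are merged into one cell, while ("Date", "Amount") stays in two cells.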

FIG. 6 is an explanatory diagram of an example file created in Excel.

In FIG. 6, the left side shows the input image V obtained by scanning a document image, and the right side shows the result of performing OCR on the document image and inserting OCR result R1 into a cell of an Excel sheet.

Note that the form of the input image V on the left is used for printing according to a predetermined job, while the form in the Excel sheet on the right is reused as text.

FIG. 7 is an explanatory diagram of another example file, likewise created in Excel.

In FIG. 7, when the scanned document spans multiple pages, a single sheet makes it hard to see at a glance in the standard view how many pages have been scanned, so a separate sheet is prepared for each page. For example, by default it is preferable to start a separate file every 255 pages (sheets).
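A minimal sketch of this paging rule, assuming one sheet per scanned page and the default limit of 255 sheets per file mentioned above. The function name `plan_workbooks` and the `PageN` sheet names are hypothetical.

```python
def plan_workbooks(num_pages: int, sheets_per_file: int = 255) -> list[list[str]]:
    """One sheet per scanned page; start a new file every `sheets_per_file` sheets."""
    files = []
    for start in range(0, num_pages, sheets_per_file):
        end = min(start + sheets_per_file, num_pages)
        # Sheet names are illustrative: Page1, Page2, ...
        files.append([f"Page{p}" for p in range(start + 1, end + 1)])
    return files
```

A 600-page scan would then produce three files holding 255, 255, and 90 sheets.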

FIG. 8 is an explanatory diagram of yet another example file created in Excel; it consists only of an Excel sheet and is enabled only when the OCR function is ON.

FIG. 9 is an explanatory diagram of another example file, likewise created in Excel.

FIG. 9 shows a CSV (Comma-Separated Values) format file created by Excel as it appears when opened. This format is enabled only when the OCR function is ON. In this case, the font size is not saved.
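Because CSV stores only the text of each cell, writing a cell grid with Python's standard `csv` module illustrates why font size cannot be preserved in this format. The `{(row, col): text}` cell layout and the function name `cells_to_csv` are assumptions for illustration.

```python
import csv
import io


def cells_to_csv(cells: dict[tuple[int, int], str]) -> str:
    """Write {(row, col): text} cells as CSV text.

    Only cell text is written; formatting such as font size has no
    representation in CSV and is therefore lost.
    """
    if not cells:
        return ""
    n_rows = max(r for r, _ in cells) + 1
    n_cols = max(c for _, c in cells) + 1
    buf = io.StringIO()
    writer = csv.writer(buf, lineterminator="\n")
    for r in range(n_rows):
        writer.writerow([cells.get((r, c), "") for c in range(n_cols)])
    return buf.getvalue()
```

Fields that contain commas, such as "Tea, green", are quoted automatically by the default minimal quoting.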

1 CPU
4 Scanner unit
41 Character recognition unit (OCR)
11 File creation unit
12 Parsing unit
C1, C2 Spreadsheet cells
d Cell width
D Text box width
G Blank space
R1, R2, R3 OCR results
R3A, R3B Separated OCR results
V Read image (input image)

Claims

A layout holding device for spreadsheet software, comprising:
reading means for reading a document image, converting it into image data, and outputting the image data;
character recognition means for performing image recognition on a character area of an input image, the input image being the image data obtained by the reading means;
file creation means for creating a spreadsheet file by inserting a plurality of character recognition results obtained by the character recognition means into corresponding cells in the spreadsheet software; and
control means for, when a character recognition result extends over a plurality of cells, or when a first character recognition result and a second character recognition result are contained in one cell of the spreadsheet software, performing control to combine or separate the character recognition results according to the relationship between the text box width of the character recognition results and the cell width of the spreadsheet software.

The layout holding device for spreadsheet software according to claim 1, wherein, when the first character recognition result and the second character recognition result form a single word, the control means combines the two character recognition results with a space inserted between the first character recognition result and the second character recognition result so as to preserve the layout.

The layout holding device for spreadsheet software according to claim 1, wherein, when the first character recognition result and the second character recognition result do not form a single word or phrase, the control means inserts the first character recognition result and the second character recognition result into separate cells of the spreadsheet software.

The layout holding device for spreadsheet software according to claim 1, wherein the control means has a function of parsing the character recognition result when the text box width of the character recognition result is larger than the cell width of the spreadsheet software, and, if the parsing shows that the character recognition result cannot be separated into words or phrases, the control means inserts the character recognition result into a cell of the spreadsheet software as it is.

The layout holding device for spreadsheet software according to claim 4, wherein, when the text box width of the character recognition result is larger than the cell width of the spreadsheet software and parsing shows that the character recognition result can be separated into words or phrases at a cell boundary line of the spreadsheet software, the control means inserts the resulting separable character recognition results into separate cells.

The layout holding device for spreadsheet software according to any one of claims 1 to 5, wherein, when the first character recognition result is offset from the second character recognition result by a certain distance in the row direction, the control means adjusts the position of the first character recognition result or the second character recognition result so that the two results can be inserted into separate cells on the same row of the spreadsheet software.
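The position adjustment in the last claim can be illustrated under one assumption: that the offset is a small vertical misalignment between text boxes belonging to the same printed line. Each box's top y coordinate is snapped to a shared row whenever it lies within a tolerance of that row's first box. The function name `align_rows` and the tolerance value are hypothetical; the claim does not specify the distance.

```python
def align_rows(ys: list[float], tolerance: float) -> list[int]:
    """Assign a zero-based row index to each text box, given its top y coordinate.

    Boxes are visited top to bottom; a box joins the current row if it
    lies within `tolerance` of that row's first box, else it starts a new row.
    """
    order = sorted(range(len(ys)), key=lambda i: ys[i])
    rows = [0] * len(ys)
    row = -1
    anchor = None  # y of the first box in the current row
    for i in order:
        if anchor is None or ys[i] - anchor > tolerance:
            row += 1
            anchor = ys[i]
        rows[i] = row
    return rows
```

For example, boxes at y = 100, 104, 131, and 98 with a tolerance of 8 are assigned rows 0, 0, 1, and 0, so the first, second, and fourth boxes land in separate cells of the same spreadsheet row.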