JP2021005173A

JP2021005173A - OCR recognition result confirmation support program, OCR recognition result confirmation support method, and OCR recognition result confirmation support system

Info

Publication number: JP2021005173A
Application number: JP2019117796A
Authority: JP
Inventors: 哲成川口; Tetsushige Kawaguchi; 幸弘杉村; Yukihiro Sugimura
Original assignee: Fujitsu Ltd
Current assignee: Fujitsu Ltd
Priority date: 2019-06-25
Filing date: 2019-06-25
Publication date: 2021-01-14
Anticipated expiration: 2039-06-25
Also published as: JP7283257B2

Abstract

PROBLEM TO BE SOLVED: To make it possible to output a list that facilitates comparing handwritten character image data with its character recognition result. SOLUTION: An OCR recognition result confirmation support device 100 acquires image data of character strings handwritten on forms and outputs a list L2 in which, for each form, confirmation image data L21 is arranged side by side with a character recognition result L22 obtained by performing character recognition on the confirmation image data L21. To produce the confirmation image data L21 of the list L2, each character included in the character string image data of the form is enlarged or reduced so that the characters in the string are approximately equal in size, and the image data of the characters is then concatenated. A user can thereby easily compare the confirmation image data L21 with the character recognition result L22 and appropriately judge whether the character recognition result is correct. SELECTED DRAWING: Figure 6A

Description

The present invention relates to an OCR recognition result confirmation support program, an OCR recognition result confirmation support method, and an OCR recognition result confirmation support system that support confirmation of character recognition (OCR: Optical Character Recognition) results.

OCR makes it possible to convert handwritten characters into data. For example, by performing character recognition on handwritten characters entered on a form, the contents of the form can be converted into text data or the like, and a database of form data can be built.

Because the shape, size, and other properties of handwritten characters are not uniform, character recognition by OCR may produce errors. For this reason, a user checks whether the character recognition result is correct by comparing the image data of the handwritten characters with the OCR result data (for example, text data).

As techniques related to confirming character recognition results, a technique is disclosed that clusters OCR recognition results by pattern, such as color or part of speech, displays side by side the recognition results of multiple image patterns recognized as the same character pattern, and selects an operator to confirm them. A technique is also disclosed that clusters similar character images among those that yielded the same recognition result and displays side by side the character images judged to belong to the same cluster. A technique is also disclosed that decomposes the image of a source character string into individual characters, concatenates the images of the characters before and after a specified recognized character, and displays the sorted results as a list. A technique is also disclosed that displays the region recognition results for figures, ruled lines, images, tables, and the like in a document image together with the character recognition results within those regions.

Japanese Unexamined Patent Publication No. 2017-111500
Japanese Unexamined Patent Publication No. 2005-309608
Japanese Unexamined Patent Publication No. 2015-55891
Japanese Unexamined Patent Publication No. H11-149520

The prior art could not make the work of confirming character recognition results obtained by OCR more efficient. For example, when the handwritten entries placed on a confirmation list are character strings made up of multiple characters, and the characters differ in size, simply lining up images of the same character string leaves the characters to be compared at different positions, so they cannot be checked easily.

Further, when the size of handwritten characters varies from character to character because of the writer's habits, images (glyph shapes) of the same character may be mistaken for different glyphs because of their different sizes, even when they are lined up at the same position. Thus, when the handwritten entry is a character string composed of multiple characters, the prior art did not allow the user to confirm at a glance whether the character recognition result was correct.

In one aspect, an object of the present invention is to output a list that makes it easy to compare image data of handwritten characters with the character recognition result.

According to one aspect of the present invention, processing is performed that acquires image data; enlarges or reduces each character of a character string included in the image data so that the characters are substantially equal in size; concatenates the enlarged or reduced characters to generate confirmation image data of the character string; and outputs a list in a display format that allows the confirmation image data to be compared with a character recognition result obtained by performing character recognition on the image data or the confirmation image data.

According to one aspect of the present invention, a list can be output that makes it easy to compare image data of handwritten characters with the character recognition result.

FIG. 1 is an explanatory diagram showing an example of OCR recognition result confirmation support according to the embodiment.
FIG. 2 is a diagram showing a hardware configuration example of the OCR recognition result confirmation support device according to the embodiment.
FIG. 3 is a diagram illustrating a conversion process of image data for OCR recognition according to the embodiment.
FIG. 4 is a diagram showing an example of the conversion process of image data for OCR recognition according to the embodiment.
FIG. 5 is a flowchart showing a processing example of OCR recognition result confirmation support according to the embodiment.
FIG. 6A is a diagram showing an example of an OCR result confirmation list output by the OCR recognition result confirmation support device according to the embodiment (Part 1).
FIG. 6B is a diagram showing an example of an OCR result confirmation list output by the OCR recognition result confirmation support device according to the embodiment (Part 2).
FIG. 7 is a diagram showing a list of character recognition results corresponding to the prior art.

(Embodiment)
Hereinafter, embodiments of the disclosed OCR recognition result confirmation support program, OCR recognition result confirmation support method, and OCR recognition result confirmation support system will be described in detail with reference to the drawings.

FIG. 1 is an explanatory diagram showing an example of OCR recognition result confirmation support according to the embodiment. The OCR recognition result confirmation support device 100 performs character recognition on characters handwritten on a form or the like and supports the user's work of confirming the character recognition result. OCR may produce errors in the recognition result for handwritten characters, so the OCR recognition result confirmation support device 100 of the embodiment presents the result to the confirming user in a display format that makes it easy to check whether the recognition is correct.

The OCR recognition result confirmation support device 100 can be applied, for example, to the task of aggregating and digitizing (converting into electronic data) the contents of a large number of forms filled in by hand. The OCR recognition result confirmation support device 100 can be configured using a computer device such as a PC or server having a CPU, a storage unit 101, and the like, and the OCR recognition result confirmation support program is stored in the storage unit 101 or the like. The CPU executes the OCR recognition result confirmation support program stored in a storage unit such as the storage unit 101, thereby performing predetermined OCR recognition processing.

The OCR recognition result confirmation support system includes the above OCR recognition result confirmation support device 100 and, connected to it, an output unit 103, an operation unit 104, a database (DB) 105, and a scanner 110. The output unit 103 is, for example, the display shown in FIG. 1, a printer, or the like. The operation unit 104 is, for example, the keyboard shown in FIG. 1. The DB 105 may be arranged independently of the OCR recognition result confirmation support device 100, or may use part of the storage area of the storage unit 101 of the OCR recognition result confirmation support device 100.

The overall flow of processing for OCR recognition result confirmation support for forms will be described with reference to FIG. 1. First, a staff member (user) U of a municipal office or the like visits a resident's home H, hands the resident a predetermined form T, and asks the resident to fill in each entry field of the form T. The resident fills in the entry fields of the form T by hand using a writing instrument. The staff member U collects the form T with the handwritten entries and takes it back to the office (step S1).

The staff member U collects a plurality of forms T by visiting a plurality of residents' homes H. The staff members U shown for the individual processes in FIG. 1 may be the same person or different people, and the same staff member U may perform the work of more than one process.

After that, in order to digitize the contents of the collected forms T using the OCR recognition result confirmation support device 100, the staff member U reads each of the forms T with the scanner 110 (step S2). The image data IMG of each form T read by the scanner 110 is input to the OCR recognition result confirmation support device 100 (step S3).

The OCR recognition result confirmation support device 100 performs, for example for each form T that was read, OCR recognition (character recognition) on the image data IMG and the processing needed to display the character recognition result for confirmation (step S4). The storage unit 101 of the OCR recognition result confirmation support device 100 stores the processing programs for this character recognition and for display control of the character recognition results, and a processor such as a CPU executes these programs.

The processing for character recognition and display control of the character recognition results (step S4) includes decomposition processing of the data of the form T (step S101), listing processing of the attribute-specific conversion result data (step S102), and integration processing of the data of the form T (step S103).

In the decomposition processing of the data of the form T (step S101), the OCR recognition result confirmation support device 100 performs character recognition processing on each form T. During this character recognition processing, the OCR recognition result confirmation support device 100 holds the image data IMG of the form T in association with the character recognition result (text data) OCR obtained by performing character recognition on that image data.

Before executing the character recognition processing, the OCR recognition result confirmation support device 100 performs decomposition processing, which splits the character string in the image data into individual characters, and shaping processing, which makes the image data of the characters in the string uniform in size. Details of these processes will be described later.

After that, the OCR recognition result confirmation support device 100 performs the character recognition processing and decomposes the image data after character recognition by attribute. For example, the form T has a plurality of entry fields: one for a name, another for an address, and still others for questionnaire responses and the like. The name, address, and so on of these different entry fields correspond to attributes (attribute items). The OCR recognition result confirmation support device 100 then holds the image data for each attribute in the storage unit 101 as attribute-specific conversion result data 102.

In the listing processing (step S102), the OCR recognition result confirmation support device 100 aggregates, from the attribute-specific conversion result data 102 for attributes such as name and address, the image data and character recognition results (text data) of the same attribute across the plurality of forms T. The OCR recognition result confirmation support device 100 then creates a list sorted by the character string data and outputs the image data and the character recognition results in a format that allows them to be compared (step S5). For example, the list is displayed on the display 103 serving as the output unit shown in FIG. 1, or printed on a printer.
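The listing step can be pictured with a small sketch. This is only an illustration under stated assumptions, not the device's implementation: the `FieldRecord` type, the `build_confirmation_list` function, the file names, and the sample values are all hypothetical, and the image is represented by a simple reference string.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class FieldRecord:
    """One entry field of one form (hypothetical structure)."""
    form_id: str    # which form T the entry came from
    attribute: str  # attribute item, e.g. "name" or "address"
    image_ref: str  # handle to the confirmation image data
    ocr_text: str   # character recognition result (text data)


def build_confirmation_list(records, attribute):
    """Gather one attribute across all forms, sorted ascending by recognized text."""
    rows = [r for r in records if r.attribute == attribute]
    rows.sort(key=lambda r: r.ocr_text)
    # Each row pairs the image with its recognition result for side-by-side display.
    return [(r.image_ref, r.ocr_text) for r in rows]


records = [
    FieldRecord("T2", "name", "T2_name.png", "Sugimura"),
    FieldRecord("T1", "name", "T1_name.png", "Kawaguchi"),
    FieldRecord("T1", "address", "T1_addr.png", "Kanazawa"),
]

name_list = build_confirmation_list(records, "name")
for image_ref, text in name_list:
    print(f"{image_ref:15} | {text}")
```

Sorting on the recognized text is what places identical strings on adjacent rows, which the description later relies on for spotting a stray character when scanning vertically.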

For example, the list L1 shown in FIG. 1 shows a display state in which the name attribute is sorted in ascending order. As the name list L1, the OCR recognition result confirmation support device 100 arranges the confirmation image data IMG (L11) of a name entered on a form T and the character recognition result OCR (text data) L12 side by side horizontally. In the vertical direction, it arranges the confirmation image data L11 and the character recognition results (text data) L12 of the names entered on the different forms T.

Likewise, the list L2 shown in FIG. 1 shows a display state in which the address attribute is sorted in ascending order. As the address list L2, the OCR recognition result confirmation support device 100 arranges the confirmation image data L21 of an address entered on a form T and the character recognition result (text data) L22 side by side horizontally. In the vertical direction, it arranges the confirmation image data L21 and the character recognition results (text data) L22 of the addresses entered on the different forms T.

The confirmation image data L11 and L21 shown in FIG. 1, and the image data described later, are actually handwritten characters, but for convenience they are drawn in a predetermined font.

At the office, a different staff member may be put in charge of confirming the character recognition results for each attribute. In the example of FIG. 1, the staff member U1 confirms the character recognition results for the name attribute, and the staff member U2 confirms those for the address attribute.

The staff member U1 refers to the name list L1 displayed on the display 103 or the like and confirms the character recognition results for names. The staff member U1 does this by comparing the confirmation image data L11 displayed in the name list L1 with the character recognition result (text data) L12, thereby checking the character recognition result (text data) L12 against the confirmation image data L11.

For example, the staff member U1 compares the confirmation image data L11 and the character recognition result (text data) L12 placed side by side in the name list L1, and checks whether the character recognition result (text data) L12 was recognized correctly. In the example of FIG. 1, the character recognition results in the name list L1 are all correct. If, for example, the character recognition result (text data) L12 for the image data "川口明子" in the top row of the name list L1 were "川口朗子" (with 明 recognized as 朗), the staff member U1 could see that the character recognition result (text data) L12 contains an error.

The confirmation image data L11 in the name list L1 was actually handwritten on the form T, with characters of differing sizes, but the shaping processing of the image data performed in the decomposition processing (step S101) has made the characters uniform in size. As a result, the staff member U1 can check each character of the confirmation image data L11 displayed in the name list L1 without overlooking any, and can easily compare the size-aligned confirmation image data L11 with the character recognition result (text data) L12. In this way, the OCR recognition result confirmation support device 100 helps the staff member U1 carry out the confirmation work accurately and efficiently.

The confirmation image data L11 and the character recognition results (text data) L12 of the name list L1 are arranged with the different forms T sorted in ascending order in the vertical direction. Because identical character strings are therefore placed consecutively in the vertical direction, a differing character mixed into the confirmation image data L11 is easy to spot when the column is scanned vertically. Similarly, a differing character mixed into the character recognition results (text data) L12, which corresponds to a misrecognized character, is easy to spot when that column is scanned vertically.

As described in detail later, when the different forms T are sorted in ascending order by character recognition result in the vertical direction, identical character strings appear consecutively in the confirmation image data L11. While the same character string continues, only the character recognition results (text data) L12 need to be checked in the vertical direction, which makes the confirmation work even more efficient.

The staff member U2 is likewise supported in easily confirming the character recognition results for addresses by referring to the address list L2. The staff member U2 compares the confirmation image data L21 displayed in the address list L2 with the character recognition result (text data) L22 and checks whether the character recognition result (text data) L22 for the confirmation image data L21 was recognized correctly. For the staff member U2 as well, the OCR recognition result confirmation support device 100 helps the confirmation of address recognition results to be carried out accurately and efficiently.

If a character recognition result contains an error, the staff members U1 and U2 can correct the erroneous character recognition result (text data) L12 or L22 by operating, for example, the keyboard 104 serving as the operation unit. In that case, the OCR recognition result confirmation support device 100 changes the corresponding character recognition result data (text data) in the attribute-specific conversion result data 102 to the characters (character string) entered from the keyboard 104 and stores it in the storage unit 101.
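A correction of this kind amounts to overwriting one stored recognition result while leaving its image untouched. The following is a minimal sketch under assumptions: the dictionary layout keyed by `(form_id, attribute)`, the `apply_correction` function, and the sample file names are hypothetical and are not taken from the publication.

```python
def apply_correction(conversion_results, form_id, attribute, corrected_text):
    """Replace the stored recognition result for one field with the operator's input."""
    key = (form_id, attribute)
    if key not in conversion_results:
        raise KeyError(f"no recognition result stored for {key}")
    # Only the text is replaced; the associated image reference is kept as-is.
    conversion_results[key]["ocr_text"] = corrected_text
    return conversion_results[key]


# Hypothetical attribute-specific conversion result data for one form.
conversion_results = {
    ("T1", "name"): {"image_ref": "T1_name.png", "ocr_text": "川口朗子"},
    ("T1", "address"): {"image_ref": "T1_addr.png", "ocr_text": "金沢市増泉1丁目"},
}

# The operator sees that the image reads 川口明子 and types the correct string.
apply_correction(conversion_results, "T1", "name", "川口明子")
print(conversion_results[("T1", "name")]["ocr_text"])
```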

The example in FIG. 1 shows the staff members U1 and U2 confirming character recognition for the name and address attributes, but other staff members U may confirm character recognition for further attributes.

When the confirmation of character recognition has been completed for all attributes of the forms T, the OCR recognition result confirmation support device 100 integrates the data of the forms T using the attribute-specific conversion result data 102 (step S103) and stores the result in the DB 105.

The attribute-specific conversion result data 102 is data decomposed by attribute, such as name and address. In the integration processing, the attributes such as name and address are merged for each form T, and all the items of each original form T are stored in the DB 105 as the character recognition result OCR (text data). Because the image data IMG is used only for confirming the character recognition results, it does not need to be stored in the DB 105 and is deleted once the confirmation work is finished. Alternatively, the character recognition result OCR (text data) may be stored in the DB 105 in association with the corresponding image data IMG.
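The integration step can be sketched as regrouping the per-attribute data by form. This is an illustrative sketch only: the `integrate_forms` function, the `keep_images` flag, the `_images` key, and the data layout are assumptions introduced here, with `keep_images` standing in for the optional variant that stores the image data alongside the text.

```python
def integrate_forms(conversion_results, keep_images=False):
    """Merge attribute-specific results back into one record per form."""
    forms = {}
    for (form_id, attribute), entry in conversion_results.items():
        record = forms.setdefault(form_id, {})
        record[attribute] = entry["ocr_text"]
        if keep_images:
            # Optional variant: keep each field's image associated with its text.
            record.setdefault("_images", {})[attribute] = entry["image_ref"]
    return forms


# Hypothetical confirmed data, keyed by (form_id, attribute).
conversion_results = {
    ("T1", "name"): {"image_ref": "T1_name.png", "ocr_text": "Kawaguchi"},
    ("T2", "name"): {"image_ref": "T2_name.png", "ocr_text": "Sugimura"},
    ("T1", "address"): {"image_ref": "T1_addr.png", "ocr_text": "Kanazawa"},
}

db_rows = integrate_forms(conversion_results)
print(db_rows["T1"])
```

With the default `keep_images=False`, only text survives, which mirrors the case where the image data is discarded after confirmation.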

As a result, the DB 105 stores digitized data in which the characters handwritten on the forms T have been converted into text data. By accessing the digitized data stored in the DB 105, an external terminal can easily obtain the information for each form T. The external terminal may be a terminal device equipped with the display 103 and the keyboard 104 described above.

FIG. 2 is a diagram showing a hardware configuration example of the OCR recognition result confirmation support device according to the embodiment. The OCR recognition result confirmation support device 100 can be configured, for example, as a general-purpose PC or server comprising the hardware shown in FIG. 2.

The OCR recognition result confirmation support device 100 includes a CPU (Central Processing Unit) 201, a memory 202, a network interface (IF) 203, a recording medium IF 204, a recording medium 205, and an input/output IF 206. Reference numeral 200 denotes a bus connecting these components.

The CPU 201 is an arithmetic processing unit that functions as a control unit governing overall control of the OCR recognition result confirmation support device 100. The memory 202 includes non-volatile memory and volatile memory. The non-volatile memory is, for example, a ROM (Read Only Memory) that stores programs for the CPU 201. The volatile memory is, for example, a DRAM (Dynamic Random Access Memory) or SRAM (Static Random Access Memory) used as a work area of the CPU 201.

The network IF 203 is a communication interface to a network NW such as a LAN (Local Area Network), a WAN (Wide Area Network), or the Internet. The OCR recognition result confirmation support device 100 connects to the network NW via the network IF 203. For example, the OCR recognition result confirmation support device 100 can communicate with an external terminal via the network NW.

The recording medium IF 204 is an interface through which information processed by the CPU 201 is read from and written to the recording medium 205. The recording medium 205 is a recording device that supplements the memory 202; an HDD (Hard Disk Drive), an SSD (Solid State Drive), a USB (Universal Serial Bus) flash drive, or the like can be used.

The processing functions of OCR recognition result confirmation support are realized by the CPU 201 executing a program recorded in the memory 202 or on the recording medium 205. The memory 202 and the recording medium 205 also record and hold the information handled by the OCR recognition result confirmation support device 100. The recording medium 205 corresponds to the storage unit 101 shown in FIG. 1 and stores the attribute-specific conversion result data 102.

The input/output IF 206 is an interface for inputting and outputting information to and from the CPU 201. External devices such as the scanner 110, the display 103, and the keyboard 104 (see FIG. 1) are connected via the input/output IF 206.

The database 105 shown in FIG. 1 can be configured using the recording medium 205 shown in FIG. 2. The database 105 may also be configured to include the other components shown in FIG. 2 (the CPU 201 through the recording medium 205).

FIG. 3 is a diagram illustrating the conversion process applied to image data for OCR recognition according to the embodiment. FIG. 3(a) shows an example of the handwritten-character image data IMG(1) that is the OCR recognition source on the form T described above. The character string in IMG(1) is "金沢市増泉１丁目" (Kanazawa-shi Masuizumi 1-chome) and consists of a plurality (n) of characters: character 301 "金", character 302 "沢", ..., character 30n "目". For illustration, the characters 301 to 30n are drawn with differing, irregular sizes.

For image data IMG(1) whose characters differ in size in this way, the OCR recognition result confirmation support device 100 performs a division process that splits the image data one character at a time, in units recognizable as characters (character units). It then performs further conversion processes, such as a shaping process that enlarges or reduces each character so that the characters become approximately equal in size, and generates the confirmation image data IMG(n). In the confirmation image data IMG(n), the characters 301, 302, ..., 30n of the character string are shaped so that their sizes are aligned to a predetermined size. The shaping process may instead enlarge or reduce each character so that its size becomes approximately equal to a predetermined (reference) size. Here, "approximately equal" means that the difference in height between characters is within a predetermined range, the difference in width between characters is within a predetermined range, and/or the difference in area of the rectangles enclosing the characters is within a predetermined range.
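The "approximately equal" condition above can be illustrated with a small check. This is only a sketch: the `Box` type, the function name `roughly_equal`, and the tolerance values are assumptions made for illustration, since the text speaks only of "a predetermined range". The source allows the three criteria to be combined with "and/or"; this sketch requires height and width to match and treats the area check as optional.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Box:
    """Size of the rectangle enclosing one character (hypothetical type)."""
    width: int
    height: int

    @property
    def area(self) -> int:
        return self.width * self.height


def roughly_equal(boxes, tol_h=2, tol_w=2, tol_area=None):
    """Return True when all boxes lie within the given tolerances of each other.

    tol_h, tol_w and tol_area stand in for the "predetermined ranges";
    their default values are assumptions, not values from the source.
    """
    if not boxes:
        return True
    heights = [b.height for b in boxes]
    widths = [b.width for b in boxes]
    ok = (max(heights) - min(heights) <= tol_h
          and max(widths) - min(widths) <= tol_w)
    if tol_area is not None:
        areas = [b.area for b in boxes]
        ok = ok and (max(areas) - min(areas) <= tol_area)
    return ok
```

A caller could run this over the division frames after shaping to confirm that the characters are uniform enough to be concatenated.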

The OCR recognition result confirmation support device 100 then outputs the generated confirmation image data IMG(n) as the confirmation image data L11 and L21 of the character recognition confirmation lists L1 and L2 (see FIG. 1). The numbers in parentheses, (1) through (n), count the conversion processes: the number increases each time a predetermined conversion process is applied to the initial image data IMG(1), and the confirmation image data IMG(n) is finally produced by the n-th conversion process.

(Example of image data conversion process)
FIG. 4 is a diagram showing an example of the conversion process applied to image data for OCR recognition according to the embodiment. As shown in FIGS. 4(a) to 4(d), the control unit (CPU 201) of the OCR recognition result confirmation support device 100 performs a plurality (n) of conversion processes on the handwritten-character image data IMG.

図４（ａ）に示すように、はじめに制御部は、スキャナ１１０が読み取った帳票Ｔ上の手書き文字の領域を切り取ったイメージデータＩＭＧ（１）を生成する。この例では、帳票Ｔの属性「住所」の所定の枠Ｅ（図示の例では１行）内の複数の文字３０１〜３０８からなる文字列を切り取る。 As shown in FIG. 4A, first, the control unit generates image data IMG (1) obtained by cutting out an area of handwritten characters on the form T read by the scanner 110. In this example, a character string consisting of a plurality of characters 301 to 308 in a predetermined frame E (one line in the illustrated example) of the attribute "address" of the form T is cut out.

In the example shown in FIG. 4(a), to emphasize features that arise in handwriting, the characters in the one-line character string are drawn, for illustration, with differing sizes, differing writing styles (in the figure, each handwritten character is rendered in a different font), and differing vertical positions within the line.

Next, as shown in FIG. 4(b), the control unit generates image data IMG(2) by performing a division process that splits the character string in frame E into individual characters. In doing so, the control unit sets division frames E1 to E8, one per character, each matched to the size of its character: character 301 "金", character 302 "沢", ..., character 308 "目".

For example, for character 301 "金", the control unit sets a rectangular (□) division frame E1 fitted to the whole character (for example, to its outer edge). For a character that is smaller than character 301 "金", such as character 302 "沢", the control unit sets division frame E2 correspondingly smaller than division frame E1.
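One simple way to obtain such per-character frames is sketched below. The source does not say how the division is performed, so the column-projection approach, the function name `split_characters`, and the representation of the line image as a list of rows of 0/1 pixels are all assumptions. A plain column projection would also split characters with separate left and right components (such as "沢"), so a practical system would need a more robust segmenter.

```python
def split_characters(bitmap):
    """Split a binary line image into tight per-character frames.

    bitmap is a list of rows, each a list of 0/1 pixels (1 = ink).
    Returns (left, top, right, bottom) tuples; right and bottom are exclusive.
    """
    if not bitmap:
        return []
    height, width = len(bitmap), len(bitmap[0])
    # A column counts as inked if any pixel in it is set.
    inked = [any(bitmap[y][x] for y in range(height)) for x in range(width)]
    frames = []
    x = 0
    while x < width:
        if not inked[x]:
            x += 1
            continue
        left = x
        while x < width and inked[x]:
            x += 1
        right = x
        # Shrink the frame vertically to the rows that actually contain ink.
        rows = [y for y in range(height) if any(bitmap[y][left:right])]
        frames.append((left, rows[0], right, rows[-1] + 1))
    return frames
```

Each returned tuple plays the role of one division frame (E1, E2, ...): a small character yields a small frame, a large character a large one.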

After that, as shown in FIG. 4(c), the control unit generates image data IMG(3) by performing a shaping process that enlarges or reduces each of character 301 "金", character 302 "沢", ..., character 308 "目" in the divided image data IMG(2) so that their sizes are uniform.

For example, the control unit takes the largest character, character 301 "金", as the reference and makes the other characters, 302 "沢" through 308 "目", the same size as character 301 "金". In this case, the control unit enlarges or reduces the division frames E2 (character 302 "沢") through E8 (character 308 "目") so that each has the same size (height and width) as division frame E1 of the reference character 301 "金". As a result, characters 301 "金", 302 "沢", ..., 308 "目" of the character string all have the same size.
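The shaping step can be sketched as follows, assuming each character has already been cropped to its division frame and is held as a list of rows of 0/1 pixels. The function names, the nearest-neighbour resampling, and the choice of "largest by frame area" as the reference are assumptions for illustration; the source only states that the largest character may serve as the reference.

```python
def resize_nearest(glyph, new_w, new_h):
    """Resize a glyph (list of pixel rows) with nearest-neighbour sampling."""
    old_h, old_w = len(glyph), len(glyph[0])
    return [
        [glyph[y * old_h // new_h][x * old_w // new_w] for x in range(new_w)]
        for y in range(new_h)
    ]


def normalize_to_largest(glyphs):
    """Scale every glyph to the height and width of the largest one.

    "Largest" is taken here to mean largest frame area (an assumption).
    """
    if not glyphs:
        return []
    ref = max(glyphs, key=lambda g: len(g) * len(g[0]))
    ref_h, ref_w = len(ref), len(ref[0])
    return [resize_nearest(g, ref_w, ref_h) for g in glyphs]
```

Because each frame is stretched independently in height and width, the aspect ratio of a character may change; the source describes matching the frame size "vertically and horizontally", which this sketch follows literally.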

After that, as shown in FIG. 4(d), the control unit generates image data IMG(n) by concatenating character 301 "金", character 302 "沢", ..., character 308 "目" of the character string. In this case, the image data conversion process is performed four times, so n = 4 in IMG(n). By this process, the image data IMG(3), which was divided into individual characters, is joined back into the arrangement it had before division (as in image data IMG(1)), now with the character sizes aligned.

For example, the control unit concatenates division frame E1 of character 301 "金", division frame E2 of character 302 "沢", ..., and division frame E8 of character 308 "目" along the row direction to generate a single image data IMG(n). The generated image data IMG(n) corresponds to the confirmation image data IMG described with reference to FIG. 3(b). The control unit outputs this image data IMG(n) as the confirmation image data L21 of the address list L2 shown in FIG. 1.
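The concatenation in the row direction can be sketched as below, again assuming glyphs are lists of rows of 0/1 pixels that have already been brought to a common height. The function name `concat_row` and the optional `gap` parameter (blank columns between characters) are hypothetical additions for illustration.

```python
def concat_row(glyphs, gap=0):
    """Join equally tall glyphs side by side into one line image.

    gap is the number of blank pixel columns inserted between glyphs
    (an illustrative parameter, not taken from the source).
    """
    if not glyphs:
        return []
    height = len(glyphs[0])
    if any(len(g) != height for g in glyphs):
        raise ValueError("all glyphs must share the same height")
    rows = []
    for y in range(height):
        row = []
        for i, glyph in enumerate(glyphs):
            if i and gap:
                row.extend([0] * gap)
            row.extend(glyph[y])
        rows.append(row)
    return rows
```

Since every glyph has the same width after shaping, the k-th character of every form starts at the same horizontal offset, which is what makes the list rows line up vertically.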

(Processing example of the OCR recognition result confirmation support device)
FIG. 5 is a flowchart showing a processing example of OCR recognition result confirmation support according to the embodiment. The processing is executed by the control unit (CPU) 201 and mainly illustrates support for confirming character recognition results for a plurality of forms T.

はじめに、制御部は、複数の帳票Ｔの中から一つの帳票Ｔに対しＯＣＲ（文字認識）する箇所のイメージデータＩＭＧを切り取る（ステップＳ５０１）。この処理は、図４（ａ）の枠ＥのイメージデータＩＭＧ（１）の切り取りに相当する。 First, the control unit cuts out the image data IMG of the portion to be OCR (character recognition) for one form T from the plurality of forms T (step S501). This process corresponds to cutting out the image data IMG (1) in the frame E of FIG. 4 (a).

次に、制御部は、切り取ったイメージデータを１文字単位のイメージデータに分割する（ステップＳ５０２）。この処理により、図４（ｂ）に示した１文字単位の分割枠Ｅ１〜Ｅ８で分割したイメージデータＩＭＧ（２）が生成される。 Next, the control unit divides the cut image data into image data in character units (step S502). By this process, the image data IMG (2) divided by the character-by-character division frames E1 to E8 shown in FIG. 4B is generated.

次に、制御部は、１文字単位に分割したイメージデータの中の文字部分のサイズを整形する（ステップＳ５０３）。この処理により、図４（ｃ）に示した各文字の分割枠Ｅ１〜Ｅ８を拡大または縮小して基準となる同じ大きさにすることで、各文字の大きさを揃えたイメージデータＩＭＧ（３）が生成される。 Next, the control unit shapes the size of the character portion in the image data divided into character units (step S503). By this processing, the division frames E1 to E8 of each character shown in FIG. 4C are enlarged or reduced to the same size as the reference, so that the image data IMG (3) having the same size of each character is used. ) Is generated.

次に、制御部は、サイズを整形した１文字単位のイメージデータを分割前の状態に連結する（ステップＳ５０４）。この処理により、図４（ｄ）に示した各文字の大きさを揃えた分割枠Ｅ１〜Ｅ８を連結したイメージデータＩＭＧ（４）が生成される。 Next, the control unit concatenates the image data of one character unit whose size has been shaped into the state before the division (step S504). By this process, the image data IMG (4) in which the divided frames E1 to E8 having the same size of each character shown in FIG. 4D are connected is generated.

This image data IMG(4) corresponds to the confirmation image data (see FIG. 3(b)) that the user U uses to confirm the character recognition result. For example, when the user U confirms the character recognition result, the control unit outputs the generated image data IMG(4) as the confirmation image data L21 of the address list L2 shown in FIG. 1.

Next, the control unit determines whether OCR (character recognition) has been performed on all locations of the one form T (step S505). The form T has a plurality of entry fields; character recognition is performed on all of them, and the image data IMG described above is generated for each. The entry fields correspond to the attributes described above, such as name and address. Character recognition can also be performed on the initial image data obtained by reading the form T. If all locations of the form T have been recognized (step S505: Yes), the control unit proceeds to step S506. If any location of the form T has not yet been recognized (step S505: No), the control unit returns to step S501 and executes the processing from step S501 onward for the unprocessed portion.

In step S506, the control unit determines whether all forms T have been processed. The processing above performs character recognition and generates the image data IMG for one form T; here, the control unit determines whether that has been done for all of the target forms T. If all forms T have been processed (step S506: Yes), the control unit proceeds to step S507; if any form T is unprocessed (step S506: No), it returns to step S501.

In step S507, the control unit outputs the OCR (character recognition) results as lists organized by attribute, such as name and address. Through the processing above, the control unit has generated, for the handwritten characters entered on the forms T, image data IMG for each attribute such as name and address, and it displays the generated image data IMG on the display 103. The lists may also be printed via a printer.

This image data IMG is used as the confirmation image data in the list when the staff member (user) U confirms the character recognition results of the forms T. Details are given later; for example, in the list L1 for the attribute "name" described with FIG. 1, the image data IMG is displayed as the confirmation image data L11, with the character recognition result (text data) L12 displayed beside it. Similarly, in the list L2 for the attribute "address", the image data IMG is displayed as the confirmation image data L21, with the character recognition result (text data) L22 displayed beside it.

This allows the staff member U to check whether each character recognition result is correct by comparing the image data displayed in the attribute-specific lists L1 and L2 with the character recognition result (text data). The staff member U reports whether each result is correct to the control unit, for example by operating the keyboard 104. If a character recognition result (text data) is wrong, the staff member U enters the correct characters to fix it. If there is no recognition error, the staff member U either performs no operation or enters an indication that the result is correct.

そして、制御部は、全ての属性分を確認したか判断する（ステップＳ５０８）。職員（ユーザ）Ｕによる氏名や住所等、帳票Ｔの全ての属性に対する上記文字認識の確認作業が終了すれば（ステップＳ５０８：Ｙｅｓ）、制御部は、ステップＳ５０９の処理に移行する。また、文字認識の確認作業で未処理分の属性が残っていれば（ステップＳ５０８：Ｎｏ）、ステップＳ５０７の処理に戻る。 Then, the control unit determines whether or not all the attributes have been confirmed (step S508). When the confirmation work of the character recognition for all the attributes of the form T such as the name and address by the staff (user) U is completed (step S508: Yes), the control unit shifts to the process of step S509. If the unprocessed attributes remain in the character recognition confirmation work (step S508: No), the process returns to the process of step S507.

ステップＳ５０９では、制御部は、一つの帳票Ｔ単位にＯＣＲ（文字認識）結果を統合し（ステップＳ５０９）、以上の一連の処理を終了する。この処理では、一つの帳票Ｔに含まれる氏名や、住所等の複数の属性のイメージデータＩＭＧをこの一つの帳票Ｔ単位のデータのまとまりとして統合した電子データをＤＢ１０５に格納する。 In step S509, the control unit integrates the OCR (character recognition) result into one form T unit (step S509), and ends the above series of processes. In this process, the DB 105 stores electronic data in which image data IMGs of a plurality of attributes such as a name and an address included in one form T are integrated as a set of data in units of this one form T.

The image data IMG is generated for comparison with the character recognition result, so once the results have been checked in step S508, none of the image data IMG for the confirmed forms T is needed any longer. The generated image data IMG may therefore be deleted once the staff member U has confirmed the character recognition results. For example, the control unit deletes the image data IMG in step S509. In this case, only the character recognition results of the forms T are stored in the DB 105, and the image data IMG is not stored.
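The overall flow of FIG. 5 (steps S501 to S509) might be organized as in the following sketch. All names are hypothetical: `make_confirmation_image` stands for the division, shaping, and concatenation steps (S502 to S504), `recognize` for the OCR engine, and `review` for the staff member's confirmation or correction. Each form is assumed to arrive as a dict mapping an attribute name to its already cropped field image (S501). The sketch follows the variant in which the confirmation images are discarded at S509.

```python
def process_forms(forms, make_confirmation_image, recognize, review):
    """Run the confirmation-support flow over a list of forms.

    forms: list of dicts mapping attribute name -> cropped field image.
    Returns one dict per form mapping attribute name -> confirmed text.
    """
    rows = []
    for form_no, fields in enumerate(forms):            # loop closed by S506
        for attr, field_image in fields.items():         # loop closed by S505
            rows.append({
                "form": form_no,
                "attr": attr,
                "image": make_confirmation_image(field_image),  # S502-S504
                "text": recognize(field_image),
            })

    # S507: build one list per attribute (name, address, ...).
    by_attr = {}
    for row in rows:
        by_attr.setdefault(row["attr"], []).append(row)

    # S508: the reviewer confirms or corrects every entry of every attribute.
    for attr_rows in by_attr.values():
        for row in attr_rows:
            row["text"] = review(row["image"], row["text"])

    # S509: merge results per form; confirmation images are not kept.
    results = [{} for _ in forms]
    for row in rows:
        results[row["form"]][row["attr"]] = row["text"]
    return results
```

Passing the three steps in as callables keeps the sketch independent of any particular OCR engine or user interface.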

FIGS. 6A and 6B are diagrams showing examples of the OCR result confirmation list output by the OCR recognition result confirmation support device according to the embodiment. FIG. 6A shows an output example of the list L2, in which a plurality of forms T are sorted by the address attribute. As the list L2, the control unit (CPU 201) of the OCR recognition result confirmation support device 100 lists, for each form T, the result No. (601), the character recognition (OCR) result (address) L22, and the confirmation image data L21.

Each row shows the data of one form T. For example, the top row is the data of one form T (result No. "104"): the text data 611 "金沢市増泉１丁目" is displayed as the character recognition result L22, and the handwritten-character image data 612 of "金沢市増泉１丁目" is displayed as the confirmation image data L21.

The second row is the data of another form T (result No. "789"): the text data 613 "金沢市増泉１丁目" is displayed as the character recognition (OCR) result L22, and the handwritten-character image data 614 of "金沢市増泉１丁目" is displayed as the confirmation image data L21. The third row is the data of yet another form T (result No. "159"): the text data 615 "金沢市増泉１丁目" is displayed as the character recognition result L22, and the handwritten-character image data 616 of "金沢市増泉１丁目" is displayed as the confirmation image data L21.

The staff member (user) U2 who checks the address recognition results compares, on the address list L2, the character recognition result L22 and the confirmation image data L21, which are placed side by side. For example, in the first row, the text data 611 in the character recognition result L22 column is compared with the image data 612 in the confirmation image data L21 column. In this way the staff member checks whether the character recognition result (text data) L22 has been recognized correctly.

The confirmation image data L21 is the image data obtained after the decomposition (division) process, which splits the image data of the handwritten character string into individual characters, and the shaping process, which makes the characters the same size (corresponding to IMG(4) above). The text was actually handwritten on the form T with characters of differing sizes, but the decomposition and shaping processes described above have made the character sizes uniform.

これにより、職員Ｕ２は、住所のリストＬ２に表示されている確認用イメージデータＬ２１の各文字を見落とすことなく確認できる。この際、職員Ｕ２は、文字の大きさが揃えられた確認用イメージデータＬ２１と、文字認識結果（テキストデータ）Ｌ２２とを容易に見比べることができる。 As a result, the staff U2 can confirm each character of the confirmation image data L21 displayed in the address list L2 without overlooking. At this time, the staff U2 can easily compare the confirmation image data L21 having the same character size and the character recognition result (text data) L22.

例えば、最上部の１行の帳票Ｔ（結果Ｎｏ「１０４」）の文字認識結果Ｌ２２のテキストデータ６１１が「金沢市増泉１丁目」であり、確認用イメージデータＬ２１のイメージデータ６１２が「金沢市増泉１丁目」である。この場合、職員Ｕ２は、文字認識結果Ｌ２２のテキストデータ６１１が正しく文字認識されていると判断することができる。 For example, the text data 611 of the character recognition result L22 of the form T (result No. “104”) in the uppermost line is “Masuizumi 1-chome, Kanazawa City”, and the image data 612 of the confirmation image data L21 is “Kanazawa City”. Masuizumi 1-chome ". In this case, the staff U2 can determine that the text data 611 of the character recognition result L22 is correctly recognized as a character.

The list L2 is sorted in ascending order of address in the character recognition results, so the OCR recognition result confirmation support device 100 places identical addresses consecutively in the vertical direction of the character recognition result L22 column across the plurality of forms T. In the illustrated example, a plurality of forms T with the address "金沢市増泉１丁目" are displayed.

With this sorted display, while the same character recognition result "金沢市増泉１丁目" continues, the staff member (user) U2 only needs to look down the confirmation image data L21 column from the top and judge whether any entry has characters that do not resemble the others. In this way, the correctness of the character recognition results of a plurality of forms T can be checked easily.
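The sorting that produces these runs of identical recognition results can be sketched as follows. The row layout (dicts with `no` and `text` keys) and the function name are assumptions, and Python's default string ordering (by code point) stands in for "ascending order", which the source does not define further.

```python
from itertools import groupby


def build_confirmation_list(rows):
    """Sort rows by recognized text and group identical results together.

    rows: list of dicts with keys "no" (result No.) and "text" (OCR result).
    Returns a list of (text, [result numbers]) in ascending text order.
    """
    ordered = sorted(rows, key=lambda r: r["text"])
    return [
        (text, [r["no"] for r in group])
        for text, group in groupby(ordered, key=lambda r: r["text"])
    ]
```

Within each group the reviewer only has to scan the confirmation images for one that looks different from its neighbours. Because `sorted` is stable, forms with the same recognized text keep their original relative order.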

The figure illustrates the movement of the line of sight P of the staff member (user) U2 during this work. For example, when "金沢市増泉１丁目" appears as the character recognition result 611 for a new address in the character recognition result L22 column, the line of sight P then only needs to move down the confirmation image data L21 column from the top, that is, vertically from image data 612 to image data 614 to image data 616.

Here, in the image data 612, 614, and 616 of the forms T displayed as the confirmation image data L21, the division and shaping processes described above have made every character of the character string the same size. Consequently, in the image data 612, 614, and 616, which are stacked vertically in the confirmation image data L21 column, each character of the character string sits at almost the same position. For example, the first character 301 "金" of the address in image data 612, 614, and 616 appears at position p1, aligned almost exactly in the vertical direction. Likewise, the last character 308 "目" of the address appears at position pn, aligned almost exactly in the vertical direction. In other words, the characters of the character strings are displayed without misalignment from row to row.

確認用イメージデータＬ２１に表示する各帳票Ｔのイメージデータ６１２，６１４，６１６はそれぞれ複数の文字からなる文字列である。実施の形態では、文字列の各文字を同じ大きさにすることで文字間隔を均一にし、上下方向で文字の位置ずれを起こすことなく表示している。 The image data 612, 614, 616 of each form T displayed in the confirmation image data L21 is a character string composed of a plurality of characters. In the embodiment, the character spacing is made uniform by making each character of the character string the same size, and the characters are displayed without shifting in the vertical direction.

これにより、上記視線Ｐの移動時において、職員Ｕ２は、確認用イメージデータＬ２１の領域内で上下に連続する文字列全体をイメージ、例えば、図形として捉え、似ていない図形を見つける、という直観的な作業を行うことができる。そして、文字認識結果の正誤の確認作業を効率的に遂行できる。 As a result, when the line of sight P is moved, the staff U2 intuitively captures the entire character string that is continuous up and down in the area of the confirmation image data L21 as an image, for example, a figure, and finds a dissimilar figure. Work can be done. Then, the work of confirming the correctness of the character recognition result can be efficiently performed.

FIG. 6A shows a state in which the character recognition result for the form T with result No. "159" is a misrecognition. In the character recognition result L22 column, the character recognition result 615 for result No. "159" is "金沢市増泉１丁目", but suppose the corresponding image data 616 in the confirmation image data L21 column reads "金沢市増泉２丁目". In this case, while scanning the vertically consecutive character strings in the confirmation image data L21 column, the staff member U2 can easily notice that the sixth character X, "２", in the character string "金沢市増泉２丁目" of image data 616 differs from the rows above and below it. The staff member then confirms that the correct value of character recognition result 615 is "金沢市増泉２丁目" and that the address recognition result for the form T with result No. "159" is in error.

Because the address list L2 is sorted in ascending order of character recognition result, the staff member U2 can easily find recognition errors simply by moving the line of sight P up and down within the confirmation image data L21 column and checking how similar the entries look as images (as figures).

また、図６Ｂは、複数の帳票Ｔを氏名の属性でソートしたリストＬ１の出力例を示す。ＯＣＲ認識結果確認支援装置１００の制御部（ＣＰＵ２０１）は、上述した住所の属性のリストＬ２と同様に、氏名の属性のリストＬ１についても、各帳票Ｔ別の結果Ｎｏ（６０１）、文字認識（ＯＣＲ）結果（氏名）Ｌ１２、確認用イメージデータＬ１１をリスト化する。 Further, FIG. 6B shows an output example of the list L1 in which a plurality of forms T are sorted by the attribute of the name. The control unit (CPU201) of the OCR recognition result confirmation support device 100 also performs the result No. (601) and character recognition (601) for each form T for the name attribute list L1 as well as the address attribute list L2 described above. OCR) Result (name) L12 and confirmation image data L11 are listed.

Here, in the image data 622, 624, and 626 of the forms T in the confirmation image data L11 column, every character of the character string has the same size, and the characters line up from row to row. For example, the first character 301 "川" of the name in image data 622, 624, and 626 appears at position p1, aligned almost exactly in the vertical direction. Likewise, the second character 302 "口" of the name appears at position p2, aligned almost exactly in the vertical direction. In this way, the characters of the character strings are displayed without misalignment from row to row.

これにより、職員（ユーザ）Ｕ１は、氏名のリストＬ１に表示されている確認用イメージデータＬ１１の各文字を見落とすことなく確認できる。この際、職員Ｕ１は、文字の大きさが揃えられた確認用イメージデータＬ１１と、文字認識結果（テキストデータ）Ｌ１２とを容易に見比べることができ、文字認識結果の正誤を容易に確認できる。 As a result, the staff (user) U1 can confirm each character of the confirmation image data L11 displayed in the name list L1 without overlooking. At this time, the staff member U1 can easily compare the confirmation image data L11 having the same character size and the character recognition result (text data) L12, and can easily confirm the correctness of the character recognition result.

また、職員（ユーザ）Ｕ１は、視線Ｐの移動状態に示すように、文字認識結果Ｌ１２の領域内で新たな氏名の文字認識結果６２１として「金沢市増泉１丁目」の表示があった場合、以降の視線Ｐは確認用イメージデータＬ１１の領域を上部から順に移動させればよい。すなわち、イメージデータ６２２→イメージデータ６２４→イメージデータ６２６と縦に移動させるだけでよい。これにより、上記視線Ｐの移動時において、職員Ｕ１は、確認用イメージデータＬ１１の領域内で上下に連続する文字列全体をイメージ（図形相当）として捉え、似ていない図形を見つける、という直観的な作業を行うことができる。そして、文字認識結果の正誤の確認作業を効率的に遂行できる。 In addition, when the staff (user) U1 displays "Masuizumi 1-chome, Kanazawa City" as the character recognition result 621 of the new name in the area of the character recognition result L12, as shown in the moving state of the line of sight P, For the subsequent line of sight P, the area of the confirmation image data L11 may be moved in order from the top. That is, it is only necessary to move the image data 622 → the image data 624 → the image data 626 vertically. As a result, when the line of sight P is moved, the staff U1 intuitively finds a dissimilar figure by grasping the entire character string that is continuous vertically in the area of the confirmation image data L11 as an image (corresponding to a figure). Work can be done. Then, the work of confirming the correctness of the character recognition result can be efficiently performed.

(Comparative reference diagram)
FIG. 7 is a diagram showing a list of character recognition results corresponding to the prior art. For comparison with the embodiment, FIG. 7 uses the same display format as the address-attribute list of the embodiment (FIG. 6A). That is, it shows an address list L that has the result No. (701) of each form T, the character recognition (OCR) result 702, and the confirmation image data 703, sorted in ascending order of the character recognition result 702.

従来、帳票Ｔ別に、文字認識結果７０２と、確認用イメージデータ７０３とを並べることで１行の帳票Ｔ単位で文字認識結果の正誤を確認することができる。しかし、従来技術では、確認用イメージデータ７０３の領域に表示する各帳票Ｔのイメージデータ７１２，７１４，７１６は、帳票Ｔ上に手書きされた文字をそのままイメージとして表示することになる。 Conventionally, by arranging the character recognition result 702 and the confirmation image data 703 for each form T, the correctness of the character recognition result can be confirmed in the form T unit of one line. However, in the prior art, the image data 712,714,716 of each form T displayed in the area of the confirmation image data 703 displays the characters handwritten on the form T as they are as an image.

As a result, in the image data 712, 714, and 716 displayed in the confirmation image data 703 area, the size of the characters 301 to 30n, the character spacing, and the overall size (width) of the character string all differ from form T to form T. Conventionally, therefore, checking the correctness of the character recognition results took time and effort, and errors of judgment occurred. The character recognition result 702 is hard to compare with the confirmation image data 703, so the confirmation work cannot be performed efficiently and accurately.

Furthermore, in the confirmation image data 703 area, the characters of the image data 712, 714, and 716 of different forms T do not line up vertically. For example, consider the vertical band p1 defined by the position of the first character 301 "金" in the image data 712 of the form T in the first row (result No. "104"). In the image data 716 of the form T in the third row (result No. "159"), two characters fall within that band: the first character 301 "金" and the second character 302 "沢".

Likewise, in the vertical band pn defined by the position of the last character 308 "目" in the image data 716 of the form T in the third row (result No. "159"), the first row (result No. "104") has the fourth character 304 "増" of its image data 712. The last character 308 "目" of the image data 712 in the first row is thus offset in the character string direction (horizontally) from the last character 308 "目" of the image data 716 in the third row.

Because the characters of the image data 712, 714, and 716 of different forms T do not line up vertically in this way, the user cannot do what the embodiment allows: move the line of sight P up and down within the confirmation image data area alone and pick out the image that differs. Whether any row contains a character that does not resemble the others therefore cannot be judged, and the work of checking the correctness of the character recognition results cannot be made more efficient.

In the prior art, even if the display format of FIG. 7 were adopted, the characters within each character string in the confirmation image data 703 differ in size, so even when images of the same character string are stacked, identical characters are displaced and do not line up vertically. The characters to be compared with each other (first with first, second with second, and so on) shift within the string and do not line up. Moreover, in handwriting the size of each character varies with the habits of the person who filled in the form, so even if identical characters in the confirmation image data 703 happened to sit at the same vertical position, they could be mistaken for different character shapes.

The prior art thus cannot provide at-a-glance confirmation of handwritten character strings. The essence of at-a-glance confirmation is detecting the character shape that looks out of place among the shapes lined up, and for that purpose uneven character sizes only get in the way. For example, the technique of Patent Document 2 mentioned above uses image feature quantities, but the irrelevant character size then also enters the clustering computation, which lowers the confirmation accuracy.

In the embodiment, by contrast, as described with reference to FIGS. 6A and 6B, each character of the character strings in the image data within the confirmation image data L11 and L21 areas of different forms T sits at the same position vertically. Staff member (user) U can therefore move the line of sight P up and down within the confirmation image data L11 and L21 areas alone and pick out the image that differs. Whether any row contains a character that does not resemble the others can be judged easily, which makes checking the correctness of the character recognition results more efficient.
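The alignment argument can be checked with simple arithmetic. The sketch below is only an illustration: the pixel widths, the reference width `CELL`, and the function names are assumptions, not values from the patent. It computes the left offset of each character in a row and tests whether the k-th characters of all rows start at the same horizontal position.

```python
def char_offsets(widths):
    """Left x-offset of each character when glyphs are laid out left to right."""
    offsets, x = [], 0
    for w in widths:
        offsets.append(x)
        x += w
    return offsets


def columns_aligned(rows):
    """True if the k-th character starts at the same x in every row."""
    offsets = [char_offsets(r) for r in rows]
    shortest = min(len(o) for o in offsets)
    return all(o[:shortest] == offsets[0][:shortest] for o in offsets)


# Hypothetical per-character widths (pixels) for three handwritten rows.
raw_rows = [[40, 38, 42, 41], [22, 25, 20, 24], [30, 55, 28, 33]]

CELL = 32  # assumed reference width after enlarging or reducing each character
normalized_rows = [[CELL] * len(r) for r in raw_rows]

print(columns_aligned(raw_rows))         # raw handwriting: columns drift apart
print(columns_aligned(normalized_rows))  # equal-size cells: columns coincide
```

With raw widths the offsets diverge after the first character, which is the situation FIG. 7 describes. Once every character occupies a cell of the same width, the k-th character of every row starts at the same offset.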

To support the user in confirming OCR recognition results, the embodiment described above acquires image data and enlarges or reduces each character of the character string contained in the image data so that the characters become approximately equal in size. It then concatenates the enlarged or reduced characters to generate confirmation image data for the character string, and outputs a list in a display format that allows the confirmation image data to be compared with the character recognition result obtained by performing character recognition on the image data or on the confirmation image data. The characters of the confirmation image data shown in the list are thereby brought to the same size, so during confirmation the user can easily compare the confirmation image data with its character recognition result and carry out the confirmation work properly. Handwritten characters on forms and the like vary widely in size and handwriting, but dividing and reshaping the handwritten image data as described above gives every character in the string the same size. This improves the visibility of the confirmation image data, makes comparison with the character recognition result easy, and lets the user judge the correctness of the character recognition result accurately.
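The scale-and-concatenate step can be sketched as follows. This is a minimal illustration, not the patent's implementation. It assumes each character has already been cut out as a small binary bitmap (a list of rows of 0/1), uses nearest-neighbor scaling to a square cell, and ignores aspect ratio. The names `resize`, `concat`, `make_confirmation_image`, and the cell size are all hypothetical.

```python
def resize(glyph, out_h, out_w):
    """Nearest-neighbor scaling of a bitmap (list of rows) to out_h x out_w."""
    in_h, in_w = len(glyph), len(glyph[0])
    return [
        [glyph[y * in_h // out_h][x * in_w // out_w] for x in range(out_w)]
        for y in range(out_h)
    ]


def concat(glyphs):
    """Join equally tall bitmaps side by side into one bitmap."""
    height = len(glyphs[0])
    return [[px for g in glyphs for px in g[y]] for y in range(height)]


def make_confirmation_image(glyphs, cell=4):
    """Scale every character to the same cell size, then concatenate them."""
    return concat([resize(g, cell, cell) for g in glyphs])


# Two hypothetical characters of very different sizes.
big = [[1] * 8 for _ in range(8)]   # 8 x 8, will be reduced
small = [[1, 0], [0, 1]]            # 2 x 2, will be enlarged

image = make_confirmation_image([big, small], cell=4)
for row in image:
    print("".join("#" if px else "." for px in row))
```

A real system would more likely use an image library and preserve aspect ratio with padding. The pure-Python bitmaps here only keep the sketch self-contained.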

The acquired image data may also be divided character by character, and each divided frame enlarged or reduced to a reference size. This can be done easily with general-purpose image processing, so confirmation image data with improved visibility is easy to generate.
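One common general-purpose way to divide a text line into per-character frames is a vertical projection: columns with no ink separate neighboring characters. The passage above does not say which division method is used, so the sketch below is an assumption chosen for illustration, and `split_into_frames`, `scale_factors`, and the reference width are hypothetical names and values.

```python
def split_into_frames(image):
    """Return (left, right) column ranges of ink runs in a binary line image."""
    width = len(image[0])
    # A column "has ink" if any row has a non-zero pixel in that column.
    ink = [any(row[x] for row in image) for x in range(width)]
    frames, start = [], None
    for x, has_ink in enumerate(ink):
        if has_ink and start is None:
            start = x                      # a new frame begins
        elif not has_ink and start is not None:
            frames.append((start, x))      # blank column closes the frame
            start = None
    if start is not None:
        frames.append((start, width))      # frame running to the right edge
    return frames


def scale_factors(frames, ref_width):
    """Horizontal factor that brings each frame to the reference width."""
    return [ref_width / (right - left) for left, right in frames]


# Hypothetical 2-row line image containing three ink runs.
line = [
    [1, 1, 0, 0, 1, 0, 1, 1, 1],
    [1, 0, 0, 0, 1, 0, 0, 1, 0],
]
frames = split_into_frames(line)
print(frames)
print(scale_factors(frames, ref_width=6))
```

Projection splitting is fragile for characters whose left and right parts are separated by a gap, so a production system would need a more robust segmentation step.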

A form generally has handwritten entry fields for a plurality of attributes. To handle such forms, the image data acquisition obtains the image data for each attribute from a plurality of forms, and the list output lists, in a comparable display format, the confirmation image data of a given attribute common to the forms together with the character recognition result of that confirmation image data. Character recognition results for a plurality of forms can thereby be confirmed attribute by attribute. A form has entry fields for attributes such as name and address, and confirming the character recognition results one attribute at a time makes the work efficient.

Furthermore, when outputting a list for a plurality of forms, the rows may be sorted by the character string contained in the character recognition result for one attribute common to the forms, and the sorted character recognition results and their corresponding confirmation image data may be listed in a comparable display format. For example, with a list sorted by an attribute such as name or address, the character recognition results and the confirmation image data can be displayed in ascending order of name. The user can then treat the consecutive character strings in the confirmation image data display area, spanning multiple forms, as images (for example, as figures) and perform the intuitive task of spotting the figure that does not resemble the others. The correctness of the character recognition results can thus be checked efficiently.
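The sorted list can be sketched as below. The `Row` record, its field names, the image references, and the sample rows are hypothetical. Sorting by the recognized text places rows with the same recognition result next to each other, so their confirmation images appear in consecutive rows.

```python
from dataclasses import dataclass
from itertools import groupby


@dataclass(frozen=True)
class Row:
    result_no: int   # form (result) number
    ocr_text: str    # character recognition result for one attribute
    image_ref: str   # handle to the confirmation image for the same field


def build_sorted_list(rows):
    """Sort by recognized text, then by result number for a stable order."""
    return sorted(rows, key=lambda r: (r.ocr_text, r.result_no))


def group_by_text(sorted_rows):
    """Consecutive rows sharing one recognition result, as (text, result numbers)."""
    return [
        (text, [r.result_no for r in grp])
        for text, grp in groupby(sorted_rows, key=lambda r: r.ocr_text)
    ]


# Hypothetical rows for the address attribute of three forms.
rows = [
    Row(159, "Masuizumi 1-chome, Kanazawa City", "img-159"),
    Row(7, "Izumi 1-chome, Kanazawa City", "img-7"),
    Row(104, "Masuizumi 1-chome, Kanazawa City", "img-104"),
]

listing = build_sorted_list(rows)
for r in listing:
    print(r.result_no, r.ocr_text, r.image_ref)
```

Python's default string comparison orders by code point. Sorting Japanese text in dictionary order would need a locale-aware collation key instead, which this sketch does not attempt.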

In addition, based on the user's confirmation of the correctness of the character recognition results using the output list, the confirmed character recognition results are stored as electronic data of the form. Handwritten forms can thereby be digitized easily and stored as correctly recognized electronic data.

The character recognition results for the individual attributes may also be integrated form by form and stored as electronic data of the form. This yields electronic data per form that covers all handwritten entry fields on a single form, and makes it easy to exchange form data with an external terminal that performs various processing based on the form contents.
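Integrating the per-attribute results into one record per form amounts to a regrouping step. The following sketch assumes the confirmed results arrive as `(form_id, attribute, text)` tuples. That layout, the function name, and the sample values are assumptions made for illustration.

```python
from collections import defaultdict


def integrate_by_form(confirmed):
    """Regroup (form_id, attribute, text) tuples into one record per form."""
    forms = defaultdict(dict)
    for form_id, attribute, text in confirmed:
        # If an attribute appears twice for a form, the later value wins.
        forms[form_id][attribute] = text
    return dict(forms)


# Hypothetical confirmed results, produced attribute by attribute.
confirmed = [
    (104, "address", "Masuizumi 1-chome, Kanazawa City"),
    (159, "address", "Masuizumi 1-chome, Kanazawa City"),
    (104, "name", "Name A"),
    (159, "name", "Name B"),
]

records = integrate_by_form(confirmed)
print(records[104])
```

Each resulting record holds every confirmed field of one form, which is the unit an external terminal would read or write.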

The OCR recognition result confirmation support system includes the OCR recognition result confirmation support device that performs the processing described above and a scanner that reads character strings handwritten on a form and outputs image data. It may further include an output unit that displays or prints the list and an operation unit with which the user confirms the correctness of the character recognition results based on the output list. Connecting such general-purpose external devices lets the user confirm the character recognition results efficiently.

Accordingly, the embodiment makes it possible to digitize forms filled in by hand with improved character recognition accuracy. That is, it streamlines the confirmation work in which the user compares the confirmation image data of the handwritten characters on a form with the character recognition result (text data) and checks whether the result is correct. Because the visibility of the confirmation image data itself is improved and the user can judge the correctness of the character recognition result correctly, electronic data with improved character recognition accuracy for the whole form is ultimately obtained.

The OCR recognition result confirmation support method described in the embodiment of the present invention can be implemented by having a processor, such as that of a server, execute a program prepared in advance. The method is recorded on a computer-readable recording medium such as a hard disk, a flexible disk, a CD-ROM (Compact Disc-Read Only Memory), a DVD (Digital Versatile Disk), or flash memory, and is executed by being read from the recording medium by a computer. The method may also be distributed via a network such as the Internet.

The following supplementary notes are further disclosed with respect to the embodiment described above.

(Appendix 1) An OCR recognition result confirmation support program that causes a computer to execute a process comprising:
acquiring image data;
enlarging or reducing the size of each character of a character string included in the image data so that the characters become approximately equal in size;
concatenating the enlarged or reduced characters to generate confirmation image data of the character string; and
outputting a list in a display format that allows the confirmation image data to be compared with a character recognition result obtained by performing character recognition on the image data or the confirmation image data.

(Appendix 2) The OCR recognition result confirmation support program according to Appendix 1, wherein
the image data has, on a form, handwritten entry fields for each of a plurality of attributes,
the acquiring of the image data acquires the image data for each attribute from a plurality of the forms, and
the outputting of the list lists, in a comparable display format, the confirmation image data of a predetermined attribute common to the plurality of forms and the character recognition result of that confirmation image data.

(Appendix 3) The OCR recognition result confirmation support program according to Appendix 2, wherein
the outputting of the list sorts by the character string included in the character recognition result of the character string of one attribute common to the plurality of forms, and
lists, in a comparable display format, the character recognition results of the sorted plurality of forms and the confirmation image data corresponding to those character recognition results.

(Appendix 4) The OCR recognition result confirmation support program according to Appendix 2 or 3, wherein, based on a user's confirmation of the correctness of the character recognition result using the output list, the confirmed character recognition result is stored as electronic data of the form.

(Appendix 5) The OCR recognition result confirmation support program according to Appendix 4, wherein the character recognition results for the plurality of attributes are integrated form by form and stored as electronic data of the form.

(Appendix 6) The OCR recognition result confirmation support program according to any one of Appendices 1 to 5, wherein
the acquired image data is divided into a plurality of divided frames each containing the outline of one of the characters, and
each divided frame is enlarged or reduced to a reference size.

(Appendix 7) An OCR recognition result confirmation support method in which a computer executes a process comprising:
acquiring image data;
enlarging or reducing the size of each character of a character string included in the image data so that the characters become approximately equal in size;
concatenating the enlarged or reduced characters to generate confirmation image data of the character string; and
outputting a list in a display format that allows the confirmation image data to be compared with a character recognition result obtained by performing character recognition on the image data or the confirmation image data.

(Appendix 8) An OCR recognition result confirmation support system comprising a control unit that acquires image data, enlarges or reduces the size of each character of a character string included in the image data so that the characters become approximately equal in size, concatenates the enlarged or reduced characters to generate confirmation image data of the character string, and outputs a list in a display format that allows the confirmation image data to be compared with a character recognition result obtained by performing character recognition on the image data or the confirmation image data.

(Appendix 9) The OCR recognition result confirmation support system according to Appendix 8, further comprising:
a scanner that reads a character string handwritten on a form and outputs the image data;
an output unit that displays or prints the list; and
an operation unit with which a user confirms the correctness of the character recognition result based on the output list.

100 OCR recognition result confirmation support device
101 Storage unit
102 Conversion result data by attribute
103 Display
104 Keyboard
105 Database
110 Scanner
201 CPU (control unit)
202 Memory
203 Network interface
205 Recording medium
206 Input/output interface
301 to 30n Characters
611, 613, 615, 621 Character recognition result (text data)
612, 614, 616, 622, 624, 626 Image data
E Frame
E1 to E8 Divided frames
H User's home
NW Network
IMG Image data
L1 (L1, L2) List
L11, L21 Confirmation image data
L12, L22 Character recognition result (text data)
P Line of sight (movement of the line of sight)
T Form
U (U1, U2) Staff member (user)

Claims

An OCR recognition result confirmation support program that causes a computer to execute a process comprising:
acquiring image data;
enlarging or reducing the size of each character of a character string included in the image data so that the characters become approximately equal in size;
concatenating the enlarged or reduced characters to generate confirmation image data of the character string; and
outputting a list in a display format that allows the confirmation image data to be compared with a character recognition result obtained by performing character recognition on the image data or the confirmation image data.

The OCR recognition result confirmation support program according to claim 1, wherein
the image data has, on a form, handwritten entry fields for each of a plurality of attributes,
the acquiring of the image data acquires the image data for each attribute from a plurality of the forms, and
the outputting of the list lists, in a comparable display format, the confirmation image data of a predetermined attribute common to the plurality of forms and the character recognition result of that confirmation image data.

The OCR recognition result confirmation support program according to claim 2, wherein
the outputting of the list sorts by the character string included in the character recognition result of the character string of one attribute common to the plurality of forms, and
lists, in a comparable display format, the character recognition results of the sorted plurality of forms and the confirmation image data corresponding to those character recognition results.

The OCR recognition result confirmation support program according to claim 2 or 3, wherein, based on a user's confirmation of the correctness of the character recognition result using the output list, the confirmed character recognition result is stored as electronic data of the form.

The OCR recognition result confirmation support program according to claim 4, wherein the character recognition results for the plurality of attributes are integrated form by form and stored as electronic data of the form.

An OCR recognition result confirmation support method in which a computer executes a process comprising:
acquiring image data;
enlarging or reducing the size of each character of a character string included in the image data so that the characters become approximately equal in size;
concatenating the enlarged or reduced characters to generate confirmation image data of the character string; and
outputting a list in a display format that allows the confirmation image data to be compared with a character recognition result obtained by performing character recognition on the image data or the confirmation image data.

An OCR recognition result confirmation support system comprising a control unit that acquires image data, enlarges or reduces the size of each character of a character string included in the image data so that the characters become approximately equal in size, concatenates the enlarged or reduced characters to generate confirmation image data of the character string, and outputs a list in a display format that allows the confirmation image data to be compared with a character recognition result obtained by performing character recognition on the image data or the confirmation image data.

The OCR recognition result confirmation support system according to claim 7, further comprising:
a scanner that reads a character string handwritten on a form and outputs the image data;
an output unit that displays or prints the list; and
an operation unit with which a user confirms the correctness of the character recognition result based on the output list.