JP7111143B2

JP7111143B2 - Image processing device, image processing method and program

Info

Publication number: JP7111143B2
Application number: JP2020177513A
Authority: JP
Inventors: 裕一中谷; 克彦近藤; 哲 ▲瀬▼川; 充杉本; 康日高; 隼哉秋山
Original assignee: NEC Corp
Current assignee: NEC Corp
Priority date: 2020-10-22
Filing date: 2020-10-22
Publication date: 2022-08-02
Anticipated expiration: 2038-04-02
Also published as: JP2021012741A

Description

The present invention relates to an image processing device, an image processing method, and a program.

Regarding form formats, Patent Document 1 describes a method for efficiently selecting the form format best suited to a form. In this method, form formats are divided into groups, and one representative form format is defined for each group. One group is then selected based on the feature match rate between the scanned form image and each representative form format. Finally, from among the form formats in the selected group, the one with the highest feature match rate against the scanned form image is selected.

Regarding correction of character recognition results, Patent Document 2 describes that, when unreadable characters (reject characters) occur during character recognition of a mixture of multiple form types, the type of form whose unreadable characters are to be corrected is designated, and the unreadable characters in forms of the designated type are corrected, so that forms of the same type are corrected consecutively.

Also regarding correction of character recognition results, Patent Document 3 describes that, when a form that has been image-input (character-recognized) is corrected and confirmed, the input contents are displayed in a predetermined format and corrections are accepted, and that, when the user indicates a correction item, the specific image portion corresponding to that item is highlighted.

JP 2016-048444 A
JP 2004-118380 A
JP 2002-007951 A

When character recognition results are confirmed and corrected, it is important that the operator doing the work can tell which item appears where. In particular, when an operator confirms and corrects the result of reading a document in an unfamiliar format, it may take time to find where the item to be checked is written.
It is therefore preferable to help the operator tell which item appears where. It is also preferable that this help be available even when the format of the document subject to character recognition is not known in advance.

An object of the present invention is to provide an image processing device, an image processing method, and a program that can solve the problem described above.

According to a first aspect of the present invention, an image processing device includes: a character string recognition unit that recognizes the character string of a specific item in a recognition target image, based on the result of learning using a plurality of images containing character strings; and an output unit that outputs the recognition target image and the character string recognition result produced by the character string recognition unit in a manner that allows the correspondence indicating the same specific item in both to be grasped. The character string recognition unit identifies candidates for the character string of the specific item in the recognition target image, and the output unit outputs the range of each identified candidate in a manner that allows it to be grasped.

According to a second aspect of the present invention, an image processing method includes: a step of recognizing the character string of a specific item in a recognition target image, based on the result of learning using a plurality of images containing character strings; and a step of outputting the recognition target image and the character string recognition result in a manner that allows the correspondence indicating the same specific item in both to be grasped. In the recognizing step, candidates for the character string of the specific item in the recognition target image are identified, and in the outputting step, the range of each identified candidate is output in a manner that allows it to be grasped.

According to a third aspect of the present invention, a program causes a computer to execute: a step of recognizing the character string of a specific item in a recognition target image, based on the result of learning using a plurality of images containing character strings; and a step of outputting the recognition target image and the character string recognition result in a manner that allows the correspondence indicating the same specific item in both to be grasped. In the recognizing step, the program causes the computer to identify candidates for the character string of the specific item in the recognition target image, and in the outputting step, it causes the computer to output the range of each identified candidate in a manner that allows it to be grasped.

According to the present invention, the operator can be helped to tell which item appears where, even when the format of the document subject to character recognition is not known in advance.

FIG. 1 is a diagram showing a device configuration example of an image processing system including an image processing device according to an embodiment.
FIG. 2 is a diagram showing a hardware configuration example of the image processing device according to the embodiment.
FIG. 3 is a schematic block diagram showing the functional configuration of the image processing device according to the first embodiment.
FIG. 4 is a diagram showing an example of a document form.
FIG. 5 is a diagram showing an outline of the record table stored in the database according to the first embodiment.
FIG. 6 is a first diagram showing the processing flow of the image processing device according to the first embodiment.
FIG. 7 is a second diagram showing the processing flow of the image processing device according to the first embodiment.
FIG. 8 is a diagram showing an example of a display screen on which the display unit according to the first embodiment displays the recorded character strings and the image of the document form side by side.
FIG. 9 is a diagram showing an example of a display screen on which the display unit according to the first embodiment displays the correspondence between the displayed recorded character strings and the image of the document form.
FIG. 10 is a schematic block diagram showing the functional configuration of the image processing device according to the second embodiment.
FIG. 11 is a first diagram showing the processing flow of the image processing device according to the second embodiment.
FIG. 12 is a second diagram showing the processing flow of the image processing device according to the second embodiment.
FIG. 13 is a diagram showing an example of the configuration of the image processing device according to the embodiment.

Embodiments of the present invention are described below, but the following embodiments do not limit the invention as set out in the claims. In addition, not all combinations of the features described in the embodiments are necessarily essential to the solution provided by the invention.

FIG. 1 is a diagram showing a device configuration example of an image processing system including an image processing device according to an embodiment.
In the configuration shown in FIG. 1, the image processing system 100 comprises an image processing device 1, an image reading device 2, a recording device 3, and a database 4.
The image processing device 1 is connected to the image reading device 2 by a communication cable. The image reading device 2 optically acquires image data of a document form or the like and outputs it to the image processing device 1. The image processing device 1 performs OCR processing on the image data of the document form to recognize its characters. The image processing device 1 outputs the character recognition result to the recording device 3, and the recording device 3 records that result in the database. The documents handled by the image processing device 1 are not limited to any specific type; a wide variety of documents that can be OCR-processed can be handled.

The database 4 is connected to the image processing device 1 and the recording device 3. The database 4 stores the correspondence between the image data of a plurality of document forms registered in the past via the recording device 3 and recorded character strings, that is, those character strings contained in the image data that are to be recorded. The character strings indicated by the recorded character strings are the important character strings, among those written on the document form, that should be recorded and kept in the database 4. An operator who uses the image processing system 100 registers in the database 4 in advance, using the recording device 3, the image data of a plurality of past document forms together with the recorded character strings among the character strings contained in that image data.
The operator is also referred to as the user of the image processing device 1, or simply as the user.

It is assumed that the database 4 holds, for a sufficiently large number of document forms, the correspondence between the image data of each document form and the information on the recorded character strings, that is, those character strings contained in the image data that are to be recorded. The image processing device 1 performs its processing in this state.

FIG. 2 is a diagram showing a hardware configuration example of the image processing device.
In the configuration shown in FIG. 2, the image processing device 1 includes an image processing device main body 10, a display device 17, and an input device 18.
The image processing device main body 10 is a computer comprising a CPU (Central Processing Unit) 11, an IF (Interface) 12, a communication module 13, a ROM (Read Only Memory) 14, a RAM (Random Access Memory) 15, an HDD (Hard Disk Drive) 16, and the like. The communication module 13 may communicate with the image reading device 2, the recording device 3, and the database 4 wirelessly or by wire, or may have both capabilities.
The display device 17 has a display screen such as a liquid crystal panel or an LED (Light Emitting Diode) panel.
The input device 18 is a device that receives user operations, such as a keyboard and mouse, a touch sensor provided on the display screen of the display device 17 to form a touch panel, or a combination of these.

<First embodiment>
FIG. 3 is a schematic block diagram showing the functional configuration of the image processing device 1 according to the first embodiment.
The communication unit 110 is configured using the communication module of FIG. 2 and communicates with other devices. In particular, the communication unit 110 communicates with the image reading device 2, the recording device 3, and the database 4.
The display unit 120 is configured using the display device 17 of FIG. 2 and displays various images. In particular, the display unit 120 is an example of an output unit, and outputs correspondence information indicating the same specific item in a first document image and in a second document image that is displayed in association with the first document image.

For example, the display unit 120 displays the image of the document form (the raw image, not the OCR result) as the first document image, and displays, as the second document image, a GUI screen image that shows the recorded character strings produced by the image processing device 1 in a predetermined format. The display unit 120 then shows which character string in the first document image each character string in the second document image was read from, by drawing a line between the two character strings.
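As an illustration of the line-drawing display described above, the following is a minimal sketch, not the implementation of the embodiment. It assumes that each character string is represented by a bounding box `(x0, y0, x1, y1)`, that the form image and the GUI panel are drawn side by side, and that the GUI panel starts `gui_offset_x` pixels to the right; the function names and the box representation are hypothetical.

```python
def center(box):
    """Return the center point of an (x0, y0, x1, y1) bounding box."""
    x0, y0, x1, y1 = box
    return ((x0 + x1) / 2, (y0 + y1) / 2)


def connector_lines(form_boxes, gui_boxes, gui_offset_x):
    """Compute one line segment per item shared by both images.

    form_boxes: item name -> box in form-image coordinates.
    gui_boxes:  item name -> box in GUI-panel coordinates.
    gui_offset_x: assumed horizontal offset of the GUI panel on the screen.
    Returns item name -> (start_point, end_point) in screen coordinates.
    """
    lines = {}
    for item, form_box in form_boxes.items():
        if item not in gui_boxes:
            continue  # no counterpart on the GUI side, so no line is drawn
        gx, gy = center(gui_boxes[item])
        lines[item] = (center(form_box), (gx + gui_offset_x, gy))
    return lines


lines = connector_lines(
    {"date": (10, 10, 50, 20)},
    {"date": (0, 0, 100, 20)},
    gui_offset_x=600,
)
```

A renderer would then draw each returned segment over the combined screen so that the operator can follow a displayed value back to its position on the form.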

Alternatively, when the correct character strings for the recorded character strings are available, the display unit 120 may display the OCR result of the document form, laid out in the format of that document form, as the first document image, and may display, as the second document image, a GUI screen image that shows the correct character strings in a predetermined format. The display unit 120 may then show which character string in the first document image each character string in the second document image is the correct answer for, by drawing a line between the two character strings.
However, the way in which the output unit outputs the images and the correspondence information is not limited to displaying them. For example, the communication unit 110 may function as the output unit and transmit the images and the correspondence information to another device for display.

The operation input unit 130 is configured using the input device of FIG. 2 and receives user operations. In particular, the operation input unit 130 receives operations that correct the character strings displayed by the display unit 120.
The storage unit 180 is configured using the ROM 14, the RAM 15, and the HDD 16 of FIG. 2 and stores various data.
The control unit 190 is realized by the CPU 11 of FIG. 2 reading a program from the storage unit 180 (the ROM 14, the RAM 15, and the HDD 16 of FIG. 2) and executing it. The control unit 190 controls each unit of the image processing device 1 to execute various processes.
The acquisition unit 191 acquires the image data of a document form.

The feature extraction unit 192 extracts feature values of character strings from the result of recognition processing on the image data of a document form. For example, based on the recognition results for the image data of a plurality of document forms, the feature extraction unit 192 extracts, for each item of image data, a first feature value indicating the characteristics of the recorded character strings contained in that image data. Specifically, the feature extraction unit 192 identifies a recorded character string contained in a document image that has undergone character recognition, and extracts the first feature value of the identified recorded character string within the document image. Identifying a recorded character string here means deciding that one particular character string on the document form is a given recorded character string. Extracting a feature value is also referred to as generating a feature value.

The recording unit 193 uses the feature values of character strings in the image data of a new document form to extract the recorded character strings from the character string information read from that image data, and records them. In particular, the recording unit 193 is an example of a character string detection unit. Feature values (first feature values) that indicate, for each type of document image and for each specific item, the characteristics of the character string of that item are recorded in advance based on the result of learning using a plurality of document images. From among these, the recording unit 193 uses the feature values for the displayed first document image (the image of the document form) to detect the character string of the specific item in the first document image.

The recording unit 193 also detects, in the second document image, the character string of the same item as the specific item in the first document image. When the format of the second document image is known to the recording unit 193, it uses the format information to detect the recorded character string in the second document image. When the format of the second document image is not known to the recording unit 193, the feature extraction unit 192 may extract first feature values for the second document image as well, and the recording unit 193 may identify the recorded character string using the first feature values obtained.

The combination of the feature extraction unit 192 and the recording unit 193 is an example of a correspondence learning unit, which learns by machine learning the correspondence of the same specific item between the first document image and the second document image.
For example, when the format of the second document image is known to the recording unit 193, the feature extraction unit 192 extracts the feature values of character strings in the first document image (the image of the document form) by machine learning, and the recording unit 193 identifies the recorded character strings in the first document image based on the feature values obtained. As a result, the recording unit 193 knows the recorded character strings in both the first document image and the second document image, and has thereby acquired the correspondence of the recorded character strings between the two document images.

Alternatively, when the format of the second document image is not known to the recording unit 193, the feature extraction unit 192 may extract the feature values of character strings for the second document image in addition to the first document image. In this case, the recording unit 193 identifies the recorded character strings in each of the first and second document images based on the feature values obtained. As a result, the recording unit 193 knows the recorded character strings in both the first document image and the second document image, and has thereby acquired the correspondence of the recorded character strings between the two document images.

The correspondence learning unit (the combination of the feature extraction unit 192 and the recording unit 193) may perform machine learning of the correspondence of the same specific item between the first document image and the second document image using images from the point at which the character strings of the second document image have been finalized. Once the character strings of the second document image have been finalized, they can be considered accurate. By performing machine learning with these accurate character strings, the correspondence learning unit can be expected to learn the correspondence of the same specific item between the first and second document images with relatively high accuracy.

Through this processing, the image processing device 1 reduces the labor of recording the character string information that should be recorded from the image data of a new document form.

FIG. 4 is a diagram showing an example of a document form.
As this figure shows, the document form carries the mark of the company that created the document, the creation date, the person in charge of creating it, and the document content, laid out in a format specific to that document form. For example, if the document form is an order form, the document content shows one or more sets of information such as the name of each ordered product and the quantity ordered. Working from a given document form, the operator uses the recording device 3 to record in the database 4 the specific character strings to be recorded (recorded character strings) among the character strings written on that form. Specifically, the operator looks at the document form and enters the recorded character strings that the recording device 3 should record in the database 4. The operator also has the image reading device 2 read the image data of the document form. The image reading device 2 reads the document form in response to the operator's operation and outputs it to the image processing device 1. Then, based on the operator's operation and the control of the image processing device 1, the recording device 3 records in the database 4 the image data of the document form in association with the recorded character strings among the character strings written on that form. In the example of FIG. 4, the date 51, the supplier 52, the product name 53, the quantity 54, and the amount 55 are the recorded character strings. The document form 5 also carries other printed information, such as non-recorded character strings that the operator does not record. Examples are the name 501 of the orderer who issued the document form, the orderer's emblem image 502, the title 503 of the document form, and the greeting 504.

FIG. 5 is a diagram showing an outline of the record table stored in the database.
As shown in FIG. 5, the database 4 stores, in a record table, the image data of each document form in association with the recorded character strings among the character strings written on that form.
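One way to picture such a record table is sketched below. This is only an illustration under stated assumptions: the patent does not specify a schema, so the class name `RecordEntry`, the field names, the file name, and the sample values are all hypothetical. The item names loosely follow the recorded character strings of FIG. 4.

```python
from dataclasses import dataclass, field


@dataclass
class RecordEntry:
    """One row of a record table: a form image and its recorded character strings."""
    image_path: str  # hypothetical reference to the stored image data
    recorded_strings: dict = field(default_factory=dict)  # item name -> string


# An in-memory stand-in for the record table held by the database.
record_table = []


def register(image_path, **items):
    """Associate the image data of one document form with its recorded strings."""
    entry = RecordEntry(image_path, dict(items))
    record_table.append(entry)
    return entry


entry = register(
    "form_0001.png",       # hypothetical file name
    date="2020-10-22",     # sample values, not taken from the patent
    supplier="ABC Trading",
    product_name="Widget",
    quantity="10",
    amount="5000",
)
```

Each entry pairs exactly one image with the strings an operator entered for it, which is the pairing the later feature extraction steps read back.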

FIG. 6 is a first diagram showing the processing flow of the image processing device according to the first embodiment. FIG. 6 shows an example of the procedure by which the image processing device 1 extracts the first feature values.
The processing flow of the image processing device 1 is described next, step by step.
First, the database 4 holds combinations of the image data of a document form and the recorded character strings written on that form, for a plurality of document forms of the same format. For example, assume that recorded character string information (information indicating the recorded character strings) has been recorded for a plurality of document forms in the format of the document form 5 shown in FIG. 4.
As these combinations of image data and recorded character string information, the image data and recorded character string information of document forms handled in past business operations can be used, for example. If a sufficient amount of image data and recorded character string information can be secured from past operations, there is no need to prepare separate image data and recorded character string information just so that the image processing device can acquire the first feature values.
In this state, the operator starts the image processing device 1 and instructs it to begin processing.

The acquisition unit 191 of the image processing device 1 controls the communication unit 110 to read, from the database 4, the image data of a document form and the information on the recorded character strings corresponding to that image data (step S601). The acquisition unit 191 outputs the image data and the recorded character strings to the feature extraction unit 192. The feature extraction unit 192 performs OCR processing on the image data to detect all the character strings in the image data, together with the coordinates within the image data that indicate the range of each character string (step S602). A character string is a group of characters. The feature extraction unit 192 analyzes the range of each such group based on, for example, its spacing from other characters, extracts the one or more characters contained in that range as a character string, and detects the coordinates indicating the range of that character string within the image data. The characters contained in a character string may include symbols such as ideographic and phonetic characters, marks, icon images, and the like.
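The grouping in step S602 can be sketched as follows. This is a simplified illustration, not the method of the embodiment: it assumes that an OCR engine has already returned one box per character, that all characters passed in lie on a single text line, and that a horizontal gap larger than a fraction of the character height separates two character strings. The names `CharBox`, `StringBox`, `group_into_strings`, and the threshold `max_gap_ratio` are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class CharBox:
    char: str
    x: float  # left edge
    y: float  # top edge
    w: float  # width
    h: float  # height


@dataclass
class StringBox:
    text: str
    x0: float
    y0: float
    x1: float
    y1: float


def _to_string_box(chars):
    """Merge a run of characters into one string with its bounding range."""
    return StringBox(
        text="".join(c.char for c in chars),
        x0=min(c.x for c in chars),
        y0=min(c.y for c in chars),
        x1=max(c.x + c.w for c in chars),
        y1=max(c.y + c.h for c in chars),
    )


def group_into_strings(chars, max_gap_ratio=0.6):
    """Split characters on one text line into strings wherever the gap is wide."""
    strings, current = [], []
    for c in sorted(chars, key=lambda ch: ch.x):
        if current:
            prev = current[-1]
            gap = c.x - (prev.x + prev.w)
            if gap > max_gap_ratio * prev.h:  # wide gap: close the current string
                strings.append(_to_string_box(current))
                current = []
        current.append(c)
    if current:
        strings.append(_to_string_box(current))
    return strings


sample = [CharBox("A", 0, 0, 8, 10), CharBox("B", 10, 0, 8, 10),
          CharBox("1", 40, 0, 8, 10), CharBox("2", 50, 0, 8, 10)]
strings = group_into_strings(sample)
```

With the sample above, the narrow gaps keep "A" and "B" together and "1" and "2" together, while the wide gap between "B" and "1" starts a new string. A real form would first need its characters separated into text lines, which this sketch leaves out.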

The feature extraction unit 192 compares the character strings extracted from the image data by OCR processing with the recorded character strings read from the database 4 together with the image data. From among the character strings extracted by OCR processing, the feature extraction unit 192 identifies each character string in the image data whose character information matches that of a recorded character string, along with the attributes of the characters contained in that character string and the coordinates of its range (step S603).
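The comparison in step S603 can be sketched as below, under the simplifying assumption that "matching character information" means exact text equality. Each OCR string is represented here as a dictionary with `text` and `bbox` keys; that representation and the function name `find_matches` are hypothetical.

```python
def find_matches(ocr_strings, recorded):
    """For each recorded item, list the OCR strings whose text equals its value.

    ocr_strings: list of {"text": str, "bbox": (x0, y0, x1, y1)}.
    recorded:    item name -> recorded character string.
    Returns item name -> list of matching OCR strings (possibly empty or multiple).
    """
    matches = {}
    for item, value in recorded.items():
        matches[item] = [s for s in ocr_strings if s["text"] == value]
    return matches


ocr_strings = [
    {"text": "Widget", "bbox": (100, 300, 180, 320)},
    {"text": "10", "bbox": (400, 300, 420, 320)},
    {"text": "10", "bbox": (50, 40, 70, 60)},  # the same text appears elsewhere
]
matches = find_matches(ocr_strings, {"product_name": "Widget", "quantity": "10"})
```

Here "Widget" is found once, while "10" is found twice, so the position of the quantity cannot be decided from the text alone.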

The character attributes referred to here are information expressed by, for example, numerals, alphabetic characters, hiragana, kanji, the number of characters, the character height, and the font. The coordinates of the range of a character string are coordinates indicating the position of the character string in the document form. For example, they may be information indicating the coordinates of the first character and the coordinates of the last character of the character string. Hereinafter, the attributes of the characters included in a character string and the coordinates of the range of the character string are collectively referred to as the attributes of the character string, or character string attributes.

The character information here may be the character string alone, or may also include the character string attributes. That is, the feature amount extraction unit 192 may determine whether a recorded character string and a character string in the image data are identical as character strings. Alternatively, the feature amount extraction unit 192 may determine the identity of the character string attributes in addition to the identity of the characters.

If the feature amount extraction unit 192 cannot uniquely identify the character string whose character information matches a recorded character string, the image processing apparatus 1 may exclude that document image from the processing targets (the targets of first feature amount extraction). Alternatively, the image processing apparatus 1 may cause the display unit 120 to display an image in which the range of each recorded character string candidate is indicated by a frame, and identify the character string selected by the operator as the recorded character string. A recorded character string candidate here is a character string that, among the character strings whose character information matches that of a recorded character string, is associated with a recorded character string determined not to be uniquely identified. Identifying a recorded character string here means determining one of the character strings in the document form to be that recorded character string.
When the feature amount extraction unit 192 determines that the character information of each of a plurality of character strings in the document form matches the character information of one recorded character string, these character strings become the candidates for that recorded character string. The recorded character string is uniquely identified when the operator selects one of these character strings.
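The matching in step S603 and the handling of non-unique matches can be sketched as follows. This is only an illustration under the assumption that character information is compared by exact text equality; `match_recorded` and `classify_matches` are hypothetical helper names, and bounding boxes are assumed to be `(x0, y0, x1, y1)` tuples.

```python
def match_recorded(recorded, ocr_strings):
    """Map each recorded character string to the boxes of all OCR strings with equal text.

    recorded: list of recorded character strings (text only).
    ocr_strings: list of (text, bbox) tuples produced by OCR.
    """
    return {r: [bbox for text, bbox in ocr_strings if text == r] for r in recorded}


def classify_matches(matches):
    """Split matches into uniquely identified strings and ambiguous candidate sets."""
    unique = {r: boxes[0] for r, boxes in matches.items() if len(boxes) == 1}
    ambiguous = {r: boxes for r, boxes in matches.items() if len(boxes) > 1}
    return unique, ambiguous
```

Under this sketch, a document with a non-empty `ambiguous` result would either be excluded from first feature amount extraction or have its candidates shown to the operator for selection.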

Next, using the character string attributes extracted for each document form and each recorded character string, the feature amount extraction unit 192 extracts, for each recorded character string, a feature amount that is common to document forms of the same format (step S604).
Specifically, for each recorded character string, the feature amount extraction unit 192 analyzes the character string attributes of that recorded character string across a plurality of document forms and extracts one feature amount per recorded character string.
The method by which the feature amount extraction unit 192 extracts this feature amount is not limited to a specific method. For example, for the plurality of character string attributes obtained from a plurality of document forms, the feature amount extraction unit 192 may obtain the mode of each item, such as the coordinates of the first character, the coordinates of the last character, the character type, the character height, and the font type. For attributes expressed as numerical values, such as the coordinates of the first character, the coordinates of the last character, the character height, and the distance between characters, the feature amount extraction unit 192 may obtain the average or the median of each item. The feature amount extraction unit 192 may also use a feature amount that has a range or is expressed by a plurality of numerical values, for example by taking the maximum and minimum values of a numerical item as the feature amount. The feature amount extraction unit 192 may also obtain a feature amount by converting non-numerical attributes, such as the character type and the font type, into numerical values. The feature amount extraction unit 192 may also extract the feature amount using a known machine learning algorithm.
When the feature amount extraction unit 192 obtains a plurality of numerical values for one format of document form and one recorded character string, it may vectorize these values and extract them as a single vector feature amount.
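As an illustration of the statistical options listed above (mode, average, median, and minimum/maximum range), the following sketch aggregates the attributes of one recorded character string across several document forms of the same format. The attribute keys (`char_type`, `start_x`, `height`) and the function name `first_feature` are assumptions made for this example; the specification does not prescribe a data layout.

```python
from statistics import mean, median, mode


def first_feature(attribute_samples):
    """Aggregate per-document attributes of one recorded character string.

    attribute_samples: list of dicts, one per document form, with the
    hypothetical keys "char_type", "start_x", and "height".
    """
    xs = [s["start_x"] for s in attribute_samples]
    heights = [s["height"] for s in attribute_samples]
    return {
        # Mode for a non-numerical item
        "char_type": mode([s["char_type"] for s in attribute_samples]),
        # Average and median for numerical items
        "start_x_mean": mean(xs),
        "height_median": median(heights),
        # A feature amount with a range: (minimum, maximum)
        "start_x_range": (min(xs), max(xs)),
    }
```

The resulting dictionary could be flattened into a single vector if a vector-valued feature amount is preferred.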

The feature amount extracted by the feature amount extraction unit 192 for each recorded character string and common to document forms of the same format is referred to as the first feature amount. The feature amount extraction unit 192 uses a plurality of document forms of the same format to extract the first feature amount of each recorded character string in that format. The first feature amount is a feature amount for extracting a recorded character string. The first feature amount may include information indicating character attributes, coordinates indicating the range of the character string, or a combination of these.
The feature amount extraction unit 192 records the first feature amount obtained for each recorded character string in the database 4 in association with the identifier of the format of the document form (step S605).

For example, the feature amount extraction unit 192 records in the database 4, in association with the format identifier of the document form 5, the first feature amounts indicating the character attributes, the coordinates indicating the range of the character string, and the like for each of the date 51, the orderer 52, the product name 53, the quantity 54, and the amount 55, which are the recorded character strings included in the format of the document form 5 in FIG. 4.
After step S605, the image processing apparatus 1 ends the processing of FIG. 6.

Through the above processing, the image processing apparatus 1 can extract the information (the first feature amounts) used to reduce the operator's labor in recording the recorded character strings, and accumulate it in the database 4. The image processing apparatus 1 can then receive the image data of a new document form and automatically record the recorded character strings contained in that document form in the database 4. This processing is described with reference to FIG. 7.

FIG. 7 is a second diagram showing the processing flow of the image processing apparatus according to the first embodiment. FIG. 7 shows an example of the procedure by which the image processing apparatus 1 extracts recorded character strings from newly input image data.
The operator performs an operation to cause the image reading device 2 to read a new document form. The image reading device 2 thereby generates image data of the document form and outputs (transmits) it to the image processing apparatus 1. The acquisition unit 191 of the image processing apparatus 1 acquires the image data from the data received by the communication unit 110 (step S701). The acquisition unit 191 outputs the image data to the feature amount extraction unit 192. The feature amount extraction unit 192 performs OCR processing on the image data and detects, for each character string, the character string, the features of the characters included in it (the character attributes), and the coordinates of the range of the character string in the image data (step S702). The feature amount extraction unit 192 converts the detected information into feature amounts and thereby extracts a third feature amount for each character string in the image data (step S703). In other words, the third feature amount is information indicating the features of a character string contained in the document form of the newly read image data. The feature amount extraction unit 192 then reads the first feature amount of each recorded character string from the database 4 (step S704). The feature amount extraction unit 192 outputs the third feature amounts and the first feature amounts to the recording unit 193.

The recording unit 193 acquires the third feature amount of each character string in the image data and the first feature amount of each recorded character string, and associates the first feature amounts with the third feature amounts (step S705). Specifically, for each first feature amount, the recording unit 193 associates one third feature amount that matches it or, failing that, the one closest to it. Through this association, the recorded character strings are selected from among the character strings obtained by OCR processing of the image data of the document form.
Next, the feature amount extraction unit 192 controls the display unit 120 to display the recorded character strings, which are the processing result of the image processing apparatus 1, side by side with the image of the document form (step S706).
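The association in step S705 can be sketched as a nearest-neighbor lookup. The specification does not state how closeness between feature amounts is measured; the sketch below assumes numerical feature vectors of equal length and uses Euclidean distance as one possible choice. The function name `associate` and the field names in the usage example are hypothetical.

```python
import math


def associate(first_features, third_features):
    """For each first feature amount, pick the text of the closest third feature amount.

    first_features: {field_name: vector} read from the database.
    third_features: list of (text, vector) extracted from the new image.
    Euclidean distance is an assumption; an exact match has distance 0.
    """
    result = {}
    for name, f1 in first_features.items():
        text, _ = min(third_features, key=lambda item: math.dist(f1, item[1]))
        result[name] = text
    return result
```

For example, with first feature amounts `{"date": (100.0, 20.0, 12.0)}` and OCR candidates at `(102.0, 21.0, 12.0)` and `(100.0, 80.0, 14.0)`, the first candidate is selected as the date. A practical system might also apply a distance threshold so that a field is left empty when no candidate is close enough.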

FIG. 8 is a diagram showing an example of a display screen on which the display unit 120 displays the recorded character strings side by side with the image of the document form. In the example of FIG. 8, the display unit 120 displays a document form image W1 and a recorded character string display window W2. The OCR results for the character strings C1a and C1b in the document form image W1 are displayed in the recorded character string display window W2 as the recorded character strings C2a and C2b, respectively. The "OK" button icon B1 is a button icon that the operator presses to finalize the recorded character strings after checking and correcting the recorded character strings shown in the recorded character string display window W2. The button icon is pressed by, for example, a touch operation on the button icon or a mouse click.

Next, the recording unit 193 determines whether the correspondence relationship between the display of the recorded character strings and the image of the document form has been acquired (step S707). For example, the recording unit 193 determines that the correspondence relationship has been acquired when the number of times learning has been performed to grasp this correspondence relationship is equal to or greater than a predetermined number. As described above, the learning for grasping the correspondence relationship between the display of the recorded character strings and the image of the document form may be learning for identifying the recorded character strings in the display of the recorded character strings, learning for identifying the recorded character strings in the image of the document form, or a combination of the two.

When it determines that the correspondence relationship between the display of the recorded character strings and the image of the document form has been acquired (step S707: YES), the recording unit 193 controls the display unit 120 to display that correspondence relationship (step S708).
FIG. 9 is a diagram showing an example of a display screen on which the display unit 120 displays the correspondence relationship between the display of the recorded character strings and the image of the document form.
FIG. 9 shows an example in which this correspondence relationship is displayed on the display screen of FIG. 8. The character strings C1a, C1b, C2a, and C2b are shown enclosed by the rectangles F1a, F1b, F2a, and F2b, respectively. The line La indicates that the character strings C1a and C2a correspond to each other, and the line Lb indicates that the character strings C1b and C2b correspond to each other.
Because the display unit 120 displays the correspondence relationship between the display of the recorded character strings and the image of the document form in this way, the operator can grasp the correspondence between character strings easily and reliably, even when unfamiliar with the format of the document form, the format of the recorded character string display window, or both.

When the operator checks and corrects the recorded character strings and then performs a confirmation operation, the image processing apparatus 1 acquires the corrected recorded character strings (step S709). The processing also proceeds to step S709 when the recording unit 193 determines in step S707 that the correspondence relationship between the display of the recorded character strings and the image of the document form has not been acquired (step S707: NO).
The feature amount extraction unit 192 and the recording unit 193, acting as a correspondence relationship learning unit, then learn the coordinates of the recorded character strings in the display of the processing result (the recorded character string display window W2 in the examples of FIGS. 8 and 9) (step S710). For example, the feature amount extraction unit 192 extracts first feature amounts for the display of the processing result in the same way as for a document form, and the recording unit 193 uses these first feature amounts to identify and store the positions of the recorded character strings.
When the recording unit 193 already knows the format of the display of the processing result, the processing of step S710 is unnecessary.

The recording unit 193 records the recorded character strings in a record table in association with the identification information of the document form (step S711).
For example, suppose that a third feature amount a3, a third feature amount b3, a third feature amount c3, and a third feature amount d3 have been obtained from the image data of the document form. Suppose further that the third feature amount a3 matches the first feature amount a1 recorded in advance in the database, the third feature amount b3 matches the first feature amount b1, the third feature amount c3 matches the first feature amount c1, and the third feature amount d3 matches the first feature amount d1. In this case, the recording unit 193 records the character strings corresponding to the third feature amounts a3, b3, c3, and d3 in the record table of the document form as recorded character strings. The character string corresponding to a third feature amount is the character string from which that third feature amount was extracted. If the operator has corrected a recorded character string, the recording unit 193 records the corrected recorded character string in the record table of the document form.
After step S711, the image processing apparatus 1 ends the processing of FIG. 7.

The image processing apparatus 1 may update the first feature amounts for the document form in the processing of FIG. 7. For example, in step S710, the image processing apparatus 1 may learn the first feature amounts for the document form in addition to, or instead of, the machine learning of the first feature amounts for the display of the processing result. In that case, the image processing apparatus 1 may execute the processing flow of FIG. 6 again. Alternatively, the image processing apparatus 1 may perform additional learning in step S710 and thereby update the first feature amounts without reprocessing the data already processed in FIG. 6.
When the image processing apparatus 1 updates the first feature amounts in the processing of FIG. 7, the number of sample data increases, so the accuracy of the first feature amounts, and in turn the accuracy with which the image processing apparatus 1 extracts recorded character strings, can be expected to improve. In addition, when a recorded character string is added in the processing of FIG. 7, the image processing apparatus 1 becomes able to extract the newly added recorded character string from image data as well, which can be expected to save the operator the effort of entering that character string.
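One way to update a first feature amount without reprocessing earlier data is an incremental mean, sketched below. This is an illustration only: it assumes the first feature amount is a simple average of a numerical attribute and that the sample count is stored alongside it, neither of which the specification requires. The function name `update_running_mean` is hypothetical.

```python
def update_running_mean(old_mean, count, new_value):
    """Fold one new sample into a stored average.

    old_mean: average over the `count` samples seen so far.
    Returns (new_mean, new_count) without revisiting the earlier samples.
    """
    new_count = count + 1
    new_mean = old_mean + (new_value - old_mean) / new_count
    return new_mean, new_count
```

For instance, an average start coordinate of 102.0 over three forms becomes 104.0 after a fourth form with a start coordinate of 110.0.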

According to the processing shown in FIG. 7, the image processing apparatus 1 can automatically record the recorded character strings in the image data of a newly input document form, using the image data of document forms and the recorded character strings that the operator recorded in advance. The image processing apparatus 1 can therefore reduce the operator's labor in recording the recorded character strings of document forms. The operator can also refer to the display on the display unit 120 to check and correct the recorded character strings that are the processing result of the image processing apparatus 1. Because the display unit 120 then shows the correspondence between the character strings in the display of the processing result and the character strings in the image of the document form, the operator can grasp the correspondence easily.

<Second embodiment>
The second embodiment describes a case in which the image processing apparatus 1 handles a plurality of formats of document forms.
FIG. 10 is a schematic block diagram showing the functional configuration of the image processing apparatus according to the second embodiment.
As shown in FIG. 10, the image processing apparatus 1 according to the second embodiment has the functions of a group classification unit 194 and a group identification unit 195 in addition to the functional units shown in FIG. 3.

FIG. 11 is a first diagram showing the processing flow of the image processing apparatus according to the second embodiment.
The processing flow of the image processing apparatus 1 according to the second embodiment is described next, step by step.
The database 4 stores, for each document form, a large number of combinations of image data of document forms with different formats and the recorded character strings written on each document form. In this state, the operator starts the image processing apparatus 1 and instructs it to begin processing.

The acquisition unit 191 of the image processing apparatus 1 determines whether it has read from the database 4 all of the image data of the document forms and the information of the recorded character strings corresponding to that image data (step S901). If NO, the acquisition unit 191 reads from the database 4 the image data of a document form and the information of the recorded character strings corresponding to that image data (step S902). The acquisition unit 191 outputs the image data and the recorded character strings to the feature amount extraction unit 192. The feature amount extraction unit 192 performs OCR processing on the image data to detect all character strings in the image data and their coordinates in the image data (step S903). Here, a character string is a group of characters made up of a plurality of characters. The feature amount extraction unit 192 analyzes the range of each group based on, for example, the spacing from other characters, extracts the one or more characters included in that range as a character string, and detects the coordinates that indicate the range of that character string in the image data. The characters in a character string may include ideograms, phonetic characters and other symbols, marks, icon images, and the like.

The feature amount extraction unit 192 compares the character strings extracted from the image data by OCR processing with the recorded character strings read from the database 4 together with the image data. Among the character strings extracted from the image data by OCR processing, the feature amount extraction unit 192 identifies each character string in the image data that matches the character information of a recorded character string, together with the attributes of the characters included in that character string and the coordinates of its range (step S904).

As described in the first embodiment, the character attributes are information expressed by, for example, numerals, alphabetic characters, hiragana, kanji, the number of characters, the character height, and the font. The coordinates of the range of a character string are coordinates indicating the position of the character string in the document form. For example, they may be information indicating the coordinates of the first character and the coordinates of the last character of the character string. The attributes of the characters included in a character string and the coordinates of the range of the character string are collectively referred to as the attributes of the character string, or character string attributes.

As in the first embodiment, the character information here may be the character string alone, or may also include the character string attributes. That is, the feature amount extraction unit 192 may determine whether a recorded character string and a character string in the image data are identical as character strings. Alternatively, the feature amount extraction unit 192 may determine the identity of the character string attributes in addition to the identity of the characters.

If the feature amount extraction unit 192 cannot uniquely identify the character string whose character information matches a recorded character string, the image processing apparatus 1 may exclude that document image from the processing targets (the targets of first feature amount extraction). Alternatively, the image processing apparatus 1 may cause the display unit 120 to display an image in which the range of each recorded character string candidate is indicated by a frame, and identify the character string selected by the operator as the recorded character string. As described in the first embodiment, a recorded character string candidate here is a character string that, among the character strings whose character information matches that of a recorded character string, is associated with a recorded character string determined not to be uniquely identified. Identifying a recorded character string here means determining one of the character strings in the document form to be that recorded character string.
When the feature amount extraction unit 192 determines that the character information of each of a plurality of character strings in the document form matches the character information of one recorded character string, these character strings become the candidates for that recorded character string. The recorded character string is uniquely identified when the operator selects one of these character strings.

Next, using the character string attributes extracted for each document form and each recorded character string, the feature amount extraction unit 192 extracts a feature amount for each document form and each recorded character string (step S905). Specifically, the feature amount extraction unit 192 converts into a feature amount the character string attributes of the character string associated with the recorded character string in step S904. Because the second embodiment handles a plurality of formats, at the time of step S905, when the document forms have not yet been grouped by format, the first feature amount cannot be extracted directly, unlike in step S604 of FIG. 6. The feature amount extraction unit 192 therefore extracts a feature amount for each document form and each recorded character string in preparation for extracting the first feature amount of each group. This feature amount for each document form and each recorded character string is referred to as the individual first feature amount.
The feature amount extraction unit 192 records each individual first feature amount obtained in the database 4 in association with the identifier of the document form and the identifier of the recorded character string (step S906). As the identifier of a recorded character string, for example, coordinate values indicating the position of that recorded character string can be used.

For example, the feature amount extraction unit 192 records, in the database 4, individual first feature amounts indicating the character attributes, the coordinates of the character string range, and the like for each of the date 51, the supplier 52, the product name 53, the quantity 54, and the amount 55, which are the recorded character strings included in the format of the document form 5 in FIG. 4. These are recorded for each document form and each recorded character string, in association with the identifier of the document form 5 and the identifier of the recorded character string.

The feature amount extraction unit 192 also extracts, from the image data, the non-recorded character strings that do not match the character information included in the recorded character strings, together with the character string attributes of those non-recorded character strings (step S907). As described above, a non-recorded character string is a character string that is not recorded by the operator, that is, a character string other than a recorded character string. The character string attributes may include either or both of information indicating the attributes of the characters included in the character string and information indicating the coordinates of the range of the character string.

The feature amount extraction unit 192 extracts a feature amount for each document form and each non-recorded character string, using the character string attributes extracted for each document form and each non-recorded character string (step S908).
Specifically, for each character string that was not associated with any recorded character string in step S904, the feature amount extraction unit 192 converts the attributes of that character string (the character string attributes) into a feature amount. As with the first feature amount, the document forms have not yet been grouped by format at step S908, so a feature amount common to document forms of the same format cannot be generated. The feature amount extraction unit 192 therefore extracts a feature amount for each document form and each non-recorded character string in preparation for extracting the second feature amount of each group. This feature amount for each document form and each non-recorded character string is referred to as an individual second feature amount.
The feature amount extraction unit 192 may instead generate, for each document form, an individual second feature amount that combines a plurality of non-recorded character strings. For example, the feature amount extraction unit 192 may generate one individual second feature amount per document form.

The feature amount extraction unit 192 records the obtained individual second feature amount in the database 4 in association with the identifier of the document form and the identifier of the non-recorded character string (step S909). For example, a coordinate value indicating the position of the non-recorded character string can be used as the identifier of the non-recorded character string.
For example, the feature amount extraction unit 192 records, in the database 4, individual second feature amounts indicating the orderer's name 501, the orderer's emblem image, the title 503 of the document form, the greeting 504, and the like, which are non-recorded character strings included in the format of the document form 5 in FIG. 4, in association with the identifier of the document form 5 and the identifier of the non-recorded character string.

For each of a plurality of formats of document forms, the database 4 holds image data of document forms in that format and information on the recorded character strings corresponding to that image data. The acquisition unit 191 of the image processing apparatus 1 repeats the processing of steps S901 to S909 until it has read the image data and the recorded character string information for all document forms.

When the acquisition unit 191 determines in step S901 that it has read all the image data of the document forms and the corresponding recorded character string information from the database 4 (step S901: YES), the grouping unit 194 groups the document forms (step S921). The grouping unit 194 groups the document forms based on the individual second feature amounts included in the image data of the document forms. For example, the grouping unit 194 groups the document forms based on the degree of matching of the non-recorded character strings indicated by the individual second feature amounts, the degree of matching of the emblem images, the degree of matching of the coordinate ranges of the non-recorded character strings, and the like. In this grouping process, the grouping unit 194 determines a group identifier for each document form. The grouping unit 194 then determines whether grouping has been completed for all document forms (step S922).

If grouping has not been completed for all document forms, the grouping unit 194 repeats the processing of step S921. Specifically, when the grouping unit 194 determines in step S922 that there is a document form that has not yet been grouped (step S922: NO), the processing returns to step S921.
When grouping of all document forms is complete (step S922: YES), the grouping unit 194 records the identifier of each document form and the group identifier assigned to that document form, in association with each other, in the group table (record table) of the database 4 (step S923).
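The grouping of steps S921 to S923 can be sketched as follows. The specification only says that grouping is based on degrees of matching; the Jaccard similarity over sets of non-recorded character strings, the greedy assignment to the best-matching existing group, and the threshold of 0.5 are assumptions chosen here for illustration, and all names are hypothetical.

```python
def jaccard(a, b):
    """Degree of matching between two sets of non-recorded strings (0.0 to 1.0)."""
    a, b = set(a), set(b)
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)


def group_documents(non_recorded, threshold=0.5):
    """Assign a group identifier to every document form.

    non_recorded maps a document identifier to its non-recorded strings.
    Returns the group table: document identifier -> group identifier.
    """
    representatives = []  # (group identifier, strings of the first form in the group)
    group_table = {}
    for doc_id, strings in non_recorded.items():
        strings = set(strings)
        best_id, best_score = None, threshold
        for gid, rep in representatives:
            score = jaccard(strings, rep)
            if score >= best_score:
                best_id, best_score = gid, score
        if best_id is None:
            # No existing group is similar enough, so open a new group.
            best_id = len(representatives)
            representatives.append((best_id, strings))
        group_table[doc_id] = best_id
    return group_table


# Assumed sample data: two forms from the same orderer and one from another.
group_table = group_documents({
    "d1": ["ACME Corp", "Purchase Order", "Dear Sir"],
    "d2": ["ACME Corp", "Purchase Order", "Thank you"],
    "d3": ["Globex", "Invoice"],
})
```

A greedy pass like this depends on the order in which forms are read; a clustering method that considers all forms at once would be an equally valid reading of the text.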

Then, the feature amount extraction unit 192 reads from the database 4 the individual first feature amounts and the individual second feature amounts of the one or more document forms belonging to a given group, and extracts the group first feature amounts and the group second feature amounts corresponding to them (step S924). Each group first feature amount may be a value such as the average of the individual first feature amounts of the document forms belonging to the group. Similarly, each group second feature amount may be a value such as the average of the individual second feature amounts of the document forms belonging to the group. The group first feature amounts and the group second feature amounts need not be averages of the individual first and second feature amounts: any technique, such as predetermined statistical processing or machine learning, may be used to extract them, provided that the resulting feature amounts are calculated so that the recorded character strings and non-recorded character strings of the one or more document forms belonging to the group can be identified.
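The averaging option mentioned for step S924 can be sketched as below. This is only one of the techniques the text allows. The sketch assumes that each individual first feature amount is a numeric vector and that recorded character strings are keyed by a field name such as "date" (a hypothetical simplification, since the text suggests coordinates as identifiers).

```python
from collections import defaultdict
from statistics import fmean


def group_first_features(individual, group_table):
    """Average individual first feature vectors per (group, field).

    individual:  {(doc_id, field): [float, ...]}
    group_table: {doc_id: group_id}
    Returns      {(group_id, field): [float, ...]}
    """
    buckets = defaultdict(list)
    for (doc_id, field), vec in individual.items():
        buckets[(group_table[doc_id], field)].append(vec)
    # zip(*vecs) walks the vectors column by column, so each component
    # of the group vector is the mean of that component over the group.
    return {key: [fmean(col) for col in zip(*vecs)] for key, vecs in buckets.items()}


# Assumed sample data: (x, y) position of the date field on three forms.
individual = {
    ("d1", "date"): [400.0, 40.0],
    ("d2", "date"): [410.0, 44.0],
    ("d3", "date"): [100.0, 300.0],
}
group_features = group_first_features(individual, {"d1": 0, "d2": 0, "d3": 1})
```

The same function could be applied to individual second feature amounts to obtain group second feature amounts.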

For example, in step S924, the feature amount extraction unit 192 may extract the character string attributes for each recorded character string from the plurality of document forms belonging to the same group and generate the group first feature amount directly. In this case, the feature amount extraction unit 192 skips the extraction and recording of the individual first feature amounts in steps S905 and S906 (it performs no particular processing).
The individual second feature amounts, on the other hand, are extracted by the feature amount extraction unit 192 in step S908 so that they can be used for the grouping in step S921. However, in step S921, the grouping unit 194 may group the document forms using the non-recorded character strings rather than the individual second feature amounts. In that case, the feature amount extraction unit 192 may, in step S924, extract the group second feature amount directly from the character string attributes of the non-recorded character strings of the plurality of document forms belonging to the same group. The feature amount extraction unit 192 then performs no particular processing in steps S907 to S909.
The feature amount extraction unit 192 calculates the group first feature amounts and the group second feature amounts for each group and records them in the database 4 in association with the group identifier (step S925).
After step S925, the image processing apparatus 1 ends the processing of FIG. 11.

Through the above processing, the image processing apparatus 1 can extract, for each group of document forms, the information needed to reduce the operator's labor in recording the recorded character strings (the group first feature amounts and the group second feature amounts) and accumulate it in the database 4. The image processing apparatus 1 can then receive the image data of a new document form and automatically record the recorded character strings included in that document form in the database 4. This processing is described with reference to FIG. 12.

FIG. 12 is a second diagram showing the processing flow of the image processing apparatus according to the second embodiment. FIG. 12 shows an example of the procedure by which the image processing apparatus 1 extracts recorded character strings from newly input image data.
The operator performs an operation that causes the image reading device 2 to read a new document form. The image reading device 2 then generates image data of the document form and outputs (transmits) it to the image processing apparatus 1. The acquisition unit 191 of the image processing apparatus 1 acquires the image data from the data received by the communication unit 110 (step S1001) and outputs it to the feature amount extraction unit 192. The feature amount extraction unit 192 performs OCR processing on the image data and detects, for each character string, the character string itself, the features of the characters included in it (character attributes), and the coordinates of its range in the image data (step S1002). The feature amount extraction unit 192 then extracts, for each character string in the image data, a third feature amount obtained by converting the detected information into a feature amount (step S1003). The third feature amount is information indicating the features of a character string included in the document form of the newly read image data.

Next, the group identification unit 195 reads, from among the group second feature amounts stored in the database 4, the group second feature amount used to identify the group of the new document form. This group second feature amount may be, for example, a feature amount corresponding to the orderer's emblem image 502 displayed in the image data of the document form. The group identification unit 195 determines whether the information indicated by a given group second feature amount can be identified in the image data of the document form acquired in step S1001, and performs the same processing using the group second feature amounts of all groups. When information matching a group second feature amount read from the database 4 can be identified in the image data of the newly read document form, the group identification unit 195 identifies the group having that group second feature amount as the group of the image data of the newly read document form (step S1004). The group identification unit 195 then reads one or more group first feature amounts for that group from the database 4 (step S1005) and outputs the third feature amounts and the one or more group first feature amounts to the recording unit 193. A group first feature amount is a feature amount for identifying one or more recorded character strings in a document form belonging to the group.
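The group identification of step S1004 can be sketched as follows. The sketch assumes, purely for illustration, that each group second feature amount is reduced to a set of characteristic non-recorded strings and that a group is accepted when at least a given fraction of them (`min_ratio`, a hypothetical parameter) is found in the new form. Matching an emblem image, which the text gives as an example, would need image features and is not shown.

```python
def identify_group(ocr_strings, group_second_features, min_ratio=0.8):
    """Return the identifier of the group whose second feature best matches.

    ocr_strings:           strings detected in the new document form
    group_second_features: {group_id: characteristic non-recorded strings}
    Returns None when no group reaches min_ratio.
    """
    found = set(ocr_strings)
    best_gid, best_ratio = None, 0.0
    for gid, expected in group_second_features.items():
        expected = set(expected)
        if not expected:
            continue  # a group without characteristic strings cannot be matched
        ratio = len(expected & found) / len(expected)
        if ratio > best_ratio:
            best_gid, best_ratio = gid, ratio
    return best_gid if best_ratio >= min_ratio else None


# Assumed sample data.
groups = {0: ["ACME Corp", "Purchase Order"], 1: ["Globex", "Invoice"]}
matched = identify_group(["ACME Corp", "Purchase Order", "2020-10-22", "12"], groups)
unmatched = identify_group(["Unknown Ltd"], groups)
```

Returning `None` for an unmatched form leaves room for a fallback such as asking the operator, which the text does not specify.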

The recording unit 193 acquires the third feature amount of each character string in the image data and the group first feature amount of each recorded character string, and associates the group first feature amounts with the third feature amounts (step S1006). Specifically, for each group first feature amount, the recording unit 193 associates the third feature amount that matches it, or the one third feature amount closest to it. Through this association, the recorded character strings are selected from among the character strings obtained by OCR processing of the image data of the document form.
Next, the feature amount extraction unit 192 controls the display unit 120 to display the recorded character strings, which are the processing result of the image processing apparatus 1, side by side with the image of the document form (step S1007). Step S1007 is the same as step S706 in FIG. 7.
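The association of step S1006 (each group first feature amount is paired with the matching or closest third feature amount) can be sketched as a nearest-neighbour lookup. Euclidean distance over numeric vectors is an assumption; the specification does not name a distance measure. Field names and sample vectors are hypothetical.

```python
import math


def match_fields(group_first, third):
    """Pick, for each recorded-string field, the OCR string with the closest feature.

    group_first: {field: [float, ...]}          group first feature amounts
    third:       [(text, [float, ...]), ...]    third feature amounts with source text
    Returns      {field: text}
    Assumes third is non-empty.
    """
    result = {}
    for field, ref in group_first.items():
        # An exact match has distance 0 and therefore always wins.
        text, _ = min(third, key=lambda item: math.dist(ref, item[1]))
        result[field] = text
    return result


# Assumed sample data: features are (x, y) positions only, for brevity.
selected = match_fields(
    {"date": [400.0, 40.0], "quantity": [300.0, 200.0]},
    [("2020-10-22", [402.0, 41.0]), ("12", [298.0, 203.0]), ("ACME Corp", [50.0, 20.0])],
)
```

Because every field always receives its nearest candidate, a practical variant would probably also apply a maximum distance so that a missing field is not filled with an unrelated string.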

Next, the recording unit 193 determines whether the correspondence between the display of the recorded character strings and the image of the document form has been acquired (step S1008). For example, the recording unit 193 determines that the correspondence has been acquired when the number of learning iterations for grasping that correspondence is equal to or greater than a predetermined number. As described above, the learning for grasping the correspondence between the display of the recorded character strings and the image of the document form may be learning for identifying the recorded character strings in the display of the recorded character strings, learning for identifying the recorded character strings in the image of the document form, or a combination of the two.

When it determines that the correspondence between the display of the recorded character strings and the image of the document form has been acquired (step S1008: YES), the recording unit 193 controls the display unit 120 to display that correspondence (step S1009). Step S1009 is the same as step S708 in FIG. 7.
When the operator checks and corrects the recorded character strings and then performs a confirmation operation, the image processing apparatus 1 acquires the corrected recorded character strings (step S1010). The processing also proceeds to step S1010 when the recording unit 193 determines in step S1008 that the correspondence between the display of the recorded character strings and the image of the document form has not been acquired (step S1008: NO).

Then, the feature amount extraction unit 192 and the recording unit 193, acting as a correspondence learning unit, learn the coordinates of the recorded character strings in the display of the processing result (in the examples of FIGS. 8 and 9, the recorded character string display window W2) (step S1011). For example, the feature amount extraction unit 192 extracts a first feature amount for the display of the processing result in the same way as for a document form, and the recording unit 193 uses the first feature amount to identify and store the positions of the recorded character strings.
If the display format of the processing result is already known to the recording unit 193, the processing of step S1011 is unnecessary.

The recording unit 193 records the recorded character strings in the record table in association with the identification information of the document form (step S1012).
For example, suppose that a third feature amount a3, a third feature amount b3, a third feature amount c3, and a third feature amount d3 have been acquired from the image data of the document form, and that a3 matches a first feature amount a1 recorded in advance in the database, b3 matches a first feature amount b1, c3 matches a first feature amount c1, and d3 matches a first feature amount d1. In this case, the recording unit 193 records the character strings corresponding to the third feature amounts a3, b3, c3, and d3 in the record table of the document form as the recorded character strings. Here, the character string corresponding to a third feature amount is the character string from which that third feature amount was extracted. If the operator has corrected a recorded character string, the recording unit 193 records the corrected recorded character string in the record table of the document form.
After step S1012, the image processing apparatus 1 ends the processing of FIG. 12.
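The recording of step S1012, including the case where the operator has corrected a string in step S1010, can be sketched as below. The dictionary-based record table and the field names are assumptions for illustration only.

```python
def record_strings(record_table, doc_id, extracted, corrections=None):
    """Record the final recorded strings for one document form.

    extracted:   {field: text} selected automatically from the OCR result
    corrections: {field: text} entered by the operator, if any
    Operator corrections take precedence over the extracted text.
    """
    final = dict(extracted)
    final.update(corrections or {})
    record_table[doc_id] = final
    return record_table


# Assumed sample data: OCR misread the quantity "12" as "1Z".
record_table = record_strings(
    {},
    "doc-9",
    {"date": "2020-10-22", "quantity": "1Z"},
    corrections={"quantity": "12"},
)
```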

The image processing apparatus 1 may update the first feature amounts for document forms in the processing of FIG. 12. For example, in step S1011, the image processing apparatus 1 may learn the first feature amounts for document forms in addition to, or instead of, machine learning of the first feature amounts for the display of the processing result. In that case, the image processing apparatus 1 may execute the processing flow of FIG. 6 again. Alternatively, the image processing apparatus 1 may perform additional learning in step S1011 and thereby update the first feature amounts without reprocessing the data already processed in FIG. 6.
By updating the first feature amounts in the processing of FIG. 12, the image processing apparatus 1 increases the number of samples, which is expected to improve the accuracy of the first feature amounts and hence the accuracy with which the image processing apparatus 1 extracts recorded character strings. In addition, when a recorded character string is added in the processing of FIG. 12, the image processing apparatus 1 becomes able to extract the newly added recorded character string from image data as well, which is expected to save the operator the effort of entering the character string.
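If the first feature amount is kept as an average, which is one of the options the text allows, the additional learning without reprocessing earlier data can be sketched as a running-mean update. The function name and the stored sample count are assumptions, not elements of the specification.

```python
def update_running_mean(mean, count, new_vec):
    """Fold one new sample into a stored mean vector without revisiting old samples.

    mean:    current mean vector
    count:   number of samples that produced it
    new_vec: feature vector from the newly confirmed document form
    Returns (updated mean, updated count).
    """
    count += 1
    updated = [m + (x - m) / count for m, x in zip(mean, new_vec)]
    return updated, count


# Assumed sample data: a mean over two forms, updated with a third form.
new_mean, new_count = update_running_mean([400.0, 40.0], 2, [406.0, 46.0])
```

Only the mean and the sample count need to be stored per feature amount, so the forms processed earlier do not have to be read again.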

According to the processing shown in FIG. 12, the image processing apparatus 1 can use the image data and recorded character strings of document forms in each of a plurality of formats, recorded in advance by the operator, to automatically record the recorded character strings in the image data of a newly input document form regardless of the type of that document form. The image processing apparatus 1 can therefore reduce the operator's labor in recording the recorded character strings of document forms. The operator can also refer to the display on the display unit 120 to check and correct the recorded character strings that are the processing result of the image processing apparatus 1. Because the display unit 120 shows the correspondence between the character strings in the display of the processing result and the character strings in the image of the document form, the operator can easily grasp that correspondence.

<Third embodiment>
As another example of the processing of the image processing apparatus 1, the operator may register the groups of document forms in the image processing apparatus 1 in advance. For example, when registering the image data of document forms in the past, the operator enters a group identifier according to the type of each document form and registers it in the database 4 in association with the image data of that document form. This prevents different kinds of forms from being mixed into the same group because of a processing error of the image processing apparatus 1 or the like, so that first feature amounts can be extracted with high accuracy. In this case, the operator enters the group of each document form at registration time, but the group of a new form is identified using the second feature amount, as in step S1004.

<Fourth embodiment>
As another example of the processing of the image processing apparatus 1, the image processing apparatus 1 may group document forms not only using the second feature amount but also using the first feature amount, either alone or together with the second feature amount. The first feature amount is a feature amount of a recorded character string, but document forms of the same type can be expected to have recorded character strings with the same coordinates and character attributes, so forms can be grouped using the first feature amount. If the operator performs the initial grouping as shown in the fourth embodiment and new document forms are grouped using the first feature amount in the processing of step S1004, recorded character strings can be read accurately in OCR processing.
In this case, the acquisition unit 191 acquires a plurality of pieces of form image data and the recorded character strings, that is, the character strings selected for recording from among the character strings included in that form image data. The grouping unit 194 then groups the form image data based on the first feature amount. The feature amount extraction unit 192 then extracts recorded character strings using the first feature amounts corresponding to the form image data included in the group.

<Fifth embodiment>
In the second embodiment, the group of a new form is identified in step S1004 based on the second feature amount. As another processing mode, however, the image processing apparatus 1 may skip the group identification process and instead, for all the groups set by the operator, select each group in turn, read its first feature amounts, and count how many of them match the third feature amounts. Because the first feature amounts and the third feature amounts should match most often for the correct group, the image processing apparatus 1 records the character strings included in the respective third feature amounts for the group that yields the largest number of matches. In this way, recorded character strings can be recorded without identifying the group.
In this case, the acquisition unit 191 acquires a plurality of pieces of form image data and the recorded character strings, that is, the character strings selected for recording from among the character strings included in that form image data. Then, based on the result of recognition processing of the form image data acquired by the acquisition unit 191, the feature amount extraction unit 192 extracts a first feature amount indicating the features of a recorded character string or a second feature amount indicating recognition information other than the recorded character strings. The feature amount extraction unit 192 extracts recorded character strings using the first feature amounts corresponding to the form image data included in a predetermined group set in advance.
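The match-counting approach of the fifth embodiment can be sketched as follows. Treating two feature amounts as matching when their Euclidean distance is within a tolerance `tol` is an assumption made for this sketch; the specification only speaks of counting matches. All names and sample values are hypothetical.

```python
import math


def count_matches(group_first, third_vectors, tol=10.0):
    """Count the first feature amounts of one group that match some third feature amount."""
    return sum(
        any(math.dist(ref, vec) <= tol for vec in third_vectors)
        for ref in group_first.values()
    )


def best_group_by_match_count(all_group_first, third_vectors, tol=10.0):
    """Try every group in turn and return the one with the most matches."""
    return max(
        all_group_first,
        key=lambda gid: count_matches(all_group_first[gid], third_vectors, tol),
    )


# Assumed sample data: two groups with different field positions.
all_group_first = {
    0: {"date": [400.0, 40.0], "quantity": [300.0, 200.0]},
    1: {"date": [100.0, 300.0], "quantity": [500.0, 500.0]},
}
third_vectors = [[402.0, 41.0], [298.0, 203.0], [50.0, 20.0]]
best = best_group_by_match_count(all_group_first, third_vectors)
```

When several groups tie, `max` returns the first one encountered; the text does not say how ties should be resolved.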

As described above, the recording unit 193 detects the character string of a specific item in a first document image based on the feature amount for the displayed first document image, from among feature amounts that are recorded in advance based on the results of learning using a plurality of document images and that indicate, for each type of document image and each specific item, the features of the character string of that item. The display unit 120 outputs correspondence information indicating the same specific item in the first document image and in a second document image displayed in correspondence with the first document image.
According to the image processing apparatus 1, the positions of character strings in a document image can be acquired by learning. Therefore, even when the format of the document subject to character recognition is not known in advance, the image processing apparatus 1 can assist the operator in grasping which item is shown where.

The recording unit 193 also detects, in the second document image, the character string of the same item as the specific item in the first document image.
This allows the image processing apparatus 1 to present to the user the correspondence between the character string of the specific item in the first document image and the character string of the specific item in the second document image. By referring to the presented correspondence, the user can check relatively easily whether the character string is correct.

The feature amount extraction unit 192 and the recording unit 193 also machine-learn the correspondence of the same specific item between the first document image and the second document image.
This allows the image processing apparatus 1 to learn that correspondence automatically, without requiring any special action by the user. The user only has to carry out the normal processing, so the user's burden does not increase.

In addition, the feature amount extraction unit 192 and the recording unit 193 machine-learn the correspondence of the same specific item between the first document image and the second document image using images from the time the character string of the second document image is confirmed onward.
When the character string of the second document image has been confirmed, the confirmed character string is considered to be accurate. By performing machine learning using this accurate character string, the correspondence learning unit is expected to learn the correspondence of the same specific item between the first document image and the second document image with relatively high accuracy.
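
One way to picture learning from confirmed strings is as a running average of feature vectors kept per document type and item. The following is a minimal sketch under stated assumptions: the feature vector (normalized position, height, character count, digit ratio), the `FeatureStore` class, and the `learn_confirmed` method are hypothetical names introduced here, and simple averaging stands in for whatever machine-learning method the apparatus actually uses.

```python
from typing import Dict, List, Tuple

Key = Tuple[str, str]  # (document type, specific item), both hypothetical labels


def extract_features(text: str, box: Tuple[int, int, int, int],
                     page_size: Tuple[int, int]) -> List[float]:
    """Assumed feature vector: normalized x, y, height, char count, digit ratio."""
    x, y, _w, h = box
    page_w, page_h = page_size
    n = len(text)
    digit_ratio = sum(c.isdigit() for c in text) / n if n else 0.0
    return [x / page_w, y / page_h, h / page_h, float(n), digit_ratio]


class FeatureStore:
    """Keeps the mean feature vector per (document type, item)."""

    def __init__(self) -> None:
        self._sums: Dict[Key, List[float]] = {}
        self._counts: Dict[Key, int] = {}

    def learn_confirmed(self, doc_type: str, item: str,
                        features: List[float]) -> None:
        # Call only after the operator has confirmed the string, so the
        # sample is assumed to be correct.
        key = (doc_type, item)
        sums = self._sums.setdefault(key, [0.0] * len(features))
        self._sums[key] = [a + b for a, b in zip(sums, features)]
        self._counts[key] = self._counts.get(key, 0) + 1

    def mean(self, doc_type: str, item: str) -> List[float]:
        key = (doc_type, item)
        count = self._counts[key]  # raises KeyError if nothing was learned
        return [s / count for s in self._sums[key]]


# Made-up confirmed samples for one document type and item.
store = FeatureStore()
page = (1000, 1000)
store.learn_confirmed("invoice", "total",
                      extract_features("12800", (800, 900, 60, 20), page))
store.learn_confirmed("invoice", "total",
                      extract_features("980", (820, 880, 40, 20), page))
print(store.mean("invoice", "total"))
```

Feeding the store only with confirmed strings mirrors the reasoning above: samples that the operator has already checked are less likely to pull the learned features toward a wrong position or attribute.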

Next, the configuration of an embodiment of the present invention will be described with reference to FIG. 12.
FIG. 12 is a diagram showing an example of the configuration of an image processing apparatus according to the embodiment. The image processing apparatus 600 shown in FIG. 12 includes a character string detection unit 601 and an output unit 602.
With this configuration, the character string detection unit 601 detects the character string of a specific item in the first document image based on the feature amount for the displayed first document image, among feature amounts that are recorded in advance based on the results of learning using a plurality of document images and that indicate, for each type of document image and for each specific item, the characteristics of the character string of that item.
The output unit 602 outputs correspondence information indicating the same specific item in the first document image and in the second document image displayed in correspondence with the first document image.
According to the image processing apparatus 600, the position of a character string in a document image can be acquired by learning. Therefore, even if the format of the document subject to character recognition is not known in advance, the image processing apparatus 600 can help the operator understand which item is shown where.
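
The detection performed by the character string detection unit 601 could be sketched as a nearest-match search against the recorded feature amounts. This is a hedged illustration, not the embodiment's method: the `LEARNED` table and its values, the three-element feature vector, the `rank_candidates` function, and the use of Euclidean distance are all assumptions made for the example.

```python
import math
from typing import Dict, List, Tuple

# Hypothetical learned feature amounts, keyed by (document type, specific item).
# Each vector is assumed to be [normalized x, normalized y, digit ratio].
LEARNED: Dict[Tuple[str, str], List[float]] = {
    ("invoice", "total"): [0.8, 0.9, 1.0],
    ("invoice", "date"): [0.1, 0.05, 0.8],
}


def rank_candidates(doc_type: str, item: str,
                    candidates: List[dict]) -> List[dict]:
    """Order candidate strings by closeness to the learned feature amount."""
    target = LEARNED[(doc_type, item)]
    return sorted(candidates,
                  key=lambda c: math.dist(c["features"], target))


# Candidate strings as an OCR step might supply them (values are made up).
candidates = [
    {"text": "2020-10-22", "box": (100, 50, 120, 18), "features": [0.1, 0.05, 0.8]},
    {"text": "12800", "box": (800, 900, 60, 20), "features": [0.8, 0.9, 1.0]},
    {"text": "NEC", "box": (50, 20, 40, 18), "features": [0.05, 0.02, 0.0]},
]

best = rank_candidates("invoice", "total", candidates)[0]
print(best["text"], best["box"])
```

Returning the full ranking rather than a single winner keeps the lower-ranked candidates and their boxes available, so an output stage could show the range of each candidate to the operator.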

Each of the devices described above has a computer system inside. A program for causing each device to perform the processes described above is stored in a computer-readable recording medium of that device, and the processes are performed when the computer of each device reads and executes the program. Here, the computer-readable recording medium refers to a magnetic disk, a magneto-optical disk, a CD-ROM, a DVD-ROM, a semiconductor memory, or the like. The computer program may also be distributed to a computer via a communication line, and the computer that receives the distribution may execute the program.

The program may implement only part of the functions of the processing units described above. It may also be a so-called difference file (difference program), which realizes the functions described above in combination with a program already recorded in the computer system.

Although an embodiment of the present invention has been described in detail above with reference to the drawings, the specific configuration is not limited to this embodiment, and designs and the like that do not depart from the gist of the present invention are also included.

1 Image processing apparatus
2 Image reading device
3 Recording device
4 Database
110 Communication unit
120 Display unit
130 Operation input unit
180 Storage unit
190 Control unit
191 Acquisition unit
192 Feature amount extraction unit
193 Recording unit

Claims

1. An image processing apparatus comprising:
a character string recognition unit that recognizes the character string of a specific item in a recognition target image based on the results of learning using a plurality of images containing character strings; and
an output unit that outputs the recognition target image and the character string recognition result by the character string recognition unit in a manner that allows a correspondence indicating the same specific item in the recognition target image and in the character string recognition result to be grasped,
wherein the character string recognition unit identifies candidates for the character string of the specific item in the recognition target image, and
the output unit outputs the range of the identified character string candidates in a manner that allows the range to be grasped.

2. The image processing apparatus according to claim 1, further comprising an operation reception unit that receives an operation to correct a character string included in the character string recognition result output by the output unit,
wherein the character string recognition unit records the character string corrected by the correction operation received by the operation reception unit as the recognized character string.

3. The image processing apparatus according to claim 1, wherein the character string recognition unit recognizes the character string of the specific item based on the feature amount of the character string in the recognition target image, among feature amounts that are recorded based on the results of the learning and that indicate the characteristics of the character string for each type of document shown by the plurality of images containing character strings and for each specific item.

4. The image processing apparatus according to claim 3, wherein the feature amount is generated based on information indicating attributes of the characters included in the character string and the position of the character string in the recognition target image.

5. The image processing device according to claim 4, wherein the character attribute indicates at least one of numbers, alphabets, hiragana, kanji, number of characters, character height, and font.

6. The image processing apparatus according to any one of claims 1 to 5, wherein the character string recognition unit detects a plurality of candidates for the character string of the specific item in the recognition target image, and
the output unit outputs the range of each of the detected character string candidates in a manner that allows the range to be grasped.

7. An image processing method comprising:
a step of recognizing the character string of a specific item in a recognition target image based on the results of learning using a plurality of images containing character strings; and
a step of outputting the recognition target image and the character string recognition result in a manner that allows a correspondence indicating the same specific item in the recognition target image and in the character string recognition result to be grasped,
wherein, in the step of recognizing the character string, candidates for the character string of the specific item in the recognition target image are identified, and
in the step of outputting, the range of the identified character string candidates is output in a manner that allows the range to be grasped.

8. A program for causing a computer to execute:
a step of recognizing the character string of a specific item in a recognition target image based on the results of learning using a plurality of images containing character strings; and
a step of outputting the recognition target image and the character string recognition result in a manner that allows a correspondence indicating the same specific item in the recognition target image and in the character string recognition result to be grasped,
wherein, in the step of recognizing the character string, candidates for the character string of the specific item in the recognition target image are identified, and
in the step of outputting, the range of the identified character string candidates is output in a manner that allows the range to be grasped.