JP2023031036A - Information processing apparatus, information processing system, control method of information processing apparatus, and program

Info

Publication number: JP2023031036A
Application number: JP2021136503A
Authority: JP
Inventors: 満夫木村; Mitsuo Kimura
Original assignee: Canon Inc
Current assignee: Canon Inc
Priority date: 2021-08-24
Filing date: 2021-08-24
Publication date: 2023-03-08

Abstract

To provide an information processing apparatus, an information processing system, a control method of the information processing apparatus, and a program that accurately extract information from an image including handwritten characters.

SOLUTION: In a software configuration of an image processing system, a key value extraction server 102 includes: a printed character OCR unit 313 which specifies regions of printed characters and handwritten characters from image data of a document including the printed characters and the handwritten characters and performs character recognition; a ruled line extraction unit 312 which specifies positions of ruled lines included in the image data; a handwritten character region normalization unit 315 which normalizes the handwritten character region on the basis of at least one of the printed character region and the positions of the ruled lines; and a key value extraction unit 316 which extracts a character string corresponding to a value of a predetermined item, using an extraction rule 330, on the basis of character strings recognized from handwritten characters corresponding to the printed character region and the normalized region. The extraction rule is trained with a result obtained when a user corrects a result extracted by the key value extraction unit.

SELECTED DRAWING: Figure 3

Description

The present invention relates to an information processing apparatus, an information processing system, a control method for an information processing apparatus, and a program.

There are cloud-based accounting services that handle corporate accounting operations. To manage and pay invoices through such a service, the necessary information must be extracted from paper invoices and entered into the service. Input support systems exist to assist with this work. Such a system reads an invoice with a scanner and performs character recognition on the scanned image. After the user checks and corrects the recognition result, the corrected data is registered in the accounting service.
Some input support systems use a learning model trained by machine learning to extract, from the recognition result, the information to be registered in the accounting service. By machine-learning the features of character recognition results together with the labels assigned to the character recognition results (values) that correspond to predetermined items (keys) required by the accounting service, information can be extracted without defining the invoice format in advance. The features used for this learning include rectangle information of the character regions.
There are also input support systems that recognize handwritten characters on invoices and support registration in an accounting service. Handwritten characters vary more in size and position than printed characters, so the rectangle information of the character regions, which is a feature of the character recognition result, also varies. A learning model trained on the features of handwritten character recognition results therefore has a problem in information extraction accuracy, caused by this variation in the rectangle information.
Patent Literature 1 discloses a technique for normalizing character images.

Patent Literature 1: JP-A-11-161740 (特開平11-161740号公報)

The character-image normalization technique of Patent Literature 1 can reduce the variation in the rectangle information of handwritten character regions. However, that technique normalizes relative to the word region identified as the recognition target, so the normalization result differs from invoice to invoice even when the invoices share the same format. Because of this invoice-to-invoice variation, a learning model trained on the features of handwritten character recognition results still has a problem in information extraction accuracy.
An object of the present invention is to provide an image processing system capable of accurately extracting information, even from an image including handwritten characters, without defining the document format in advance.

To achieve the above object, the information processing apparatus of the present invention comprises: recognition means for identifying character regions of printed characters and handwritten characters from image data of a document containing the printed characters and the handwritten characters, and performing character recognition; specifying means for specifying positions of ruled lines included in the image data; normalization means for normalizing the character region of the handwritten characters based on at least one of the region of the printed characters and the positions of the ruled lines; and extraction means for extracting, using a rule, a character string corresponding to a value of a predetermined item, based on the character string recognized from the handwritten characters corresponding to the region of the printed characters and the normalized region, wherein the rule is a rule learned using a correction result obtained by a user correcting an extraction result of the extraction means.

According to the information processing apparatus of the present invention, information can be extracted accurately, even from an image including handwritten characters, without defining the document format in advance.

FIG. 1 is a diagram showing the overall configuration of the image processing system.
FIG. 2 is a diagram showing an example of the hardware configuration of the image processing system.
FIG. 3 is a diagram showing an example of the software configuration of the image processing system.
FIG. 4 is an example of image data generated in a step of the processing control flow.
FIG. 5 is an example of ruled line information generated in a step of the processing control flow.
FIG. 6A is an example of a character recognition result generated in a step of the processing control flow.
FIG. 6B is an example of a character recognition result generated in a step of the processing control flow.
FIG. 7 is an example of a normalization result of handwritten character regions generated in a step of the processing control flow.
FIG. 8 is an example of image data generated in a step of the processing control flow.
FIG. 9A is an example of an information extraction result generated in a step of the processing control flow.
FIG. 9B is an example of an information extraction result generated in a step of the processing control flow.
FIG. 10 is a diagram showing a screen displayed in a step of the processing control flow.
FIG. 11 is a diagram showing the processing control flow.
FIG. 12 is a diagram showing the processing control flow.
FIG. 13 is a diagram explaining the transition of screens displayed in steps of the processing control flow.

Embodiments for carrying out the present invention are described below with reference to the drawings. The embodiments do not limit the present invention, and not all of the configurations described in the embodiments are essential to the means for solving the problem addressed by the present invention.

(Embodiment 1)
[System configuration]
An embodiment of the present invention is described below with reference to the drawings.
First, the information processing system on which the present invention is premised is described.
FIG. 1 is a diagram showing the overall configuration of an information processing system according to an embodiment of the present invention. In FIG. 1, an MFP (Multifunction Peripheral) 101 is an image input device that is connected to a network and has a function for scanning paper documents.
A key-value extraction server 102, which is an information processing apparatus, provides a service that extracts, from a scanned image of a paper document, the information (value) corresponding to a predetermined item (key) required by the accounting service. For example, the key-value extraction server 102 extracts from an invoice image the character string "2019/06/12" as the value corresponding to the key "date".
An accounting server 103, which is an information processing apparatus, provides a service that processes corporate accounting operations. The MFP 101, the key-value extraction server 102, and the accounting server 103 are connected to one another via the Internet 100.

[Hardware configuration]
FIG. 2(a) is a block diagram showing the basic hardware configuration of the key-value extraction server 102 and the accounting server 103, which are information processing apparatuses.
In FIG. 2(a), a CPU (Central Processing Unit) 201 is a unit that executes various programs and implements various functions. A RAM (Random Access Memory) 202 is a unit that stores various kinds of information. The RAM 202 is also used as a temporary working storage area for the CPU 201. A ROM (Read Only Memory) 203 is a unit that stores various programs and the like. For example, the CPU 201 loads a program stored in the ROM 203 into the RAM 202 and executes it.
In addition, the CPU 201 executes processing based on programs stored in an external storage device such as a flash memory, an HDD (Hard Disk Drive), or an SSD (Solid State Disk). This realizes the software configuration of the key-value extraction server 102 shown in FIG. 3, described later, and the processing of each step of the sequence described later.
All or part of the functions of the key-value extraction server 102 and of the processing in the sequence described later may be implemented with dedicated hardware such as an ASIC.
An input/output interface 204 is a unit that provides an interface to output devices such as a display and to input devices such as a keyboard and a mouse.
A NIC (Network Interface Card) 205 is a unit for connecting the key-value extraction server 102 and the accounting server 103 to a network (not shown).
The units described above are configured so that they can exchange data with one another via a bus 206.
The key-value extraction server 102 and the accounting server 103 may each be realized by a single computer or by a plurality of computers. For example, they may be realized using cloud computing technology. That is, the key-value extraction server 102 and the accounting server 103 may be implemented as cloud services.

FIG. 2(b) is a block diagram showing the basic hardware configuration of the MFP 101, which is an image input device. In FIG. 2(b), a CPU 211, a RAM 212, a ROM 213, and a NIC 214 are the same as the CPU 201, the RAM 202, the ROM 203, and the NIC 205 in FIG. 2(a), respectively.
A scanner 215 is an input unit that converts a paper document into image data. A printer engine 216 is an output unit that prints image data. An operation panel 217 is both an input unit that accepts touch operations from the user and an output unit that displays information about the MFP 101.
The units described above are configured so that they can exchange data via a bus 218.

[Software configuration]
Next, the software configurations of the MFP 101 and the key-value extraction server 102 are described. A description of the accounting server 103 is omitted.
FIG. 3(a) shows the software configuration of the MFP 101. Each of the following software units is realized by the CPU 211 loading a program stored in the ROM 213 into the RAM 212 and executing it.
A UI unit 301 accepts input from the user via the operation panel 217. A scan execution unit 302 converts data input from the scanner 215 into image data. An image data transmission unit 303 transmits the image data to the key-value extraction server 102 via the NIC 214. A key-value extraction result receiving unit 304 receives the key-value extraction result from the key-value extraction server 102 via the NIC 214. A correction result transmission unit 305 transmits the corrections made by the user through the UI unit 301 to the key-value extraction server 102 via the NIC 214.

FIG. 3(b) shows the software configuration of the key-value extraction server 102, which is an information processing apparatus. Each of the following software units is realized by the CPU 201 loading a program stored in the ROM 203 into the RAM 202 and executing it.
An image data receiving unit 311 receives image data from the MFP 101 via the NIC 205. A ruled line extraction unit 312 extracts ruled lines from the image data and identifies the regions enclosed by ruled lines (cell regions). A printed character OCR unit 313 identifies the character regions of printed characters in the image data and performs character recognition. A handwritten character OCR unit 314 identifies the character regions of handwritten characters in the image data and performs character recognition. A handwritten character region normalization unit 315 normalizes the handwritten character regions. A key-value extraction unit 316 extracts the values corresponding to predefined keys from the character recognition results and the normalized character regions. A key-value extraction result transmission unit 317 transmits the key-value extraction result to the MFP 101 via the NIC 205. A correction result receiving unit 318 acquires the corrections made by the user from the MFP 101 via the NIC 205. A learning unit 319 learns based on the received correction results. A key-value data transmission unit 320 transmits the key-value data to the accounting server 103 via the NIC 205. An extraction rule 330 is used to extract the values corresponding to predefined keys from the character recognition results and the normalized character regions. The extraction rule 330 is updated by learning the correction results.
The extraction rule 330 in this embodiment is a learning model (trained model) trained by machine-learning the features of character recognition results and the labels assigned to the character recognition results that correspond to keys. The features of a character recognition result include the following.
・Features obtained from the character string of the character recognition result
・Rectangle information of the character region
・Features obtained from the character strings of surrounding character recognition results
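The three kinds of features listed above can be pictured with a small sketch. This is only an illustration under stated assumptions: the `OcrResult` type, the `build_features` function, and the specific string features (digit ratio, a date-like pattern) are hypothetical and are not taken from the specification, which does not say how the features are computed.

```python
import re
from dataclasses import dataclass
from typing import Dict, List, Tuple


@dataclass
class OcrResult:
    """One character recognition result (hypothetical structure)."""
    text: str
    rect: Tuple[int, int, int, int]  # (x, y, width, height), origin at top-left


def _center(rect: Tuple[int, int, int, int]) -> Tuple[float, float]:
    x, y, w, h = rect
    return (x + w / 2, y + h / 2)


def build_features(target: OcrResult, others: List[OcrResult], k: int = 2) -> Dict[str, object]:
    """Collect the three feature groups for one recognition result."""
    cx, cy = _center(target.rect)

    def dist2(o: OcrResult) -> float:
        ox, oy = _center(o.rect)
        return (ox - cx) ** 2 + (oy - cy) ** 2

    # Assumption: "surrounding" results are the k nearest by center distance.
    neighbors = sorted(others, key=dist2)[:k]
    text = target.text
    x, y, w, h = target.rect
    return {
        # 1) features from the recognized string itself (illustrative choices)
        "digit_ratio": sum(c.isdigit() for c in text) / max(len(text), 1),
        "looks_like_date": bool(re.fullmatch(r"\d{4}/\d{2}/\d{2}", text)),
        # 2) rectangle information of the character region
        "x": x, "y": y, "width": w, "height": h,
        # 3) features from surrounding recognition results
        "neighbor_texts": [o.text for o in neighbors],
    }
```

Because the rectangle values enter the feature set directly, any jitter in handwritten character regions flows straight into the model input, which is the motivation for normalizing those regions before extraction.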

[Processing control flow]
The processing control flow of the information processing system of the present invention is described with reference to FIG. 11. FIGS. 4 to 9A and 9B (FIGS. 9A and 9B are hereinafter collectively referred to as "FIG. 9") show examples of the image data, ruled line information, character recognition results, and key-value extraction results generated in the steps of the flow of FIG. 11. FIG. 10 shows the screen, displayed on the operation panel 217 of the MFP 101 in a step of the flow of FIG. 11, for checking and correcting the key-value extraction result. FIG. 12 shows the detailed flow of the step in FIG. 11 that normalizes the character regions of handwritten characters. FIG. 13 shows the transition of the screens displayed on the operation panel 217 of the MFP 101 in the steps of the processing control flow of FIG. 11.

First, in S1101, this processing control flow starts when the user sets a document (paper document) on the scanner 215 of the MFP 101 and instructs the start of scanning.
In S1102, the scan execution unit 302 of the MFP 101 scans the paper document and generates image data. The image data transmission unit 303 transmits the image data to the key-value extraction server 102. FIG. 4 shows an example of the image data generated by the scan execution unit 302 of the MFP 101 and transmitted to the key-value extraction server 102.
In S1103, the image data receiving unit 311 of the key-value extraction server 102 receives the image data transmitted from the MFP 101, and the ruled line extraction unit 312 extracts the ruled lines and identifies the regions enclosed by ruled lines (cell regions). FIG. 5 shows, in table form, the result of extracting ruled lines from the image data of FIG. 4 and identifying the cell regions. Table 501 holds the (circumscribed) rectangle information of the extracted ruled lines; the ruled line corresponding to row 503 is 431 in FIG. 4. Specifically, the upper-left vertex of the rectangle is at x = 229, y = 428, and the rectangle has a width of 162 and a height of 2. The height is small because this is a ruled line, but if the ruled line is skewed when the document is read, the height of the circumscribed rectangle may become larger. Table 502 holds the rectangle information of the extracted cell regions; the region corresponding to row 504 is 441 in FIG. 4. The upper-left vertex of the rectangle is at x = 107, y = 488, and the rectangle has a width of 194 and a height of 20.
In S1104, the printed character OCR unit 313 performs character recognition of printed characters.
In S1105, the handwritten character OCR unit 314 performs character recognition of handwritten characters. FIGS. 6A and 6B (hereinafter collectively referred to as "FIG. 6") show, in table form, the results of printed character recognition and handwritten character recognition performed on the image of FIG. 4. Table 601 shows the character recognition results for printed characters. Table 611 shows the character recognition results for handwritten characters. Columns 602 and 612 hold the rectangle information of the character regions of the character recognition results. Columns 603 and 613 hold the character strings of the character recognition results. The dotted lines 401 to 408 in FIG. 4 indicate the printed character regions corresponding to rows 621 to 628, respectively. The dotted lines 411 to 420 in FIG. 4 indicate the handwritten character regions corresponding to rows 631 to 640, respectively.
In S1106, the handwritten character region normalization unit 315 normalizes the handwritten character regions. Table 701 in FIG. 7 shows, in table form, the result of normalizing the handwritten character regions of the handwritten character recognition results in table 611 of FIG. 6. Column 702 holds the rectangle information of the regions obtained by normalizing the character regions of the character recognition results. The dotted lines 801 to 810 in FIG. 8 indicate the normalized handwritten character regions corresponding to rows 711 to 720, respectively. The details of the processing that normalizes the character regions of handwritten characters are described later with reference to FIG. 12.
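The remark in S1103 that a skewed ruled line yields a taller circumscribed rectangle can be checked with simple geometry. The sketch below uses the standard bounding-box formula for a rotated rectangle; the function name `skewed_bounding_box` and the 1-degree skew are illustrative assumptions, and only the 162 × 2 ruled-line size comes from row 503 of table 501.

```python
import math


def skewed_bounding_box(width: float, height: float, skew_degrees: float):
    """Return (width, height) of the axis-aligned box enclosing a
    width x height rectangle rotated by skew_degrees."""
    t = math.radians(skew_degrees)
    c, s = abs(math.cos(t)), abs(math.sin(t))
    return width * c + height * s, width * s + height * c


# Ruled line from row 503 of table 501: width 162, height 2.
w0, h0 = skewed_bounding_box(162, 2, 0.0)  # no skew
w1, h1 = skewed_bounding_box(162, 2, 1.0)  # hypothetical 1-degree scan skew
print(round(h0, 2), round(h1, 2))
```

With no skew the height stays at 2, while a skew of only one degree already raises the circumscribed height to roughly 4.8, more than double, because the long side of the line contributes `width * sin(angle)` to the height.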

In S1107, the key-value extraction unit 316 uses the extraction rule 330 to extract, as values, the character recognition results corresponding to the predefined keys. The keys are the predetermined items required by the accounting server 103; in this embodiment there are five: title, telephone number, number, date, and amount. The key-value extraction result transmission unit 317 transmits the character recognition results and the key-value extraction result to the MFP 101. FIG. 9 shows, in table form, the result of key-value extraction (S1107) from the printed character regions and character recognition results in table 601 of FIG. 6 and from the normalized handwritten character regions and character recognition results in FIG. 7. Column 902 of table 901 shows the labels assigned to the character recognition results of printed characters. Column 912 of table 911 shows the labels assigned to the character recognition results of handwritten characters. Each character recognition result extracted as the value of a key is given that key as its label.
In S1108, the UI unit 301 of the MFP 101 displays the key-value extraction result on the operation panel 217. 1001 in FIG. 10 shows the screen displayed in S1108. 1002 to 1006 are value display fields that display the values corresponding to the keys. Each value display field shows the character recognition result that was given the corresponding label in FIG. 9. Touching a value display field puts it into edit mode so that its content can be edited. 1007 is a registration button.
In S1109, the user checks the extracted value for each key on the screen 1001 (1301 in FIG. 13) displayed on the operation panel 217 of the MFP 101, and corrects any errors (1306 in FIG. 13). When the UI unit 301 of the MFP 101 detects a correction, the rectangular region carrying the corresponding label may be wrong, so the UI unit 301 displays a preview screen of the image data generated in S1102 on the operation panel 217 so that the correct rectangular region can be set (1303 in FIG. 13). On the preview screen displayed on the operation panel 217 of the MFP 101, the user touches and selects the character region of the corrected character string (1307 in FIG. 13). On detecting the user's touch on the preview screen, the UI unit 301 of the MFP 101 displays the screen 1001 on the operation panel 217. When the series of correction operations is finished, the user presses the registration button 1007 (1308 in FIG. 13).
In S1110, the correction result transmission unit 305 updates the character string of the key-value extraction result with the corrected character string. If the character region was changed, the label is reassigned. In this way, the correction result transmission unit 305 corrects the extraction result. The correction result transmission unit 305 then transmits the extraction result corrected in this way (the correction result) to the key-value extraction server 102.
In S1111, the correction result receiving unit 318 of the key-value extraction server 102 receives and acquires the correction result transmitted from the MFP 101. The learning unit 319 then learns based on the (normalized) character recognition results created in S1106, the labels assigned in S1107, and the acquired correction result (that is, the normalized regions, the character recognition results (character strings), and the labels), thereby updating the extraction rule 330. The key-value extraction server 102 may instead request this learning from an external device (which may be a cloud service) and update the extraction rule 330 by receiving the learning result. That is, the learning unit 319 may be located outside the key-value extraction server 102.
In S1112, the key-value data transmission unit 320 transmits the data of the keys and values (the character strings corresponding to the keys) to the accounting server 103.
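The correction handling of S1110 (overwrite the string, and reassign the label when the user selects a different character region) can be sketched as follows. This is a minimal illustration, not the method of the specification: `LabeledResult`, `apply_correction`, and the use of a list index to represent the region the user touched are all hypothetical.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass
class LabeledResult:
    """A recognition result with its (normalized) region and optional key label."""
    text: str
    rect: Tuple[int, int, int, int]  # (x, y, width, height)
    label: Optional[str] = None


def apply_correction(results: List[LabeledResult], key: str, new_text: str,
                     selected_index: Optional[int] = None) -> List[LabeledResult]:
    """Apply one user correction for `key`.

    selected_index is the region the user touched on the preview screen;
    None means the user kept the region that already carries the label.
    """
    current = next((i for i, r in enumerate(results) if r.label == key), None)
    target = current if selected_index is None else selected_index
    if target is None:
        raise ValueError(f"no region is labeled or selected for key {key!r}")
    if current is not None and current != target:
        results[current].label = None   # region changed: remove the old label
    results[target].label = key         # (re)assign the label
    results[target].text = new_text     # overwrite with the corrected string
    return results
```

The corrected list (regions, strings, and labels) is then the kind of record that S1111 describes as the input to learning.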

A detailed control flow of the process of normalizing the character area of handwritten characters (S1106), executed by the handwritten character area normalization unit 315, will be described with reference to FIG. 12.
In S1201, the handwritten character area normalization unit 315 determines whether the character area of the handwritten character overlaps an area (cell area) surrounded by ruled lines. If it is determined that they overlap (Yes in S1201), the process proceeds to S1202.
In S1202, the handwritten character area normalization unit 315 determines whether there is a printed character area in the cell area that overlaps the character area of the handwritten character. If it is determined that there is no printed character area (No in S1202), the process proceeds to S1203.
In S1203, the handwritten character area normalization unit 315 normalizes the character area of the handwritten character using the cell area. Specifically, the cell area is set as the normalized character area of the handwritten character. Row 717 in FIG. 7 (417 in FIG. 4) is an example (807 in FIG. 8) normalized by the cell area (441 in FIG. 4) that overlaps the character area of the handwritten character. The cell area (row 504 in FIG. 5) overlapping the character area of row 717 is set as the normalized character area.
In S1202, if the handwritten character area normalization unit 315 determines that there is a printed character area (Yes in S1202), the process proceeds to S1204.
In S1204, the handwritten character area normalization unit 315 normalizes the size and position of the character area of the handwritten character based on a nearby printed character area, more specifically the printed character area at the shortest distance. Specifically, the height of the character area is set to the height of the nearby printed character area, and the width of the character area is set to height × number of characters. This number of characters is the number of characters recognized by the OCR processing in S1105. The position of the character area is set so that it touches the nearby printed character area. Specifically, if there is a printed character to the left of the handwritten character, the area is normalized to a region of height × number of characters that continues to the right of the printed character; if there is a printed character above the handwritten character, the area is normalized to a region that continues below the printed character with the left edges aligned. Row 718 in FIG. 7 (418 in FIG. 4) is an example (808 in FIG. 8) normalized based on the printed character area (406 in FIG. 4). The height of the printed character area closest to the character area of row 718 (row 626 in FIG. 6) is set as the height of the normalized character area, and the width is set to height × number of characters. The normalized character area is positioned so that it touches the printed character area.
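The geometry of S1204 can be illustrated with a minimal sketch for horizontal writing. The `Box` type, the function name, and the `printed_is_left` flag are hypothetical. Aligning the top edge with the printed area in the "left" case is an assumption; the description only says the normalized area continues to the right of the printed character.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Box:
    x: int  # left edge
    y: int  # top edge
    w: int  # width
    h: int  # height


def normalize_by_printed(printed: Box, n_chars: int, printed_is_left: bool) -> Box:
    """Sketch of S1204: derive size and position from the nearest printed area.

    n_chars is the character count recognized by OCR for the handwritten text.
    """
    height = printed.h            # height taken from the printed character area
    width = height * n_chars      # width = height x number of characters
    if printed_is_left:
        # Continue to the right of the printed area (top alignment is assumed).
        return Box(printed.x + printed.w, printed.y, width, height)
    # Printed area is above: align left edges and continue below it.
    return Box(printed.x, printed.y + printed.h, width, height)
```

For example, a printed label at `Box(10, 20, 60, 12)` followed by five handwritten characters yields a normalized area 60 wide and 12 high that starts at x = 70.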
If it is determined in S1201 that the character area of the handwritten character does not overlap a cell area, then in S1205 the handwritten character area normalization unit 315 determines whether there is a printed character area within a predetermined distance above, below, to the left of, or to the right of the character area of the handwritten character. If the handwritten character area normalization unit 315 determines that there is a printed character area, the process proceeds to S1204. As in the case of row 718 in FIG. 7 (418 in FIG. 4) described above, the handwritten character areas of rows 711 to 715 in FIG. 7 (411 to 415) are normalized based on the printed character areas of rows 621 to 625 in FIG. 6 (401 to 405), respectively (801 to 805 in FIG. 8). Likewise, the handwritten character areas of rows 719 (419) and 720 (420) in FIG. 7 are normalized based on the printed character areas of rows 627 (407) and 628 (408) in FIG. 6, respectively (809 and 810 in FIG. 8).
If the handwritten character area normalization unit 315 determines in S1205 that there is no printed character area, then in S1206 it determines whether there is a ruled line in the vicinity of the character area of the handwritten character (that is, a ruled line that overlaps the character area or lies within a predetermined distance above, below, to the left of, or to the right of it). If it is determined that there is a ruled line, in S1207 the character area of the handwritten character is normalized based on the mode of the heights of the printed character areas and on the ruled line. Specifically, the height of the character area is set to the mode of the heights of all printed character areas in the image, and the width of the character area is set to height × number of characters. The start of the character area is aligned with the start of the ruled line, and the area is positioned so that it touches the ruled line. Row 716 in FIG. 7 (416 in FIG. 4) is an example (806 in FIG. 8) normalized based on the most frequent height of the printed character areas and the ruled line (431 in FIG. 4). The mode of the heights of all printed character areas (the mode of the heights in column 602 of table 601 in FIG. 6) is set as the height of the normalized character area, and the width is set to height × number of characters. The normalized character area is positioned so that its start coincides with the start of the ruled line and it touches the ruled line. Although the mode is used here as the representative value of the height of the printed character areas, the mean, the median, or the like may be used instead.
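As a minimal sketch of S1207, the representative height can be computed with the standard library and combined with the ruled line position. The function names and the tuple layout are hypothetical. Placing the area directly above a horizontal ruled line is an assumption, since the description only requires that the area start at the start of the line and touch it; picking the smallest value when several heights tie for the mode is also an assumption.

```python
from statistics import multimode
from typing import List, Tuple


def representative_height(printed_heights: List[int]) -> int:
    """Mode of the printed character area heights (needs at least one height).

    multimode returns every most-frequent value; the smallest is taken on a
    tie so that the result is deterministic (an assumption of this sketch).
    """
    return min(multimode(printed_heights))


def normalize_by_ruled_line(line_x: int, line_y: int,
                            printed_heights: List[int],
                            n_chars: int) -> Tuple[int, int, int, int]:
    """Sketch of S1207: return (x, y, width, height) of the normalized area."""
    height = representative_height(printed_heights)
    width = height * n_chars          # width = height x number of characters
    # Start at the left end of the line; the bottom edge touches the line.
    return (line_x, line_y - height, width, height)
```

Replacing `representative_height` with `statistics.mean` or `statistics.median` corresponds to the alternative representative values mentioned in the text.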

If the handwritten character area normalization unit 315 determines in S1206 that there is no ruled line in the vicinity of the character area of the handwritten character, it ends the process without normalizing the character area.
This flow assumes that horizontally written character areas are normalized, so the height of the printed character area and height × number of characters are set as the height and width of the normalized area, respectively. When a vertically written character area is normalized, the width of the printed character area and width × number of characters are set as the width and height of the normalized area, respectively.
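The branching of S1201 to S1207 and the horizontal/vertical size rule can be summarized in a short sketch. The enum, the function names, and the boolean inputs are hypothetical; in practice each boolean would come from a geometric test on the areas and ruled lines, which is not reproduced here.

```python
from enum import Enum
from typing import Tuple


class Strategy(Enum):
    CELL = "S1203"         # use the enclosing cell area
    PRINTED = "S1204"      # use the nearest printed character area
    RULED_LINE = "S1207"   # use the mode height and a nearby ruled line
    NONE = "unchanged"     # leave the area as it is


def choose_strategy(overlaps_cell: bool, cell_has_printed: bool,
                    printed_nearby: bool, ruled_line_nearby: bool) -> Strategy:
    """Mirror the decision order of the normalization flow."""
    if overlaps_cell:                                   # S1201
        # S1202: a printed area inside the cell takes priority over the cell.
        return Strategy.PRINTED if cell_has_printed else Strategy.CELL
    if printed_nearby:                                  # S1205
        return Strategy.PRINTED
    if ruled_line_nearby:                               # S1206
        return Strategy.RULED_LINE
    return Strategy.NONE


def normalized_size(printed_w: int, printed_h: int, n_chars: int,
                    vertical: bool = False) -> Tuple[int, int]:
    """Return (width, height) of the normalized area for either orientation."""
    if vertical:
        return (printed_w, printed_w * n_chars)
    return (printed_h * n_chars, printed_h)
```

Keeping the decision separate from the geometry makes it easy to check that every handwritten area falls into exactly one branch.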

As described above, by normalizing based on printed character areas and ruled lines, the information processing system reduces variation in the normalization results for forms of the same format (invoices and the like). Because the handwritten character areas normalized in this way and the labels attached to them are used as learning data, information can be extracted accurately even from images containing handwritten characters, without defining a format in advance.

(Other embodiments)
In the image processing system described above, the MFP and the key-value extraction server are described as separate units, but the processing may be performed by a single image processing apparatus having all the functions. That is, the MFP 101 may include the ruled line extraction unit 312, the printed character OCR unit 313, the handwritten character OCR unit 314, the handwritten character area normalization unit 315, the key-value extraction unit 316, the learning unit 319, the key-value data transmission unit 320, and the like.
The present invention can also be realized by processing in which a program that implements one or more functions of the above-described embodiments is supplied to a system or apparatus via a network or a storage medium, and one or more processors in a computer of the system or apparatus read and execute the program. It can also be realized by a circuit (for example, an ASIC) that implements one or more functions.
Although preferred embodiments of the present invention have been described above, the present invention is not limited to these embodiments, and various modifications and changes are possible within the scope of the gist thereof.

101 MFP
102 Key-value extraction server
313 Printed character OCR unit
314 Handwritten character OCR unit
319 Learning unit
330 Extraction rule

Claims

1. An information processing apparatus comprising:
recognition means for identifying character areas of printed characters and handwritten characters from image data of a document containing the printed characters and the handwritten characters, and performing character recognition;
identifying means for identifying positions of ruled lines included in the image data;
normalization means for normalizing the character area of the handwritten characters based on at least one of the character area of the printed characters and the positions of the ruled lines; and
extraction means for extracting a character string corresponding to a value of a predetermined item using a rule, based on the character strings recognized from the handwritten characters corresponding to the printed character area and the normalized area,
wherein the rule is a rule learned using a correction result obtained by a user correcting a result of extraction by the extraction means.

2. The information processing apparatus according to claim 1, further comprising:
acquisition means for acquiring a correction result obtained by a user correcting the result of extraction by the extraction means; and
learning means for updating the rule by learning using the correction result,
wherein the extraction means performs the extraction using the rule updated by the learning means.

3. The information processing apparatus according to claim 1 or 2, wherein, when the character area of the handwritten character overlaps an area surrounded by the ruled lines, the normalization means normalizes the position and size of the character area of the handwritten character based on the area surrounded by the ruled lines.

4. The information processing apparatus according to claim 1, wherein the normalization means normalizes the position and size of the character area of the handwritten character based on the character area of a printed character in the vicinity of the character area of the handwritten character.

5. The information processing apparatus according to claim 1 or 2, wherein the normalization means normalizes the position of the character area of the handwritten character based on the position of a ruled line in the vicinity of the character area of the handwritten character, and normalizes the size of the character area of the handwritten character based on the size of the printed characters.

6. The information processing apparatus according to claim 5, wherein the mode of the sizes of the printed characters in the image is used as the size of the printed characters.

7. An information processing system in which the information processing apparatus according to any one of claims 1 to 6 and an image input apparatus having scanning means and operation means are connected via a network.

8. A control method for an information processing apparatus, the method comprising:
a recognition step of identifying character areas of printed characters and handwritten characters from image data of a document containing the printed characters and the handwritten characters, and performing character recognition;
an identifying step of identifying positions of ruled lines included in the image data;
a normalization step of normalizing the character area of the handwritten characters based on at least one of the character area of the printed characters and the positions of the ruled lines; and
an extraction step of extracting a character string corresponding to a value of a predetermined item using a rule, based on the character strings recognized from the handwritten characters corresponding to the printed character area and the normalized area,
wherein the rule is a rule learned using a correction result obtained by a user correcting a result of extraction in the extraction step.

9. A program for causing a computer to execute the control method for an information processing apparatus according to claim 8.