JP6590355B1 - Learning model generation device, character recognition device, learning model generation method, character recognition method, and program

Info

Publication number: JP6590355B1
Application number: JP2019086630A
Authority: JP
Inventors: 昂平安田
Original assignee: Arithmer Inc
Current assignee: Arithmer Inc
Priority date: 2019-04-26
Filing date: 2019-04-26
Publication date: 2019-10-16
Anticipated expiration: 2039-04-26
Also published as: JP2020184109A; WO2020218512A1

Abstract

[Problem] To improve the accuracy of recognition processing for handwritten character strings entered in a form. [Solution] A learning model generation device includes a learning data generation unit 12 that generates learning data including a character string image and a correct answer label, based on a corpus 3 in which one or more words that may be entered in a handwritten character area of a form are registered and on a data set DS1 of handwritten character images in units of one character, and a learning model generation unit 13 that generates a learning model by first learning using the learning data. [Selected drawing] FIG. 3

Description

The present invention relates to a learning model generation device, a character recognition device, a learning model generation method, a character recognition method, and a program.

Conventionally, a technique is known in which image data, obtained by reading a form containing handwritten characters with an image scanner or the like, is subjected to optical character recognition (OCR) processing to generate digital data converted into predetermined character codes.

For example, Patent Document 1 discloses a system and related services for recognizing characters in documents, whether handwritten or printed, and in images such as videos and photographs. More specifically, Patent Document 1 describes a character identification system comprising: a character image input receiving unit that receives input of a sample character image; character component extraction that extracts character components based on the sample character image; a pseudo character model generation unit that generates a pseudo character model based on the character components; and identification dictionary generation that generates character identification patterns based on the pseudo character model and thereby generates an identification dictionary.

Japanese Patent Application Laid-Open No. 2015-069256

The conventional system described in Patent Document 1 learns characters registered as external characters (gaiji) or as new character images from a small number of sample images, so that those characters can be recognized with higher accuracy. However, the character recognition technique of Patent Document 1 is intended to recognize each character individually; it is not intended to read a character string composed of a plurality of characters with high accuracy.

Some aspects of the present invention have been made in view of these circumstances, and an object thereof is to provide a learning model generation device, a character recognition device, a learning model generation method, a character recognition method, and a program that improve the accuracy of recognition processing for handwritten character strings entered in a form.

A learning model generation device according to one aspect of the present invention includes: a learning data generation unit that generates learning data including a character string image and a correct answer label, based on a database in which one or more words that may be entered in a handwritten character area of a form are registered and on a data set of handwritten character images in units of one character; and a learning model generation unit that generates a learning model by first learning using the learning data.

A character recognition device according to one aspect of the present invention is a character recognition device that recognizes handwritten characters entered in a form, and includes: an image data acquisition unit that acquires image data of the form; a region specifying unit that specifies, based on the acquired image data, one or more handwritten character areas each containing a character string entered in handwriting; and a character recognition unit that recognizes the content of the character string entered in each handwritten character area using a learning model having a network structure in which a first neural network and a second neural network are combined.

A learning model generation method according to one aspect of the present invention is a learning model generation method executed by a computer that generates a learning model, and includes: a step of generating learning data including a character string image and a correct answer label, based on a database in which one or more words that may be entered in a handwritten character area of a form are registered and on a data set of handwritten character images in units of one character; and a step of generating a learning model by first learning using the learning data.

A character recognition method according to one aspect of the present invention is a character recognition method executed by a computer that recognizes handwritten characters entered in a form, and includes: a step of acquiring image data of the form; a step of specifying, based on the acquired image data, one or more handwritten character areas each containing a character string entered in handwriting; and a step of recognizing the content of the character string entered in each handwritten character area using a trained model having a network structure in which a first neural network and a second neural network are combined.

A program according to one aspect of the present invention causes a computer to function as: a learning data generation unit that generates learning data including a character string image and a correct answer label, based on a database in which one or more words that may be entered in a handwritten character area of a form are registered and on a data set of handwritten character images in units of one character; and a learning model generation unit that generates a learning model by first learning using the learning data.

A program according to one aspect of the present invention causes a computer that recognizes handwritten characters entered in a form to function as: an image data acquisition unit that acquires image data of the form; a region specifying unit that specifies, based on the acquired image data, one or more handwritten character areas each containing a character string entered in handwriting; and a character recognition unit that recognizes the content of the character string entered in each handwritten character area using a trained model having a network structure in which a first neural network and a second neural network are combined.

In the present invention, a "unit" does not simply mean a physical means; it also covers the case where the function of the "unit" is realized by software. The function of one "unit" or device may be realized by two or more physical means or devices, and the functions of two or more "units" or devices may be realized by a single physical means or device.

According to the present invention, the accuracy of recognition processing for handwritten character strings entered in a form can be improved.

FIG. 1 is a schematic configuration diagram (system configuration diagram) of the character recognition device according to the first embodiment.
FIG. 2 is a diagram showing an example of a policy according to the first embodiment.
FIG. 3 is a conceptual diagram showing an example of the learning data generation processing and the learning model generation processing according to the first embodiment.
FIG. 4 is a diagram showing an example of the address corpus according to the first embodiment.
FIG. 5 is a conceptual diagram showing an example of the learning model reinforcement (update) processing according to the first embodiment.
FIG. 6 is a conceptual diagram showing an example of the layout information according to the first embodiment.
FIG. 7 is a flowchart showing an example of the character recognition processing according to the first embodiment.
FIG. 8 is a flowchart showing an example of the learning model generation processing according to the first embodiment.
FIG. 9 is a diagram showing an example of a form on which a watermark is printed, according to the second embodiment.
FIG. 10 is a diagram showing an example of learning data in which at least part of the watermark printed on the form is superimposed on a character string image, according to the second embodiment.
FIG. 11 is a schematic configuration diagram (system configuration diagram) of the character recognition device according to the third embodiment.
FIG. 12 is a schematic configuration diagram (system configuration diagram) of the character recognition device and the learning model generation device according to the fourth embodiment.
FIG. 13 is a schematic configuration diagram (system configuration diagram) of the character recognition device and an external device according to the fifth embodiment.
FIG. 14 is a diagram showing an example of the hardware configuration of a computer according to an embodiment of the present invention.
FIG. 15 is a schematic configuration diagram (system configuration diagram) showing a modification of the character recognition device according to the first embodiment.

Embodiments of the present invention are described below with reference to the accompanying drawings. The following embodiments are examples for explaining the present invention and are not intended to limit the present invention to those embodiments alone. The present invention can be modified in various ways without departing from its gist. In the drawings, the same components are given the same reference numerals wherever possible, and redundant description is omitted.

<First Embodiment>
FIG. 1 is a schematic configuration diagram (system configuration diagram) of the character recognition device according to the first embodiment of the present invention. As shown in FIG. 1, the character recognition device 100A is a device that recognizes handwritten characters entered in a form, and is an information processing device such as a server. The character recognition device 100A may be another kind of information processing device, such as a laptop or notebook computer. The character recognition device 100A includes, by way of example, an information processing unit 1 that executes information processing for recognizing handwritten characters entered in a form, a corpus 3 serving as a dictionary database (DB), a handwritten character data set DB 5 in units of one character, a handwritten character data set DB 7 in units of character string images, and a layout information DB 9. The character recognition device 100A may also recognize characters other than handwritten characters entered in a form. At least one of the corpus 3, the handwritten character data set DB 5, the handwritten character data set DB 7, and the layout information DB 9 may be configured as a device or database separate from the character recognition device 100A.

A "form" is a document in which character strings are entered, and is a generic term for ledgers and slips. Forms include, for example, documents such as securities handled by securities companies and the like, application forms, and contracts.

FIG. 2 is a diagram showing an example of a policy (form) according to the first embodiment. As shown in FIG. 2, the policy C1 is an automobile insurance policy of a particular insurance company, "○○ Non-Life Insurance Co., Ltd."

The policy C1 includes, as fields in which handwritten characters are entered (handwritten character areas), for example, an address field 20 (a handwritten character area for an address) and a name field 22. These are examples of fields in which the character recognition device 100A recognizes characters; the policy C1 may have other fields containing characters. For example, the policy C1 may have an insurance period field (not shown), included under "Contract Details", for entering the insurance start date and the insurance maturity date by hand. The policy C1 may further have, under "Insured Vehicle", a chassis number field (not shown) for entering the chassis number by hand and a registration number field (not shown) for entering the registration number by hand. The character recognition device 100A may also recognize handwritten characters in other fields. Handwritten characters need not be recognized in all of the fields exemplified above.

Returning to FIG. 1, the information processing unit 1 functionally includes, for example, an image data acquisition unit 11, a learning data generation unit 12, a learning model generation unit 13, a learning model update unit 14, a region specifying unit 15, and a character recognition unit 16.

Each of the above units of the information processing unit 1 can be realized, for example, by using a storage area such as a memory or a hard disk, or by a processor executing a program stored in the storage area. The corpus 3 and the DBs 5, 7, and 9 of the character recognition device 100A can likewise be realized through execution by the processor.

The image data acquisition unit 11 acquires image data of the policy C1. For example, the image data acquisition unit 11 may acquire image data generated by capturing an image with an imaging device such as a camera, which is an example of the input/output interface 44 described later with reference to FIG. 14.

As shown in FIG. 15, the image data acquisition unit 11 may acquire, via a predetermined communication network N, image data generated by capturing the policy C1 with an external device 50 that includes an imaging device such as a camera. The communication network N is a communication line or communication network for information processing, including, for example, the Internet; its specific configuration is not particularly limited as long as data can be transmitted and received between the character recognition device 100A and the external device 50.

FIG. 3 is a conceptual diagram showing an example of the learning data generation processing and the learning model generation processing according to the first embodiment. As shown in FIGS. 1 and 3, the learning data generation unit 12 generates learning data including a character string image and a correct answer label, based on the corpus 3, in which one or more words that may be entered in one or more handwritten character areas of the policy C1 are registered, and on the data set DS1 of handwritten character images in units of one character recorded in the handwritten character data set DB 5.

FIG. 4 is a diagram showing an example of the address corpus according to the first embodiment. As shown in FIG. 4, the corpus 3 shown in FIGS. 1 and 3 includes, for example, an address corpus in which possible combinations of prefecture names, municipality names, area names, and building names are registered hierarchically. That is, the level below each prefecture name contains the names of the municipalities belonging to that prefecture. Likewise, the level below each municipality name contains the names of the areas belonging to that municipality. For prefecture, municipality, and area names that have undergone administrative changes, the address corpus may additionally register the former name in association with the current name.
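The hierarchy described for the address corpus can be pictured as nested mappings. The following is only a minimal sketch under stated assumptions: the names `ADDRESS_CORPUS`, `RENAMED`, `iter_addresses`, and `current_name` are hypothetical, the entries are illustrative rather than taken from FIG. 4, and the building-name level is omitted for brevity.

```python
# Hypothetical nested layout: prefecture -> municipality -> list of areas.
# Entries are illustrative only; the building-name level is omitted.
ADDRESS_CORPUS = {
    "トウキョウト": {
        "チヨダク": ["マルノウチ", "カスミガセキ"],
        "ミナトク": ["アカサカ"],
    },
    "オオサカフ": {
        "オオサカシ": ["キタク"],
    },
}

# Hypothetical table associating a former name with its current name.
RENAMED = {"オオミヤシ": "サイタマシ"}


def iter_addresses(corpus):
    """Yield every (prefecture, municipality, area) path registered in the corpus."""
    for prefecture, municipalities in corpus.items():
        for municipality, areas in municipalities.items():
            for area in areas:
                yield (prefecture, municipality, area)


def current_name(name, renamed=RENAMED):
    """Map a former name to its current name; other names pass through unchanged."""
    return renamed.get(name, name)
```

Walking the nesting in this way yields only combinations that are actually registered, which is the property the hierarchical registration is meant to provide.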

In addition to the address corpus, which is specialized for addresses, the corpus 3 shown in FIGS. 1 and 3 may include various corpora specialized for names, vehicle names, occupations, insurance periods, chassis numbers, registration numbers, and the like.

As shown in FIGS. 3 and 4, the learning data generation unit 12 extracts from the corpus 3 text information CL1 of one or more words that may be entered in one or more fields of the policy C1. In the example shown in FIG. 3, the address text information "トウキョウト" ("Tokyo-to", that is, Tokyo Metropolis, written in katakana) is extracted from the address corpus. Next, a handwritten character image for each character contained in the text information CL1 extracted from the corpus 3 is read from the handwritten character data set DB 5, and a handwritten character string image CSI1 "トウキョウト" is generated. As shown in FIG. 3, the handwritten character data set DB 5 stores, for each character (one character) such as hiragana, katakana, and kanji, a plurality of corresponding handwritten character images in association with that character. Specifically, a plurality of handwritten character images "ア" (that is, images in which "ア" is written by hand) are stored for the text information "ア". For each character other than "ア", the handwritten character data set DB 5 likewise stores the data set DS1, in which a plurality of handwritten character images (handwritten character images in units of one character) are associated with the character.
That is, for each character contained in the text information CL1 extracted from the corpus 3, the learning data generation unit 12 reads a corresponding handwritten character image from the handwritten character data set DB 5 and generates the handwritten character string image CSI1. For example, when the text information CL1 is "トウキョウト", one handwritten character image "ト" corresponding to the text information "ト" is arbitrarily extracted from the handwritten character data set DB 5. Next, one handwritten character image "ウ" corresponding to the text information "ウ" is arbitrarily extracted from the handwritten character data set DB 5. For the remaining text information "キ", "ョ", "ウ", and "ト", one corresponding handwritten character image each is likewise arbitrarily extracted from the handwritten character data set DB 5. The extracted handwritten character images "ト", "ウ", "キ", "ョ", "ウ", and "ト" are then combined into one to generate the character string image CSI1 "トウキョウト". Learning data is generated in which the text information CL1 "トウキョウト" extracted from the corpus 3 serves as the correct answer label for the handwritten character string image CSI1 "トウキョウト" generated in this way. When the handwritten character string image CSI1 is generated, each of the arbitrarily extracted handwritten character images "ト", "ウ", "キ", "ョ", "ウ", and "ト" may be rotated, enlarged, reduced, shifted, or distorted.
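The generation of one learning sample (pick one handwritten variant per character, join them into a string image, and use the corpus text as the correct answer label) might be sketched as follows. This is an illustrative sketch, not the implementation of the learning data generation unit 12: the function names, the representation of an image as a list of pixel rows, and the assumption that all glyphs share the same height are hypothetical, and of the augmentations mentioned only a simple vertical shift is shown.

```python
import random


def shift_down(glyph, n):
    """Shift a glyph down by n rows, padding with blank rows and keeping its height."""
    width = len(glyph[0])
    blank = [[0] * width for _ in range(n)]
    return (blank + glyph)[:len(glyph)]


def make_sample(text, glyph_db, rng, max_shift=0):
    """Build one (string image, correct answer label) pair for `text`.

    glyph_db maps each character to a list of handwritten variants; each
    variant is a list of pixel rows (0 = blank, 1 = ink). All variants are
    assumed to have the same height.
    """
    glyphs = []
    for ch in text:
        glyph = rng.choice(glyph_db[ch])  # one arbitrary variant per character
        glyphs.append(shift_down(glyph, rng.randint(0, max_shift)))
    height = len(glyphs[0])
    # Concatenate the glyphs left to right, row by row.
    image = [[px for g in glyphs for px in g[row]] for row in range(height)]
    return image, text
```

Because each call draws variants at random, calling `make_sample` repeatedly on the same corpus entry produces many different string images that all carry the same label.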

The handwritten character data set DB 5 may contain, as a data set of handwritten character images in units of one character, sets each consisting of hiragana or kanji text information and a plurality of handwritten character images corresponding to that text information. The handwritten character data set DB 5 may also contain data sets of handwritten character images in units of one character for foreign languages. For example, as a data set of handwritten character images in units of one character, it may contain sets each consisting of alphabetic text information and a plurality of handwritten character images corresponding to that text information.

The learning model generation unit 13 generates a learning model by first learning using the learning data generated by the learning data generation unit 12. As shown in FIG. 3, the learning model generation unit 13 generates the learning model LM1 from a network structure including, for example, a CRNN (Convolutional Recurrent Neural Network). A CRNN is, for example, a network structure in which a convolutional neural network (CNN; the first neural network) and a recurrent neural network (RNN; the second neural network) are combined. The CNN computes a feature map for the handwritten character string contained in one or more fields of the policy C1. The RNN is a neural network that can handle time-series data such as video and audio; because it has a recurrent structure, it can make predictions that take past information into account. The RNN is used to compute character string indices in light of the sequential context of the plurality of consecutive feature data obtained from the feature map. The network structure may adopt a configuration other than the above, and neural networks other than CNNs and RNNs may also be used.
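The data flow around the RNN can be illustrated without a deep learning framework. The sketch below shows only two surrounding steps: slicing a CNN feature map into one feature vector per column, and turning per-step class scores into a character string. The RNN itself is not implemented. The decoding rule used here (take the best class per step, merge repeats, drop a reserved blank class, as in CTC-style decoding) is an assumption commonly paired with CRNNs, not something the text above specifies, and all names are hypothetical.

```python
BLANK = 0  # assumed reserved class index meaning "no character at this step"


def map_to_sequence(feature_map):
    """Turn a feature map indexed [channel][row][col] into one vector per column."""
    channels = len(feature_map)
    rows = len(feature_map[0])
    cols = len(feature_map[0][0])
    return [
        [feature_map[c][r][x] for c in range(channels) for r in range(rows)]
        for x in range(cols)
    ]


def greedy_decode(step_scores, charset):
    """Convert per-step class scores into text.

    step_scores: one list of scores per time step, indexed by class
    (class 0 is the blank, class i > 0 is charset[i - 1]).
    """
    out = []
    prev = BLANK
    for scores in step_scores:
        idx = max(range(len(scores)), key=scores.__getitem__)  # best class index
        if idx != BLANK and idx != prev:  # skip blanks and merged repeats
            out.append(charset[idx - 1])
        prev = idx
    return "".join(out)
```

In a full model, the column vectors from `map_to_sequence` would be fed to the recurrent layers, whose outputs would play the role of `step_scores`.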

With this configuration, the learning model generation unit 13 generates the learning model from a network structure including a CRNN, so handwritten character strings can be recognized with high accuracy.

As described above, the learning model generation unit 13 generates a CRNN learning model for addresses. The learning model generation unit 13 may also generate CRNN learning models of other types. For example, when the learning data generation unit 12 generates learning data for names, insurance periods, chassis numbers, registration numbers, and the like, the learning model generation unit 13 may generate CRNN learning models for names, insurance periods, chassis numbers, registration numbers, and the like by first learning using the respective learning data for names, vehicle names, occupations, insurance periods, chassis numbers, registration numbers, and the like.

After the first learning shown in FIG. 3, the learning model update unit 14 reinforces (updates) the generated learning model by second learning, which uses character string images cut out from image data of the policy C1 as learning data.

FIG. 5 is a conceptual diagram showing an example of the learning model reinforcement (update) processing according to the first embodiment. As shown in FIG. 5, the learning model update unit 14 shown in FIG. 1 stores, in the handwritten character data set DB 7, a plurality of character string images such as addresses and names cut out from image data of a plurality of policies C1, as learning data. The handwritten character data set DB 7 stores, for example, a data set DS3 of handwritten character images in units of character string images, including a plurality of handwritten character string images "トウキョウト". Using the existing learning model generated in the first learning shown in FIG. 3, the learning model update unit 14 generates (infers), for example, the text information "トウキョクト" from a handwritten character string image "トウキョウト" (in which "ウ" is unclear) contained in the handwritten character data set DB 7. Here the image "ウ" was misrecognized as "ク" because the "ウ" in the handwritten character string image "トウキョウト" was unclear. In this case, for example, a user operating the character recognition device 100A shown in FIG. 1 may perform a manual correction that changes the misrecognized text information "トウキョクト" to the text information "トウキョウト".

In this way, the learning model update unit 14 uses the existing learning model generated in the first learning to generate (infer) text information from a handwritten character string image. If the image is misrecognized, the text information manually corrected by the user is assigned as the correct answer label of the handwritten character string image. If it is not misrecognized, no manual correction is performed and the generated (inferred) text information is assigned as the correct answer label of the handwritten character string image. In the second learning, annotations are thus generated semi-automatically. That is, learning data is generated that includes a handwritten character string image and its corresponding correct answer label (in this example, the text information "トウキョウト" corresponding to the handwritten character string image "トウキョウト"). The learning model update unit 14 can then reinforce the learning model by adding the newly generated learning data to the existing learning model.
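The semi-automatic labelling rule (use the user's correction where one exists, otherwise keep the model's own output) can be sketched in a few lines. The function `annotate` and its argument shapes are hypothetical; `infer` stands in for the existing learning model from the first learning, and how corrections are collected from the user is outside the scope of this sketch.

```python
def annotate(images, infer, corrections):
    """Pair each string image with a correct answer label.

    images: dict of image id -> image
    infer: callable mapping an image to text (stands in for the existing model)
    corrections: dict of image id -> text manually corrected by a user
    """
    dataset = []
    for image_id, image in images.items():
        predicted = infer(image)
        # A manual correction, if present, overrides the inferred text.
        label = corrections.get(image_id, predicted)
        dataset.append((image, label))
    return dataset
```

Only misrecognized images need human attention under this scheme, so the labelling effort shrinks as the existing model improves.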

With this configuration, the learning model update unit 14 extracts the handwritten character string of each item (address, name, etc.) from the image data of the certificate C1 and assigns a correct answer label to each. The learning model generated by the first learning can therefore be reinforced by performing the second learning on these labeled handwritten character strings.

Returning to FIG. 1, the area specifying unit 15 specifies, based on the image data of the certificate C1 acquired by the image data acquisition unit 11, one or more fields each containing a character string handwritten on the certificate C1. Various methods can be used to specify the fields in the certificate C1; as one example, a method that uses layout information for specifying the fields is described below.

As shown in FIG. 1, the character recognition device 100A further includes a layout information DB9 (recording unit) that records layout information for specifying fields in association with predetermined positions on the certificate C1.

FIG. 6 is a conceptual diagram showing an example of layout information according to the first embodiment. As shown in FIG. 6, the layout information stores, for each certificate template ID, the field name of each of a plurality of fields in association with the start position and end position of that field. Here, the field name is an example of information identifying which of the plurality of fields is meant.

In the example of FIG. 6, for certificate template ID "001", the position of the field named "address" is expressed by start-point coordinates (X21, Y21) and end-point coordinates (X22, Y22). The field named "address" is thus the rectangular area designated by this start point and end point. These coordinates are preferably positions obtained when the entire certificate C1 is normalized to a predetermined size. The method of designating field positions is not limited to the example shown in FIG. 6, however, and other methods may be used.
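As a non-limiting illustration, a field lookup based on start-point and end-point coordinates in a normalized coordinate system might be sketched as follows. The concrete coordinate values, the normalized size, the names `LAYOUT`, `scale_rect`, and `crop_field`, and the treatment of the end point as exclusive are assumptions made for this sketch.

```python
from typing import Dict, List, Tuple

Rect = Tuple[int, int, int, int]  # (x_start, y_start, x_end, y_end)

# Assumed normalized (width, height) to which every certificate is scaled.
NORMALIZED_SIZE = (8, 6)

# Hypothetical layout table: template ID -> field name -> rectangle
# expressed in the normalized coordinate system.
LAYOUT: Dict[str, Dict[str, Rect]] = {"001": {"address": (2, 1, 6, 3)}}


def scale_rect(rect: Rect, image_size: Tuple[int, int]) -> Rect:
    """Map a rectangle from normalized coordinates to actual pixel coordinates."""
    norm_w, norm_h = NORMALIZED_SIZE
    width, height = image_size
    x0, y0, x1, y1 = rect
    return (x0 * width // norm_w, y0 * height // norm_h,
            x1 * width // norm_w, y1 * height // norm_h)


def crop_field(image: List[List[int]], template_id: str, field_name: str) -> List[List[int]]:
    """Cut the named field out of an image given as a list of pixel rows."""
    height, width = len(image), len(image[0])
    x0, y0, x1, y1 = scale_rect(LAYOUT[template_id][field_name], (width, height))
    return [row[x0:x1] for row in image[y0:y1]]


# A 16 x 12 test image whose pixel value encodes its own position (y * 100 + x).
image = [[y * 100 + x for x in range(16)] for y in range(12)]
address_crop = crop_field(image, "001", "address")
```

Storing the rectangles in normalized coordinates lets one layout entry serve scans of different resolutions, which is the motivation for the normalization mentioned above.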

The layout information is stored in the layout information DB9 before the character recognition device 100A performs character recognition. When a certificate in a new format is issued, the layout information for that certificate is preferably added to the layout information DB9 by, for example, a user of the character recognition device 100A.

As described above, the layout information stores, for each certificate template, position information specifying each of a plurality of fields. The area specifying unit 15 specifies the fields based on the layout information. With this configuration, the positions of fields such as the address field 20 and the name field 22 can be specified even across certificates with different layouts.

The method of specifying the fields in the certificate C1 is not limited to the above. For example, one specifying method that does not use the certificate template described above is described as the third embodiment.

The character recognition unit 16 recognizes the content of the character string written in the handwritten character area using the generated learning model or the reinforced (updated) learning model. For example, the character recognition unit 16 recognizes the content of the character string written in a field of the certificate C1 using a learning model having a CRNN, in which a CNN and an RNN are combined. With this configuration, the character recognition unit 16 recognizes handwritten character strings using a network structure combining a CNN and an RNN, and can therefore recognize them with high accuracy.

(Character recognition process)
An example of the character recognition process according to the first embodiment of the present invention is described with reference to FIGS. 7 and 8. FIG. 7 is a flowchart illustrating an example of the character recognition process according to the first embodiment.

As shown in FIG. 7, the image data acquisition unit 11 shown in FIG. 1 acquires the image data of the certificate C1 shown in FIG. 2 (step S1). The learning data generation unit 12 generates learning data including character string images and correct answer labels based on the corpus 3, in which one or more words that can be written in the handwritten character areas of the certificate C1 are registered, and the data set DS1 of single-character handwritten character images recorded in the handwritten character data set DB5 (step S3). The learning model generation unit 13 generates a learning model (step S5). The learning model generation and update processes are described later with reference to FIG. 8. The area specifying unit 15 specifies, based on the image data acquired by the image data acquisition unit 11, one or more fields each containing a character string handwritten on the certificate C1 (step S7). The character recognition unit 16 recognizes the content of the character strings written in those fields using the generated learning model or the reinforced (updated) learning model (step S9).

FIG. 8 is a flowchart showing an example of the learning model generation process (step S5 in FIG. 7) according to the first embodiment. As shown in FIG. 8, the learning model generation unit 13 generates a learning model by the first learning, which uses the learning data generated by the learning data generation unit 12 (step S51). Next, after the first learning, the learning model update unit 14 updates the learning model by the second learning, which uses character string images cut out from the image data of the certificate C1 as learning data (step S53).

As described above, according to the first embodiment of the present invention, learning data including character string images and correct answer labels is generated based on the corpus 3 and the data set DS1 of single-character handwritten character images recorded in the handwritten character data set DB5. A learning model is generated by the first learning using the generated learning data. The learning model generated by the first learning can then be used to recognize the content of character strings written in one or more fields of the certificate C1. The accuracy of recognizing handwritten character strings written on the certificate C1 can therefore be improved.
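As a non-limiting illustration, generating a character string image and its correct answer label from a corpus word and single-character handwritten images might be sketched as follows. Representing an image as rows of pixel values, joining characters by simple horizontal concatenation, and the names `Glyph` and `make_sample` are assumptions made for this sketch; the embodiment does not fix how the single-character images are composed.

```python
import random
from typing import Dict, List, Tuple

Glyph = List[List[int]]  # a small grayscale image as rows of pixel values


def make_sample(
    word: str,
    glyphs: Dict[str, List[Glyph]],
    rng: random.Random,
) -> Tuple[Glyph, str]:
    """Pick one handwritten variant per character and join them left to right.

    Assumes every glyph has the same height. The corpus word itself serves
    as the correct answer label of the composed image.
    """
    chosen = [rng.choice(glyphs[ch]) for ch in word]
    height = len(chosen[0])
    image = [sum((glyph[r] for glyph in chosen), []) for r in range(height)]
    return image, word


# Tiny 2 x 2 stand-ins for single-character handwritten images (assumption).
glyphs = {
    "ト": [[[1, 1], [1, 1]], [[2, 2], [2, 2]]],  # two handwriting variants
    "ウ": [[[3, 3], [3, 3]]],
}
corpus = ["トウ"]  # hypothetical corpus entry
image, label = make_sample(corpus[0], glyphs, random.Random(0))
```

Because the label comes from the corpus word, no manual annotation is needed for the first learning, and drawing a random variant per character yields many distinct images of the same word.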

<Second Embodiment>
The learning model generation process and the learning model update process of the second embodiment are described with reference to FIGS. 9 and 10. The second embodiment differs from the first embodiment in that learning data is generated by superimposing at least a part of a watermark printed on the form on the character string images included in the handwritten character data sets DB5 and DB7 shown in FIGS. 1, 3, and 5, whereas in the first embodiment no watermark is superimposed on those images. The second embodiment also differs in that learning data is generated by superimposing noise in the form on those character string images, whereas in the first embodiment no noise is superimposed. The following description focuses on the differences from the first embodiment.

FIG. 9 is a diagram illustrating an example of a certificate on which a watermark is printed according to the second embodiment. As shown in FIG. 9, the certificate C3 is, for example, an automobile insurance policy, and a watermark W reading 「複写」 ("copy") is printed on it. FIG. 10 is a diagram illustrating an example of learning data according to the second embodiment in which at least a part of the watermark printed on the certificate C3 is superimposed on a character string image.
As shown in FIG. 10(a), the single-character handwritten character data set DB5 shown in FIGS. 1 and 3 stores, for example, multiple patterns of the single-character handwritten character image 「キ」, each containing at least a part of the watermark. The handwritten character data set DB5 is not limited to this and may store multiple patterns for each of the single-character handwritten character images 「ア」 through 「ン」, each containing at least a part of the watermark. The learning data generation unit 12 shown in FIG. 1 randomly reads out, from the handwritten character data set DB5, handwritten character images 「ア」 through 「ン」 of multiple patterns containing at least a part of the watermark, and generates learning data.

As a data set of single-character handwritten character images containing at least a part of the watermark, the handwritten character data set DB5 may include hiragana or kanji text information paired with a plurality of handwritten character images corresponding to each item of that text information. The handwritten character data set DB5 may also include a data set of single-character handwritten character images in a foreign language containing at least a part of the watermark. For example, the data set of single-character handwritten character images may include alphabetic text information paired with a plurality of handwritten character images corresponding to each item of that text information.

As shown in FIG. 10(b), the handwritten character data set DB7 in units of character string images shown in FIGS. 1 and 5 stores, for example, a handwritten character string image 「トウキョウト＊＊＊…」 containing at least a part of the watermark. In addition, as shown in FIG. 10(c), the handwritten character data set DB7 may store, for example, a handwritten character string image 「トウキョウト」 containing at least a part of the watermark (a part of the handwritten character string image shown in FIG. 10(b)).
After the first learning shown in FIG. 3, the learning model update unit 14 according to this embodiment reads out from the handwritten character data set DB7 character string images that were cut out from the image data of the certificate C3 and on which at least a part of the watermark is superimposed. The learning model update unit 14 then reinforces (updates) the generated learning model by, for example, the second learning, which uses the read-out character string images with at least a part of the watermark superimposed as learning data.

The learning data used in at least one of the learning model generation process and the learning model reinforcement process may include, in addition to character string images on which at least a part of the watermark printed on the certificate is superimposed, character string images on which noise in the certificate is superimposed.

As described above, according to the second embodiment of the present invention, in at least one of the learning model generation process and the learning model reinforcement process, character string images on which at least a part of the watermark printed on the form is superimposed are generated as learning data. This improves the robustness of character string image recognition on forms on which a watermark is printed.

In addition, in at least one of the learning model generation process and the learning model reinforcement process, character string images on which noise in the form is superimposed are generated as learning data. This improves the robustness of character string image recognition on forms containing noise.
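As a non-limiting illustration, superimposing part of a watermark and noise on a character string image might be sketched as follows. The grayscale convention (0 is black, 255 is white), combining pixels by taking the darker value, the salt-and-pepper noise model, and the name `superimpose` are assumptions made for this sketch; the embodiment does not specify how the superimposition is computed.

```python
import random
from typing import List, Tuple

Image = List[List[int]]  # grayscale rows; 0 is black, 255 is white (assumption)


def superimpose(
    image: Image,
    watermark: Image,
    offset: Tuple[int, int],
    noise_rate: float,
    rng: random.Random,
) -> Image:
    """Return a copy of image with part of a watermark and random noise added."""
    dx, dy = offset
    out = [row[:] for row in image]  # leave the input image untouched
    for wy, wrow in enumerate(watermark):
        for wx, wval in enumerate(wrow):
            x, y = wx + dx, wy + dy
            # Only the part of the watermark that overlaps the image is applied.
            if 0 <= y < len(out) and 0 <= x < len(out[0]):
                out[y][x] = min(out[y][x], wval)  # keep the darker pixel
    for y in range(len(out)):
        for x in range(len(out[0])):
            if rng.random() < noise_rate:
                out[y][x] = rng.choice((0, 255))  # salt-and-pepper noise
    return out


# A 4 x 3 white image with one black stroke pixel, and a 2 x 2 gray watermark
# placed so that only its top-left corner overlaps the image.
image = [[255] * 4 for _ in range(3)]
image[1][1] = 0
watermark = [[128, 128], [128, 128]]
out = superimpose(image, watermark, offset=(3, 2), noise_rate=0.0, rng=random.Random(0))
noisy = superimpose(image, watermark, offset=(3, 2), noise_rate=1.0, rng=random.Random(0))
```

Varying the offset and the noise rate per sample would yield many training variants of one labeled image, while the correct answer label stays unchanged.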

<Third Embodiment>
A character recognition device according to the third embodiment is described with reference to FIG. 11. In the process of specifying the fields of a form, the character recognition device 100B according to the third embodiment performs processing such as extracting item areas containing the item names printed on the form and assigning attributes to the item areas. The first embodiment differs from the third embodiment in that it does not perform this processing and instead specifies the fields of the form by referring to the layout information shown in FIG. 1. The following description focuses on the differences from the first embodiment.

FIG. 11 is a schematic configuration diagram (system configuration diagram) of the character recognition device according to the third embodiment. As shown in FIG. 11, compared with the character recognition device 100A according to the first embodiment shown in FIG. 1, the character recognition device 100B does not include the layout information DB9 shown in FIG. 1, and its area specifying unit 15 further includes, by way of example, an item extraction unit 151 and an attribute assignment unit 152.

The area specifying unit 15 includes the item extraction unit 151, which targets item areas containing item names printed on the certificate, such as "name" and "address", and uses a predetermined neural network to extract each item area together with an attribute. The item extraction unit 151 targets, for example, item areas containing item names printed in type on the certificate, and individually extracts the item areas contained in the image data of the certificate along with their attribute classifications. For example, when the image data of the certificate contains image areas such as "name" and "address", each image area is extracted as an item area, and an attribute such as "name" or "address" is attached to each item area. The item areas are extracted using an object detection algorithm based on deep learning. The attributes of the extracted item areas are classified by referring to a predetermined learning model built on this algorithm. The classification confidence of each classified attribute may also be calculated and output.

The area specifying unit 15 also includes the attribute assignment unit 152, which, based on the positions and attributes of the item areas in the image data of the certificate, associates each item area with a field (handwritten character area) located near it and assigns the attribute of the item area to that field. The attribute assignment unit 152 analyzes the layout of the certificate image based on the positions and attributes of the item areas in the image data and identifies where the information for each attribute is written. Specifically, one of the attributes classified by the item extraction unit 151 is assigned to each field in the certificate. Basically, when an item area and a handwritten character area in the image data of the certificate are close to each other, that is, when the distance between them is equal to or less than a predetermined threshold, the two are associated with each other, and the attribute of the item area is assigned to that field (handwritten character area). For example, when a field exists near an item area having the attribute "name", the attribute "name" is assigned to that field. The specific rules for associating item areas with fields are set and defined in advance in a predetermined correspondence rule table or the like.
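As a non-limiting illustration, assigning item attributes to nearby fields using a distance threshold might be sketched as follows. Measuring distance between rectangle centers, choosing the nearest item area when several are within the threshold, and the names `center` and `assign_attributes` are assumptions made for this sketch; in the embodiment the concrete rules are defined in the correspondence rule table.

```python
import math
from typing import Dict, List, Optional, Tuple

Rect = Tuple[float, float, float, float]  # (x_start, y_start, x_end, y_end)


def center(rect: Rect) -> Tuple[float, float]:
    """Return the center point of a rectangle."""
    x0, y0, x1, y1 = rect
    return ((x0 + x1) / 2, (y0 + y1) / 2)


def assign_attributes(
    items: List[Tuple[str, Rect]],
    fields: List[Rect],
    threshold: float,
) -> Dict[int, Optional[str]]:
    """Give each field the attribute of the nearest item area within threshold.

    Fields with no item area within the threshold are mapped to None.
    """
    assigned: Dict[int, Optional[str]] = {}
    for index, field in enumerate(fields):
        fx, fy = center(field)
        best_attr, best_dist = None, threshold
        for attr, rect in items:
            ix, iy = center(rect)
            dist = math.hypot(fx - ix, fy - iy)
            if dist <= best_dist:
                best_attr, best_dist = attr, dist
        assigned[index] = best_attr
    return assigned


# Hypothetical item areas (printed labels) and handwritten fields.
items = [("name", (0, 0, 10, 4)), ("address", (0, 10, 10, 14))]
fields = [(12, 0, 40, 4), (12, 10, 40, 14), (100, 100, 120, 104)]
result = assign_attributes(items, fields, threshold=25)
```

In this example the third field lies far from every item area and therefore receives no attribute, which corresponds to the case where the distance exceeds the predetermined threshold.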

As described above, according to the third embodiment, the item areas contained in the image data of the certificate and their attributes are acquired. From this information, it is possible to identify what information is written at which position in the form image. Layout analysis can therefore be performed even on an unknown certificate that has not been registered in the character recognition device in advance.

<Fourth Embodiment>
A character recognition device and a learning model generation device according to the fourth embodiment are described with reference to FIG. 12. The character recognition device 100C and the learning model generation device 200 according to the fourth embodiment shown in FIG. 12 are obtained by separating the components of the character recognition device 100A according to the first embodiment shown in FIG. 1. The character recognition device 100C recognizes the content of the character string written in a field using a learning model generated by the learning model generation device 200, for example, a learning model having a network structure in which a first neural network and a second neural network are combined. The character recognition device 100C and the learning model generation device 200 may be configured to exchange data via the communication network N. The configuration is not limited to this, however; the learning model generated by the learning model generation device 200 may be stored in the main recording device of the character recognition device 100C by any means. Similarly, the components of the character recognition device 100B according to the third embodiment shown in FIG. 11 may be separated to form a character recognition device and a learning model generation device, and that character recognition device and learning model generation device may likewise be configured to exchange data via the communication network N.

As described above, according to the fourth embodiment, the character recognition device 100A according to the first embodiment or the character recognition device 100B according to the third embodiment can be configured as a character recognition device and a learning model generation device that are separate devices.

<Fifth Embodiment>
A character recognition device and an external device according to the fifth embodiment are described with reference to FIG. 13. FIG. 13 is a schematic configuration diagram (system configuration diagram) of the character recognition device and the external device according to the fifth embodiment. As shown in FIG. 13, the character recognition device 100C according to the fifth embodiment may acquire image data generated by the external device 50 via a predetermined communication network N.

As described above, according to the fifth embodiment, the character recognition device 100C can acquire image data generated by the external device 50 and perform the character recognition process based on the acquired image data.

FIG. 14 is a diagram illustrating an example of the hardware configuration of a computer according to an embodiment of the present invention. With reference to FIG. 14, an example is described of the hardware configuration of a computer that can be used to implement the character recognition device 100A shown in FIGS. 1 and 15, the character recognition device 100B shown in FIG. 11, the character recognition device 100C shown in FIGS. 12 and 13, the learning model generation device 200 shown in FIG. 12, and the external device shown in FIGS. 13 and 15.

As shown in FIG. 14, the computer 40 mainly includes, as hardware resources, a processor 41, a main recording device 42, an auxiliary recording device 43, an input/output interface 44, and a communication interface 45, which are connected to one another via a bus line 46 including an address bus, a data bus, a control bus, and the like. An interface circuit (not shown) may be interposed between the bus line 46 and each hardware resource as appropriate.

The processor 41 controls the entire computer. The processor 41 corresponds to, for example, the information processing unit 1 shown in FIGS. 1 and 11. The main recording device 42 provides a work area for the processor 41 and is a volatile memory such as SRAM (Static Random Access Memory) or DRAM (Dynamic Random Access Memory). The auxiliary recording device 43 is a non-volatile memory, such as an HDD, an SSD, or a flash memory, that stores programs (software), data, and the like. The programs, data, and the like are loaded from the auxiliary recording device 43 into the main recording device 42 via the bus line 46 at any time. The auxiliary recording device 43 corresponds to, for example, the corpus 3, the handwritten character data set DB5, the handwritten character data set DB7, and the layout information DB9 shown in FIG. 1. The auxiliary recording device 43 also corresponds to, for example, the corpus 3, the handwritten character data set DB5, and the handwritten character data set DB7 shown in FIG. 11.

The input/output interface 44 presents information, receives input of information, or both, and is a camera, a keyboard, a mouse, a display, a touch panel display, a microphone, a speaker, a temperature sensor, or the like. The communication interface 45 is connected to the communication network N shown in FIGS. 1, 11, and 12 and transmits and receives data via the communication network N. The communication interface 45 and the communication network N may be connected by wire or wirelessly. The communication interface 45 may also acquire information about the network, for example, information about Wi-Fi access points or information about base stations of a communication carrier.

It will be apparent to those skilled in the art that, through the cooperation of the hardware resources exemplified above and software, the computer 40 can function as desired means, execute desired steps, and realize desired functions.

The embodiments described above are intended to facilitate understanding of the present invention and are not to be construed as limiting it. The present invention may be modified or improved without departing from its gist, and the present invention includes equivalents thereof. Various disclosures can also be formed by appropriately combining the plurality of components disclosed in the embodiments. For example, some components may be removed from the full set of components shown in an embodiment. Components of different embodiments may also be combined as appropriate.

1, 1A, 1B … information processing unit; 3 … corpus; 5, 7 … handwritten character data set; 9 … layout information DB; 11 … image data acquisition unit; 12 … learning data generation unit; 13 … learning model generation unit; 14 … learning model update unit; 15 … area specifying unit; 16 … character recognition unit; 41 … processor; 42 … main recording device; 43 … auxiliary recording device; 44 … input/output interface; 45 … communication interface; 46 … bus; 50 … external device; 100A, 100B, 100C … character recognition device; 151 … item extraction unit; 152 … attribute assignment unit; 200 … learning model generation device

Claims

A learning model generation device comprising:
a learning data generation unit that generates learning data including a character string image and a correct answer label, based on a database in which one or more words that can be entered in a handwritten character area of a form are registered and a data set of handwritten character images in units of one character; and
a learning model generation unit that generates a learning model by first learning using the learning data.

The learning model generation device according to claim 1, further comprising
a learning model update unit that, after the first learning, updates the learning model by second learning using a character string image cut out from image data of the form as learning data.

The learning model generation device according to claim 1 or 2, wherein the learning data generation unit generates the learning data by superimposing at least a part of a watermark printed on the form on the character string image.
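
By way of non-limiting illustration (not part of the claims), superimposing part of a watermark on a character string image might be sketched as follows. The darker-pixel rule, the tiling of the watermark, and the names used are assumptions of this sketch; the claim does not prescribe a blending method.

```python
def superimpose_watermark(string_image, watermark, top, left):
    """Overlay the watermark region starting at (top, left) on the string image.

    Images are 2D lists of grayscale pixels (0 = ink, 255 = paper). The
    watermark is tiled if it is smaller than the string image, and the
    darker of the two pixels is kept at each position.
    """
    wm_h, wm_w = len(watermark), len(watermark[0])
    result = []
    for r, row in enumerate(string_image):
        new_row = []
        for c, px in enumerate(row):
            wm_px = watermark[(top + r) % wm_h][(left + c) % wm_w]
            new_row.append(min(px, wm_px))
        result.append(new_row)
    return result

# Hypothetical 2x3 string image and 2x2 watermark pattern.
string_image = [[255, 0, 255], [255, 255, 255]]
watermark = [[200, 255], [255, 200]]
augmented = superimpose_watermark(string_image, watermark, top=0, left=0)
```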

The learning model generation device according to any one of claims 1 to 3, wherein the learning data generation unit generates, as the learning data, an image obtained by superimposing noise on the character string image.
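
By way of non-limiting illustration (not part of the claims), superimposing noise on a character string image might be sketched as follows. Salt-and-pepper noise and the `rate` parameter are assumptions of this sketch; the claim does not specify the kind of noise.

```python
import random

def superimpose_noise(string_image, rate, rng):
    """Return a copy of the image with salt-and-pepper noise added.

    Each pixel is independently replaced, with probability `rate`, by
    black (0) or white (255). The input image is left unchanged.
    """
    noisy = []
    for row in string_image:
        new_row = []
        for px in row:
            if rng.random() < rate:
                new_row.append(rng.choice((0, 255)))
            else:
                new_row.append(px)
        noisy.append(new_row)
    return noisy

# Hypothetical mid-gray 3x4 image used only to show the effect.
clean = [[128] * 4 for _ in range(3)]
noisy = superimpose_noise(clean, rate=0.2, rng=random.Random(0))
```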

The learning model generation device according to any one of claims 1 to 4, wherein
the learning model generation unit generates the learning model from a network structure in which a first neural network and a second neural network are combined,
the first neural network is constituted by a convolutional neural network and calculates a feature map relating to the character string included in the handwritten character area, and
the second neural network is constituted by a recurrent neural network and calculates an index of the character string from the feature map.
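
By way of non-limiting illustration (not part of the claims), the forward pass of a combined convolutional and recurrent network might be sketched as follows. The weights are random and untrained, and the column averaging, the Elman-style recurrence, and the per-step argmax are assumptions of this sketch rather than the architecture of the specification.

```python
import math
import random

def conv_feature_map(image, kernels):
    """First network (convolutional): valid 2D convolution plus ReLU per kernel.

    Returns feature_map[k][r][c] for kernel k.
    """
    kh, kw = len(kernels[0]), len(kernels[0][0])
    out_h, out_w = len(image) - kh + 1, len(image[0]) - kw + 1
    feature_map = []
    for kernel in kernels:
        plane = []
        for r in range(out_h):
            row = []
            for c in range(out_w):
                s = sum(image[r + i][c + j] * kernel[i][j]
                        for i in range(kh) for j in range(kw))
                row.append(max(0.0, s))
            plane.append(row)
        feature_map.append(plane)
    return feature_map

def columns_as_sequence(feature_map):
    """Average each plane over its height: one feature vector per image column."""
    out_h, out_w = len(feature_map[0]), len(feature_map[0][0])
    return [[sum(plane[r][c] for r in range(out_h)) / out_h for plane in feature_map]
            for c in range(out_w)]

def recurrent_indices(sequence, w_in, w_rec, w_out):
    """Second network (recurrent): emit one character index per time step."""
    hidden = [0.0] * len(w_rec)
    indices = []
    for x in sequence:
        hidden = [math.tanh(sum(w_in[h][i] * x[i] for i in range(len(x)))
                            + sum(w_rec[h][j] * hidden[j] for j in range(len(hidden))))
                  for h in range(len(w_rec))]
        scores = [sum(w_out[k][h] * hidden[h] for h in range(len(hidden)))
                  for k in range(len(w_out))]
        indices.append(max(range(len(scores)), key=scores.__getitem__))
    return indices

def random_matrix(rows, cols, rng):
    """Untrained placeholder weights drawn uniformly from [-1, 1]."""
    return [[rng.uniform(-1.0, 1.0) for _ in range(cols)] for _ in range(rows)]
```

In a trained model the indices would be mapped to characters through a character table; that mapping is omitted here.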

The learning model generation device according to any one of claims 1 to 5, wherein
at least one of the handwritten character areas is a handwritten character area relating to an address, and
a combination including at least one of a state name, a city name, an area name, and a building name is registered in the corpus associated with the handwritten character area relating to the address.
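
By way of non-limiting illustration (not part of the claims), registering combinations of address components in a corpus might be sketched as follows. The record layout and the choice to register every contiguous run of components are assumptions of this sketch.

```python
def build_address_corpus(records):
    """Register each contiguous combination of the components of every record.

    `records` is an iterable of (state, city, area, building) tuples; an
    empty string marks a missing component and is skipped.
    """
    corpus = set()
    for record in records:
        parts = [part for part in record if part]
        for start in range(len(parts)):
            for end in range(start + 1, len(parts) + 1):
                corpus.add("".join(parts[start:end]))
    return sorted(corpus)

# Hypothetical address record with no building name.
address_corpus = build_address_corpus([("東京都", "港区", "芝公園", "")])
```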

The learning model generation device according to any one of claims 1 to 6, further comprising an image data acquisition unit that acquires, via a communication network, image data generated by imaging the form with an imaging device.

A character recognition device comprising:
an area specifying unit that specifies, based on image data of a form, one or more handwritten character areas each including a character string written in handwritten characters; and
a character recognition unit that recognizes the content of the character string written in the handwritten character area, using a learning model generated by the learning model generation unit according to any one of claims 1 to 7.

The character recognition device according to claim 8, further comprising a recording unit that records layout information for specifying the handwritten character area in association with a predetermined position in the form, wherein
the area specifying unit specifies the handwritten character area based on the layout information.
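
By way of non-limiting illustration (not part of the claims), specifying handwritten character areas from recorded layout information might be sketched as follows. The field names and the (left, top, width, height) box convention are assumptions of this sketch.

```python
def specify_areas(form_image, layout_info):
    """Cut out each handwritten character area named in the layout information.

    `form_image` is a 2D list of pixels; `layout_info` maps a field name to
    a (left, top, width, height) box in pixel coordinates.
    """
    areas = {}
    for field, (left, top, width, height) in layout_info.items():
        areas[field] = [row[left:left + width]
                        for row in form_image[top:top + height]]
    return areas

# Hypothetical 3x4 form whose pixel value encodes its own position (row*10 + col).
form_image = [[r * 10 + c for c in range(4)] for r in range(3)]
layout_info = {"name": (0, 0, 3, 1), "address": (0, 1, 4, 2)}
areas = specify_areas(form_image, layout_info)
```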

The character recognition device according to claim 8 or 9, wherein the area specifying unit includes:
an item extraction unit that extracts, together with an attribute, an item area including an item name printed on the form; and
an attribute assignment unit that, based on the position of the item area in the image data of the form and on the attribute, associates the item area with a handwritten character area located in the vicinity of the item area and assigns the attribute to that handwritten character area.
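
By way of non-limiting illustration (not part of the claims), assigning attributes to handwritten character areas from nearby item areas might be sketched as follows. The nearest-center rule is an assumption of this sketch; the claim only requires that the handwritten character area be located in the vicinity of the item area.

```python
import math

def center(box):
    """Return the center point of a (left, top, width, height) box."""
    left, top, width, height = box
    return (left + width / 2, top + height / 2)

def assign_attributes(item_areas, handwritten_areas):
    """Give each handwritten area the attribute of the closest item area.

    `item_areas` is a list of (attribute, box) pairs and `handwritten_areas`
    a list of boxes. Returns {index of handwritten area: attribute}.
    """
    assigned = {}
    for index, hw_box in enumerate(handwritten_areas):
        hw_center = center(hw_box)
        attribute, _ = min(
            item_areas,
            key=lambda item: math.dist(hw_center, center(item[1])),
        )
        assigned[index] = attribute
    return assigned

# Hypothetical form with two labeled items, each followed by a writing box.
item_areas = [("name", (0, 0, 10, 5)), ("address", (0, 20, 10, 5))]
handwritten_areas = [(12, 0, 30, 5), (12, 20, 30, 5)]
attributes = assign_attributes(item_areas, handwritten_areas)
```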

A learning model generation method executed by a computer that generates a learning model, the method comprising:
generating learning data including a character string image and a correct answer label, based on a database in which one or more words that can be entered in a handwritten character area of a form are registered and on a data set of handwritten character images in units of one character; and
generating a learning model by first learning using the learning data.

A program for causing a computer to function as:
a learning data generation unit that generates learning data including a character string image and a correct answer label, based on a database in which one or more words that can be entered in a handwritten character area of a form are registered and on a data set of handwritten character images in units of one character; and
a learning model generation unit that generates a learning model by first learning using the learning data.