JP2020027598A

JP2020027598A - Character recognition device, character recognition method, and character recognition program

Info

Publication number: JP2020027598A
Application number: JP2018244224A
Authority: JP
Inventors: 清水 亮 (Akira Shimizu); 増田 哲朗 (Tetsuro Masuda); 溝畑 彰洋 (Akihiro Mizohata); 新井 克人 (Katsuto Arai); 安田 純子 (Junko Yasuda)
Original assignee: Sigmaxyz Inc; Uei Corp
Current assignee: Sigmaxyz Inc; Uei Corp
Priority date: 2018-12-27
Filing date: 2018-12-27
Publication date: 2020-02-20

Abstract

To provide a character recognition device, a character recognition method, and a character recognition program that are capable of constructing an image recognition model so as to properly extract the regions of multiple items even for a document including multiple items.
SOLUTION: A character recognition device 10 that recognizes characters written in a document comprises: a first setting unit 11a that sets multiple regions for an image of the document on the basis of input; a second setting unit 11b that sets multiple sub-regions included in any of the multiple regions on the basis of the input; a third setting unit 11c that sets region types for the multiple regions and the multiple sub-regions on the basis of the input; and a generation unit 12 that generates learning data used for supervised learning of one or more image recognition models, including positions of the multiple regions in the image, positions of the multiple sub-regions in the multiple regions, and the region types.
SELECTED DRAWING: Figure 1

Description

The present invention relates to a character recognition device, a character recognition method, and a character recognition program.

Conventionally, a technique called OCR (Optical Character Recognition) has been used to recognize characters contained in an image. When the position and size of the characters in an image are uniform, OCR can recognize the characters with high accuracy and extract character information from the image.

For example, Patent Document 1 below describes an image processing program that acquires character string information and position information from image data of a receipt. The image processing program described in Patent Document 1 displays a marker at the position corresponding to the position information and, depending on the type of item information, executes either a first process that sums a plurality of values or a second process that selects one of a plurality of values.

Patent Document 1: Japanese Unexamined Patent Application Publication No. 2016-126356

When the position and size of the characters in an image are uniform, conventional OCR technology can extract character information from the image with high accuracy. For an image of a document containing multiple items, however, it has been difficult for conventional OCR technology to properly recognize the characters written in each item.

For this reason, extracting the regions of the multiple items contained in a document image and applying OCR to each of those regions has been studied. An image recognition model such as a neural network may be used to extract the item regions from the document image.

However, the number of items in a document may exceed the number of regions that the image recognition model can distinguish, in which case the regions of the items in the document image may not be extracted properly.

The present invention therefore provides a character recognition device, a character recognition method, and a character recognition program capable of constructing an image recognition model that properly extracts the regions of multiple items even from a document containing multiple items.

A character recognition device according to one aspect of the present invention recognizes characters written in a document and comprises: a first setting unit that sets, based on an input, a plurality of regions for an image of the document; a second setting unit that sets, based on an input, a plurality of sub-regions included in any of the plurality of regions; a third setting unit that sets, based on an input, region types for the plurality of regions and the plurality of sub-regions; and a generation unit that generates learning data used for supervised learning of one or more image recognition models, the learning data including the positions of the plurality of regions in the image, the positions of the plurality of sub-regions within the plurality of regions, and the region types.

According to this aspect, even when the number of items in the document exceeds the number of regions that the image recognition model can distinguish, the number of regions can be kept below that limit, and sub-regions can be set inside any of the regions so that regions corresponding to all of the items are defined. This makes it possible to construct an image recognition model that identifies the regions and sub-regions in stages and identifies their region types, so that the regions of multiple items are properly extracted even from a document containing multiple items.

In the above aspect, the generation unit may generate multiple kinds of learning data used for supervised learning of the one or more image recognition models.

According to this aspect, setting the regions, the sub-regions, and the region types once is enough to generate, in a single pass, multiple kinds of learning data for supervised learning of the one or more image recognition models, so the learning data can be generated efficiently.

In the above aspect, the generation unit may generate first-type learning data, in which the outlines of the regions and sub-regions are drawn in a manner that differs for each region type, and second-type learning data, in which the regions and sub-regions are filled in a manner that differs for each region type.

According to this aspect, the first-type learning data can be used to generate a first image recognition model that overlays on the image bounding boxes enclosing the regions and sub-regions in a distinguishable manner. The second-type learning data can be used to generate a second image recognition model that outputs an image in which the regions and sub-regions are filled in a distinguishable manner.

In the above aspect, the device may further comprise: an image recognition unit that inputs a new image to the one or more image recognition models and, based on their output, outputs the plurality of regions included in the new image, the plurality of sub-regions included in any of those regions, and the region types of the regions and sub-regions; a character recognition unit that inputs the images of the regions and the images of the sub-regions to a character recognition model and, based on its output, outputs the characters contained in the regions and the characters contained in the sub-regions; and a correction unit that selects a correction rule according to the region type and corrects the characters.

According to this aspect, even when the number of items in the document exceeds the number of regions that the image recognition model can distinguish, the image recognition unit can identify all of the regions in stages. In addition, the character recognition unit recognizes the characters contained in the regions and sub-regions, and the correction unit corrects those characters according to the region type, so the characters written in the multiple items of the document can be output with high accuracy.
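
Selecting a correction rule by region type can be pictured as a lookup table from region type to a correction function. The following is a minimal sketch under stated assumptions: the region-type names (`account_number`, `branch_name`), the table `CORRECTION_RULES`, and the function `correct_text` are hypothetical names chosen for illustration and do not appear in the specification.

```python
from typing import Callable

# Hypothetical rule table: region type -> correction function.
# Both the keys and the rules themselves are illustrative assumptions.
CORRECTION_RULES: dict[str, Callable[[str], str]] = {
    # Keep digits only, e.g. for an account-number field.
    "account_number": lambda text: "".join(ch for ch in text if ch.isdigit()),
    # Trim surrounding whitespace, e.g. for a branch-name field.
    "branch_name": lambda text: text.strip(),
}


def correct_text(region_type: str, recognized: str) -> str:
    """Apply the rule registered for the region type, if any."""
    rule = CORRECTION_RULES.get(region_type)
    # Region types without a registered rule are passed through unchanged.
    return rule(recognized) if rule is not None else recognized
```

Because the rule is chosen from the region type alone, new item types can be supported by adding an entry to the table without touching the recognition models.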

In the above aspect, the correction unit may correct the characters by extracting part of them using one of a plurality of regular expressions.

According to this aspect, part of the characters can be extracted when the written characters follow a fixed format. For example, a required numerical value can be extracted from the characters.
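
As an illustration of regular-expression-based correction, the sketch below keeps one pattern per region type and returns the first matching substring. The region-type names, the patterns (a seven-digit account number, a comma-grouped amount), and the function name `extract_by_regex` are assumptions made for this example only.

```python
import re

# Hypothetical patterns keyed by region type (illustrative assumptions).
EXTRACTION_PATTERNS = {
    "account_number": re.compile(r"\d{7}"),
    # Comma-grouped number such as 12,300, or a plain run of digits.
    "amount": re.compile(r"\d{1,3}(?:,\d{3})+|\d+"),
}


def extract_by_regex(region_type: str, recognized: str) -> str:
    """Return the first substring matching the pattern for this region type.

    Falls back to the unmodified text when no pattern is registered
    or nothing matches.
    """
    pattern = EXTRACTION_PATTERNS.get(region_type)
    if pattern is None:
        return recognized
    match = pattern.search(recognized)
    return match.group(0) if match else recognized
```

For example, `extract_by_regex("amount", "Total: 12,300 yen")` keeps only the numeric part of the recognized string.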

In the above aspect, the correction unit may correct the characters by using the edit distance between the characters and each of a plurality of candidate character strings to replace the characters with one of the candidate character strings.

According to this aspect, when the characters that can appear in an item are limited, character strings that cannot appear are excluded and the characters are corrected to one of the candidate character strings.
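
The specification does not say which edit distance is used. The sketch below assumes the Levenshtein distance and replaces the recognized string with the nearest candidate; the function names `edit_distance` and `snap_to_candidate` and the sample branch names are hypothetical.

```python
def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance computed with a rolling one-row table."""
    previous = list(range(len(b) + 1))
    for i, ch_a in enumerate(a, start=1):
        current = [i]
        for j, ch_b in enumerate(b, start=1):
            cost = 0 if ch_a == ch_b else 1
            current.append(min(
                previous[j] + 1,         # delete ch_a
                current[j - 1] + 1,      # insert ch_b
                previous[j - 1] + cost,  # substitute (or keep when equal)
            ))
        previous = current
    return previous[-1]


def snap_to_candidate(recognized: str, candidates: list[str]) -> str:
    """Replace the recognized string with the closest candidate.

    Ties go to the candidate listed first; an empty candidate list
    raises ValueError.
    """
    return min(candidates, key=lambda c: edit_distance(recognized, c))
```

With candidates `["Shibuya", "Shinjuku", "Ueno"]`, a misread such as `"Shibuyo"` is one substitution away from `"Shibuya"` and is therefore replaced by it. A practical system might also reject replacements whose distance exceeds a threshold, which this sketch does not do.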

In the above aspect, the correction unit may correct the characters by restricting the range of character codes.

According to this aspect, when the range of character codes that can appear in an item is limited, character codes that cannot appear are excluded and the characters are corrected within the restricted range.
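
One possible reading of this correction, shown here only as a sketch, is to drop every character whose code point lies outside the range allowed for the item. The example assumes a digits-only item and first applies Unicode NFKC normalization so that full-width digits are folded into ASCII digits; the function name `restrict_to_digits` is hypothetical, and the specification does not prescribe this particular procedure.

```python
import unicodedata


def restrict_to_digits(recognized: str) -> str:
    """Keep only characters in the ASCII digit range U+0030..U+0039.

    NFKC normalization first folds full-width digits (U+FF10..U+FF19)
    into their ASCII counterparts, so they survive the filter.
    """
    normalized = unicodedata.normalize("NFKC", recognized)
    return "".join(ch for ch in normalized if "0" <= ch <= "9")
```

An alternative design would constrain the recognizer itself so that it can only emit characters in the allowed range, rather than filtering afterwards.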

In the above aspect, the character recognition unit may select, from among a plurality of character recognition models, a character recognition model whose output has high confidence and, based on the output of the selected model, output the characters contained in the regions and the characters contained in the sub-regions.

According to this aspect, selecting a character recognition model whose output has high confidence further improves character recognition accuracy.

In the above aspect, the character recognition unit may select a character recognition model for each of the regions and sub-regions.

According to this aspect, a suitable character recognition model can be selected for each region and each sub-region, which further improves character recognition accuracy.
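
Per-region model selection by confidence can be sketched as follows. The sketch assumes that each character recognition model is a callable returning a `(text, confidence)` pair with confidence in the range 0 to 1; that interface, the `Recognizer` alias, and the function name `recognize_with_best_model` are assumptions for illustration, not an API described in the specification.

```python
from typing import Callable, Sequence

# Assumed interface: a recognizer maps image bytes to (text, confidence).
Recognizer = Callable[[bytes], tuple[str, float]]


def recognize_with_best_model(
    region_image: bytes, models: Sequence[Recognizer]
) -> tuple[str, float]:
    """Run every model on one region image and keep the most confident output."""
    results = [model(region_image) for model in models]
    return max(results, key=lambda result: result[1])
```

Calling this function once per region or sub-region means that different regions of the same document can end up using different models.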

In the above aspect, the image recognition unit may select, from among the one or more image recognition models, an image recognition model whose output has high confidence and, based on the output of the selected model, output the regions, the sub-regions, and the region types.

According to this aspect, selecting an image recognition model whose output has high confidence further improves the accuracy with which the multiple items in the document are recognized.

In the above aspect, the image recognition unit may select an image recognition model for each type of document.

According to this aspect, a suitable image recognition model can be selected for each type of document, which further improves the accuracy with which the multiple items in the document are recognized.

In the above aspect, the device may further comprise a revision unit that, based on an input, revises at least one of (i) the regions, sub-regions, and region types output by the image recognition unit, (ii) the characters output by the character recognition unit, and (iii) the characters corrected by the correction unit, and adds the revised data to the learning data.

According to this aspect, when the output of either the image recognition model or the character recognition model is wrong, data in which the error has been fixed can be added to the learning data, producing learning data that further improves the output accuracy of the image recognition model and the character recognition model.

In the above aspect, the device may further comprise a learning unit that retrains at least one of the one or more image recognition models and the plurality of character recognition models, using the learning data and a learning program in which predetermined parameters are set.

According to this aspect, retraining either the image recognition model or the character recognition model makes it possible to improve the output accuracy of both kinds of model continuously.

In the above aspect, when the output accuracy of at least one of the one or more image recognition models and the character recognition models after retraining is lower than its output accuracy before retraining, the learning unit may change the predetermined parameters and run the retraining of that model again.

According to this aspect, if retraining lowers the output accuracy of either the image recognition model or the character recognition model, the parameters of the learning program are changed and retraining is run again so that the output accuracy of the models improves.
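
The retry logic described here can be sketched as a loop over candidate parameter sets that stops at the first retraining run whose accuracy is not below the pre-retraining baseline. The callables `train` and `evaluate`, the list of parameter candidates, and the function name `retrain_with_fallback` are hypothetical placeholders; how the parameters are actually changed is not specified in this passage.

```python
from typing import Callable, Iterable, Optional


def retrain_with_fallback(
    train: Callable[[dict], object],
    evaluate: Callable[[object], float],
    baseline_accuracy: float,
    parameter_candidates: Iterable[dict],
) -> Optional[tuple[object, dict, float]]:
    """Try parameter sets in order until retraining no longer lowers accuracy.

    Returns (model, parameters, accuracy) for the first acceptable run,
    or None when every candidate scores below the baseline.
    """
    for params in parameter_candidates:
        model = train(params)
        accuracy = evaluate(model)
        if accuracy >= baseline_accuracy:
            return model, params, accuracy
    return None
```

Returning `None` when no candidate reaches the baseline lets the caller keep the model from before retraining.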

A character recognition method according to another aspect of the present invention recognizes characters written in a document and includes: setting, based on an input, a plurality of regions for an image of the document; setting, based on an input, a plurality of sub-regions included in any of the plurality of regions; setting, based on an input, region types for the plurality of regions and the plurality of sub-regions; and generating learning data used for supervised learning of one or more image recognition models, the learning data including the positions of the plurality of regions in the image, the positions of the plurality of sub-regions within the plurality of regions, and the region types.

A character recognition program according to another aspect of the present invention causes a processor provided in a character recognition device that recognizes characters written in a document to function as: a first setting unit that sets, based on an input, a plurality of regions for an image of the document; a second setting unit that sets, based on an input, a plurality of sub-regions included in any of the plurality of regions; a third setting unit that sets, based on an input, region types for the plurality of regions and the plurality of sub-regions; and a generation unit that generates learning data used for supervised learning of one or more image recognition models, the learning data including the positions of the plurality of regions in the image, the positions of the plurality of sub-regions within the plurality of regions, and the region types.

According to the present invention, there are provided a character recognition device, a character recognition method, and a character recognition program capable of constructing an image recognition model that properly extracts the regions of multiple items even from a document containing multiple items.

FIG. 1 is a diagram showing an overview of a character recognition system according to an embodiment of the present invention.
FIG. 2 is a diagram showing the functional blocks of the character recognition device according to the present embodiment.
FIG. 3 is a diagram showing the physical configuration of the character recognition device according to the present embodiment.
FIG. 4 is a diagram showing an example of a setting screen of the character recognition device according to the present embodiment.
FIG. 5 is a diagram showing an example of the first-type learning data generated by the character recognition device according to the present embodiment.
FIG. 6 is a diagram showing an example of the second-type learning data generated by the character recognition device according to the present embodiment.
FIG. 7 is a flowchart of the learning data generation process executed by the character recognition device according to the present embodiment.
FIG. 8 is a flowchart of the character recognition process executed by the character recognition device according to the present embodiment.
FIG. 9 is a flowchart of the character correction process executed by the character recognition device according to the present embodiment.
FIG. 10 is a flowchart of the retraining process executed by the character recognition device according to the present embodiment.

Hereinafter, an embodiment according to one aspect of the present invention (hereinafter referred to as "the present embodiment") will be described with reference to the drawings. In the drawings, elements denoted by the same reference numerals have the same or similar configurations.

FIG. 1 is a diagram showing an overview of a character recognition system 100 according to an embodiment of the present invention. The character recognition system 100 according to the present embodiment includes a character recognition device 10, a user terminal 20, a character recognition model 30, a document image database DB1, and a master database DB2. The character recognition device 10, the user terminal 20, the character recognition model 30, the document image database DB1, and the master database DB2 may be able to communicate with one another via a communication network N such as the Internet or a LAN (Local Area Network). The character recognition system 100 uses the character recognition device 10 to recognize the characters written in a document image and, when the result contains errors, accepts corrections from the user terminal 20 so that recognition accuracy improves continuously.

The document image database DB1 is a database that stores images of documents. The character recognition device 10 acquires a document image stored in the document image database DB1, receives from the user terminal 20 an input designating the multiple items contained in the document, and generates learning data.

The master database DB2 is a database that stores master data for the content written in documents. For example, when a bank branch name is written in a document, the master database DB2 may store a list of the branch names that currently exist. In that case, the character recognition device 10 may acquire the master data from the master database DB2 and check the recognized characters against the master data.

The character recognition model 30 is a learning model that recognizes characters contained in an image and is used by the character recognition device 10. The character recognition model 30 may be a known learning model and may be a model that recognizes handwritten characters or printed characters. In this example, the character recognition model 30 includes a first character recognition model 31 and a second character recognition model 32, but it may include three or more models. The first character recognition model 31 and the second character recognition model 32 may be stored anywhere as long as they can be used via the communication network N. The character recognition model 30 may also be included in the character recognition device 10.

FIG. 2 is a diagram showing the functional blocks of the character recognition device 10 according to the present embodiment. The character recognition device 10 includes a learning processing unit 10L and a recognition processing unit 10R. The learning processing unit 10L includes a setting unit 11, a generation unit 12, a learning unit 13, learning data 15a, revision data 15b, and a revision unit 16. The recognition processing unit 10R includes an image recognition unit 14, a character recognition unit 17, a correction unit 18, dictionary data 19a, and master data 19b.

The setting unit 11 includes a first setting unit 11a, a second setting unit 11b, and a third setting unit 11c. The setting unit 11 acquires a document image from the document image database DB1 and receives input from the user terminal 20. The first setting unit 11a sets a plurality of regions for the document image based on the input. The regions may be set so as to enclose the multiple items contained in the document. The first setting unit 11a may set a region containing a single item or a region containing multiple items.

The second setting unit 11b sets, based on the input, a plurality of sub-regions included in any of the plurality of regions. The sub-regions may be set inside a region set by the first setting unit 11a. The second setting unit 11b may set a sub-region containing a single item or a sub-region containing multiple items. The second setting unit 11b may also set, based on the input, a plurality of sub-regions included in any of the sub-regions. That is, the second setting unit 11b may set a plurality of sub-regions inside a region and then set further sub-regions inside one of those sub-regions.

The third setting unit 11c sets, based on the input, region types for the plurality of regions and the plurality of sub-regions. A single region type may be set for each region and each sub-region, or one or more region types may be set for each of them. A region type may represent the content of the item corresponding to the region; for a document relating to a bank account, for example, the region type may be an account number or a branch name.

The generation unit 12 generates learning data used for supervised learning of one or more image recognition models, the learning data including the positions of the plurality of regions in the document image, the positions of the plurality of sub-regions within the plurality of regions, and the region types. For a given document image, the generation unit 12 may generate the learning data by associating the positions of the regions set by the first setting unit 11a, the positions of the sub-regions set by the second setting unit 11b, and the types of the regions and sub-regions set by the third setting unit 11c. The learning data generated by the generation unit 12 is stored as learning data 15a.
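
The association of region positions, sub-region positions, and region types can be pictured as a small tree of rectangles. The sketch below assumes that each sub-region stores its position relative to its parent region and flattens the tree into records with absolute coordinates. The `Region` class, its field names, the record layout, and the sample coordinates are hypothetical and are not taken from the specification.

```python
from dataclasses import dataclass, field


@dataclass
class Region:
    """A typed rectangle; sub-regions are positioned relative to their parent."""
    region_type: str
    x: int
    y: int
    width: int
    height: int
    sub_regions: list["Region"] = field(default_factory=list)


def flatten(region: Region, offset_x: int = 0, offset_y: int = 0,
            depth: int = 0) -> list[dict]:
    """Convert a region tree into flat records with absolute coordinates."""
    abs_x, abs_y = offset_x + region.x, offset_y + region.y
    records = [{
        "type": region.region_type,
        "depth": depth,  # 0 = region, 1 = sub-region, 2 = nested sub-region, ...
        "box": (abs_x, abs_y, region.width, region.height),
    }]
    for sub in region.sub_regions:
        records.extend(flatten(sub, abs_x, abs_y, depth + 1))
    return records


# Usage: one region containing two sub-regions.
account_block = Region("account_block", 100, 200, 400, 60, [
    Region("branch_name", 10, 5, 180, 50),
    Region("account_number", 200, 5, 190, 50),
])
records = flatten(account_block)
```

The `depth` field records the level at which each rectangle was set, which is the information a staged recognizer would need to separate regions from sub-regions.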

With the character recognition device 10 according to the present embodiment, even when the number of items in the document exceeds the number of regions that the image recognition model can distinguish, the number of regions can be kept below that limit, and sub-regions can be set inside any of the regions so that regions corresponding to all of the items are defined. This makes it possible to construct an image recognition model that identifies the regions and sub-regions in stages and identifies their region types, so that the regions of multiple items are properly extracted even from a document containing multiple items.

The generation unit 12 may generate multiple kinds of learning data used for supervised learning of the one or more image recognition models. Because the regions, the sub-regions, and the region types need to be set only once, the multiple kinds of learning data can be generated together in a single pass, which makes generating them efficient.

The generation unit 12 may generate first-type learning data, in which the outlines of the regions and sub-regions are drawn in a manner that differs for each region type, and second-type learning data, in which the regions and sub-regions are filled in a manner that differs for each region type. Examples of the first-type and second-type learning data are described in detail with reference to FIGS. 5 and 6. The first-type learning data can be used to generate a first image recognition model 14a that overlays on the image bounding boxes enclosing the regions and sub-regions in a distinguishable manner. The second-type learning data can be used to generate a second image recognition model 14b that outputs an image in which the regions and sub-regions are filled in a distinguishable manner.
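
The difference between the two kinds of learning data can be illustrated by rendering the same annotated boxes twice, once as outlines and once as filled areas. In the sketch below a positive integer label stands in for the per-type drawing manner (for example, a color), 0 is the background, and the canvas is a plain list of rows. The function name `render_labels`, the box tuple layout, and the assumption that all boxes lie inside the canvas are illustrative choices.

```python
def render_labels(height: int, width: int,
                  boxes: list[tuple[int, int, int, int, int]],
                  filled: bool) -> list[list[int]]:
    """Draw boxes given as (label, x, y, w, h) onto a zero-initialized canvas.

    filled=False draws only the one-pixel outline (first-type data);
    filled=True paints the whole rectangle (second-type data).
    Later boxes overwrite earlier ones, so sub-regions should be listed
    after the region that contains them.
    """
    canvas = [[0] * width for _ in range(height)]
    for label, x, y, w, h in boxes:
        for row in range(y, y + h):
            for col in range(x, x + w):
                on_edge = row in (y, y + h - 1) or col in (x, x + w - 1)
                if filled or on_edge:
                    canvas[row][col] = label
    return canvas


# Usage: the same annotation rendered both ways.
annotation = [(1, 1, 0, 3, 3)]
outline = render_labels(4, 5, annotation, filled=False)
mask = render_labels(4, 5, annotation, filled=True)
```

Both renderings come from a single annotation list, which mirrors the point that the regions, sub-regions, and region types only need to be set once.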

学習部１３は、所定のパラメータが設定された学習プログラム１３ａ及び学習用データ１５ａを用いて、１又は複数の画像認識モデルの教師有り学習を行う。本例の場合、学習部１３は、第１学習プログラム及び第１種の学習用データを用いて、複数の領域及び複数の副領域を区別可能な態様で囲むバウンディングボックスを画像に上書きする第１画像認識モデル１４ａを生成してよい。第１画像認識モデル１４ａは、ＣＮＮ（Convolutional Neural Network）により構成されてよく、より具体的には、ＳＳＤ（Single Shot Multibox Detector）、Faster R-CNN又はRetina Netにより構成されてよい。また、学習部１３は、第２学習プログラム及び第２種の学習用データを用いて、複数の領域及び複数の副領域を区別可能な態様で塗り潰した画像を出力する第２画像認識モデル１４ｂを生成してよい。第２画像認識モデル１４ｂは、ＧＡＮ（Generative Adversarial Network）により構成されてよく、より具体的にはＶＡＥ（Variational Autoencoder）で構成されてよい。 The learning unit 13 performs supervised learning of the one or plurality of image recognition models using the learning program 13a, in which predetermined parameters are set, and the learning data 15a. In this example, the learning unit 13 may use a first learning program and the first type of learning data to generate the first image recognition model 14a, which overwrites an image with bounding boxes that surround the plurality of regions and the plurality of sub-regions in a distinguishable manner. The first image recognition model 14a may be configured by a CNN (Convolutional Neural Network), and more specifically by an SSD (Single Shot Multibox Detector), Faster R-CNN, or RetinaNet. The learning unit 13 may also use a second learning program and the second type of learning data to generate the second image recognition model 14b, which outputs an image in which the plurality of regions and the plurality of sub-regions are filled in a distinguishable manner. The second image recognition model 14b may be configured by a GAN (Generative Adversarial Network), and more specifically by a VAE (Variational Autoencoder).

画像認識部１４は、１又は複数の画像認識モデルに新たな画像ＩＭＧを入力し、１又は複数の画像認識モデルの出力に基づいて、新たな画像ＩＭＧに含まれる複数の領域、複数の領域いずれかに含まれる複数の副領域並びに複数の領域及び複数の副領域に対する領域の種類を出力する。ここで、新たな画像ＩＭＧは、新しい書類の画像である。本例では、画像認識部１４は、学習部１３により生成された第１画像認識モデル１４ａ及び第２画像認識モデル１４ｂのいずれかを用いて、書類の項目に対応する画像の領域及び領域の種類を出力する。画像認識部１４によって、書類に含まれる項目の数が、画像認識モデルによって区別可能な領域の数よりも多い場合であっても、段階的に全ての領域を識別することができる。 The image recognition unit 14 inputs a new image IMG to the one or plurality of image recognition models and, based on their output, outputs the plurality of regions included in the new image IMG, the plurality of sub-regions included in any of the plurality of regions, and the region types for the plurality of regions and the plurality of sub-regions. Here, the new image IMG is an image of a new document. In this example, the image recognition unit 14 uses either the first image recognition model 14a or the second image recognition model 14b generated by the learning unit 13 to output the regions of the image that correspond to the items of the document and the types of those regions. With the image recognition unit 14, all regions can be identified in stages even when the number of items included in the document is larger than the number of regions that the image recognition model can distinguish.

文字認識部１７は、複数の領域の画像及び複数の副領域の画像を文字認識モデル３０に入力し、文字認識モデル３０の出力に基づいて、複数の領域に含まれる文字及び複数の副領域に含まれる文字を出力する。本例では、文字認識部１７は、第１文字認識モデル３１及び第２文字認識モデル３２のいずれかを用いて、画像に含まれる文字を出力する。文字認識部１７によって、複数の領域及び複数の副領域に含まれる文字を認識することができる。 The character recognition unit 17 inputs the images of the plurality of regions and the images of the plurality of sub-regions to the character recognition model 30 and, based on the output of the character recognition model 30, outputs the characters included in the plurality of regions and the characters included in the plurality of sub-regions. In this example, the character recognition unit 17 outputs the characters included in an image using either the first character recognition model 31 or the second character recognition model 32. The character recognition unit 17 thus makes it possible to recognize the characters included in the plurality of regions and the plurality of sub-regions.

補正部１８は、領域の種類に応じた補正規則を選択し、文字を補正する。補正部１８は、あらかじめ定められた補正規則に従って、ルールベースで文字を補正してよい。補正部１８によって、領域の種類に応じた文字の補正を行うことで、書類に含まれる複数の項目に記載された文字を高精度で出力することができる。 The correction unit 18 selects a correction rule according to the type of the area, and corrects a character. The correction unit 18 may correct the character on a rule basis according to a predetermined correction rule. The correction unit 18 corrects characters according to the type of area, so that characters described in a plurality of items included in the document can be output with high accuracy.

補正部１８は、複数の正規表現のいずれかを用いて文字の一部を抽出することで、文字を補正してよい。これにより、記載される文字が定型化されている場合に、文字の一部を抽出することができる。例えば、文字の中から必要な数値を抽出することができる。ここで、補正部１８は、辞書データ１９ａを参照して、補正に用いる正規表現を選択してよい。 The correction unit 18 may correct a character by extracting a part of the character using one of a plurality of regular expressions. Thereby, when the character to be described is standardized, a part of the character can be extracted. For example, necessary numerical values can be extracted from characters. Here, the correction unit 18 may select a regular expression to be used for correction with reference to the dictionary data 19a.
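The regular-expression correction described above can be sketched as follows. This is a minimal illustration, not the patent's implementation: the `PATTERNS` table is a hypothetical stand-in for the dictionary data 19a, and the region type name `g_code`, the pattern itself, and the function name `extract_part` are assumptions made for the example.

```python
import re

# Hypothetical stand-in for dictionary data 19a: region type -> regular expression.
# The single capture group marks the part of the recognized text to keep.
PATTERNS = {
    "g_code": re.compile(r"G-(\d{2})"),
}


def extract_part(region_type: str, text: str) -> str:
    """Return the captured part of text, or text unchanged if no rule applies or matches."""
    pattern = PATTERNS.get(region_type)
    if pattern is None:
        return text
    match = pattern.search(text)
    return match.group(1) if match else text
```

Returning the text unchanged when nothing matches is a design choice of this sketch; a real system might instead flag the field for manual review.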

補正部１８は、文字認識部１７により認識された文字と、候補となる複数の文字列との編集距離を用いて、文字認識部１７により認識された文字を候補となる複数の文字列のいずれかに置換することで、文字を補正してもよい。これにより、項目に記載される文字が限定されている場合に、記載され得ない文字列を排除して、候補となる複数の文字列のいずれかに補正することができる。例えば、銀行の支店名が記載される項目の場合、候補となる複数の文字列を現に存在する支店名として、記載された文字を現に存在する支店名のいずれかに補正することができる。ここで、補正部１８は、マスタデータ１９ｂを参照して、候補となる複数の文字列を選択してよい。マスタデータ１９ｂは、マスタデータベースＤＢ２が更新された場合に、マスタデータベースＤＢ２から最新のデータを取得してよい。 The correction unit 18 may correct the characters by using the edit distance between the characters recognized by the character recognition unit 17 and a plurality of candidate character strings to replace the recognized characters with one of the candidate character strings. Thus, when the characters that can be written in an item are limited, character strings that cannot occur are excluded and the characters are corrected to one of the candidate character strings. For example, for an item in which a bank branch name is written, the candidate character strings can be the branch names that actually exist, and the written characters can be corrected to one of those branch names. Here, the correction unit 18 may select the plurality of candidate character strings with reference to the master data 19b. The master data 19b may acquire the latest data from the master database DB2 when the master database DB2 is updated.
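As an illustration of the edit-distance correction, the sketch below computes the Levenshtein distance and replaces the recognized text with the nearest candidate. The patent does not specify which edit distance is used or how ties are resolved, so both are assumptions here, as are the function names and the sample branch names.

```python
from typing import Sequence


def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance computed row by row with dynamic programming."""
    previous = list(range(len(b) + 1))
    for i, char_a in enumerate(a, start=1):
        current = [i]
        for j, char_b in enumerate(b, start=1):
            current.append(min(
                previous[j] + 1,                      # deletion
                current[j - 1] + 1,                   # insertion
                previous[j - 1] + (char_a != char_b)  # substitution or match
            ))
        previous = current
    return previous[-1]


def snap_to_candidate(recognized: str, candidates: Sequence[str]) -> str:
    """Replace recognized text with the closest candidate (first one wins on ties)."""
    if not candidates:
        return recognized
    return min(candidates, key=lambda candidate: edit_distance(recognized, candidate))
```

In practice the candidate list would come from something like the master data 19b; a distance threshold could also be added so that very distant matches are rejected rather than forced.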

補正部１８は、文字コードの範囲を限定して、文字を補正してもよい。これにより、項目に記載される文字コードの範囲が限定されている場合に、記載され得ない文字コードを排除して、文字コードの範囲を限定して文字を補正することができる。例えば、項目に記載される文字コードの範囲が数値を表す文字コードの範囲に限定されている場合、アルファベットの「Ｏ（オー）」を数字の「０（ゼロ）」に置換するといった補正を行うことができる。 The correction unit 18 may correct the characters by limiting the range of character codes. Thus, when the range of character codes that can be written in an item is limited, character codes that cannot occur are excluded and the characters are corrected within the limited range. For example, when the character codes written in an item are limited to those representing numerals, a correction such as replacing the letter "O" with the numeral "0" (zero) can be performed.
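A small sketch of restricting a field to numerals follows. The look-alike table and the decision to drop any remaining non-digit characters are assumptions of this example; the text above only gives the substitution of "O" with "0" as an illustration.

```python
# Assumed look-alike substitutions; only "O" -> "0" is taken from the description above.
DIGIT_LOOKALIKES = str.maketrans({"O": "0", "o": "0", "l": "1", "I": "1"})
ASCII_DIGITS = set("0123456789")


def restrict_to_digits(text: str) -> str:
    """Map look-alike letters to digits, then keep only ASCII digits."""
    translated = text.translate(DIGIT_LOOKALIKES)
    return "".join(ch for ch in translated if ch in ASCII_DIGITS)
```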

文字認識部１７は、複数の文字認識モデルのうち、出力の信用度が高い文字認識モデルを選択し、選択した文字認識モデルの出力に基づいて、複数の領域に含まれる文字及び複数の副領域に含まれる文字を出力してもよい。本例の場合、文字認識部１７は、第１文字認識モデル３１及び第２文字認識モデル３２のうち、出力の信用度が高い文字認識モデルを選択し、選択した文字認識モデルの出力に基づいて、複数の領域に含まれる文字及び複数の副領域に含まれる文字を出力してよい。このように、出力の信用度が高い文字認識モデルを選択することで、文字認識精度をより向上させることができる。 The character recognition unit 17 may select, from among a plurality of character recognition models, the character recognition model whose output has the higher confidence and, based on the output of the selected model, output the characters included in the plurality of regions and the characters included in the plurality of sub-regions. In this example, the character recognition unit 17 may select whichever of the first character recognition model 31 and the second character recognition model 32 has the higher output confidence and output the characters included in the plurality of regions and the plurality of sub-regions based on the output of the selected model. Selecting the character recognition model with the higher output confidence in this way can further improve character recognition accuracy.
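One way to picture confidence-based selection is shown below. It assumes that each character recognition model is a callable returning a `(text, confidence)` pair; how the confidence is actually computed is not stated in the description, and the `Recognizer` type and function name are hypothetical.

```python
from typing import Callable, Sequence, Tuple

# Assumed interface: a recognizer takes region image bytes and returns (text, confidence).
Recognizer = Callable[[bytes], Tuple[str, float]]


def recognize_with_most_confident(
    models: Sequence[Recognizer], region_image: bytes
) -> Tuple[str, float]:
    """Run every model on the region and keep the result with the highest confidence.

    Raises ValueError if models is empty.
    """
    results = [model(region_image) for model in models]
    return max(results, key=lambda result: result[1])
```

Running every model on every region costs more compute than choosing a model per region type in advance, which is the alternative the description also mentions.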

文字認識部１７は、複数の領域及び複数の副領域毎に文字認識モデルを選択してもよい。例えば、口座番号が記載される領域について第１文字認識モデル３１を選択し、支店名が記載される領域について第２文字認識モデル３２を選択することとしてよい。このようにして、複数の領域及び複数の副領域毎に、適した文字認識モデルを選択することができ、文字認識精度をより向上させることができる。 The character recognition unit 17 may select a character recognition model for each of the plurality of regions and the plurality of sub-regions. For example, the first character recognition model 31 may be selected for an area where an account number is described, and the second character recognition model 32 may be selected for an area where a branch name is described. In this way, a suitable character recognition model can be selected for each of the plurality of regions and the plurality of sub-regions, and the character recognition accuracy can be further improved.

画像認識部１４は、１又は複数の画像認識モデルのうち、出力の信用度が高い画像認識モデルを選択し、選択した画像認識モデルの出力に基づいて、複数の領域、複数の副領域及び領域の種類を出力してもよい。本例の場合、画像認識部１４は、第１画像認識モデル１４ａ及び第２画像認識モデル１４ｂのうち、出力の信用度が高い画像認識モデルを選択し、選択した画像認識モデルの出力に基づいて、複数の領域、複数の副領域及び領域の種類を出力してよい。このように、出力の信用度が高い画像認識モデルを選択することで、書類に含まれる複数の項目の認識精度をより向上させることができる。 The image recognition unit 14 may select, from among the one or plurality of image recognition models, the image recognition model whose output has the higher confidence and, based on the output of the selected model, output the plurality of regions, the plurality of sub-regions, and the region types. In this example, the image recognition unit 14 may select whichever of the first image recognition model 14a and the second image recognition model 14b has the higher output confidence and output the plurality of regions, the plurality of sub-regions, and the region types based on the output of the selected model. Selecting the image recognition model with the higher output confidence in this way can further improve the recognition accuracy for the plurality of items included in a document.

画像認識部１４は、書類の種類毎に画像認識モデルを選択してもよい。例えば、銀行口座に関する書類について第１画像認識モデル１４ａを選択し、車検（自動車検査登録制度）に関する書類について第２文字認識モデル３２を選択することとしてよい。このようにして、書類の種類毎に、適した画像認識モデルを選択することができ、書類に含まれる複数の項目の認識精度をより向上させることができる。 The image recognition unit 14 may select an image recognition model for each type of document. For example, the first image recognition model 14a may be selected for documents relating to bank accounts, and the second character recognition model 32 may be selected for documents relating to vehicle inspection (automobile inspection registration system). In this way, an appropriate image recognition model can be selected for each type of document, and the recognition accuracy of a plurality of items included in the document can be further improved.

修正部１６は、入力に基づいて、画像認識部１４により出力された複数の領域、複数の副領域及び領域の種類と、文字認識部１７により出力された文字と、補正部１８により補正された文字との少なくともいずれかを修正し、修正されたデータを学習用データ１５ａに追加する。修正部１６は、画像認識部１４により出力された複数の領域、複数の副領域及び領域の種類と、文字認識部１７により出力された文字と、補正部１８により補正された文字との少なくともいずれかを修正した修正データ１５ｂを蓄積し、修正データ１５ｂを定期的に学習用データ１５ａに追加してもよい。また、修正部１６は、修正データ１５ｂを辞書データ１９ａに追加してもよい。 Based on input, the correcting unit 16 corrects at least one of (i) the plurality of regions, the plurality of sub-regions, and the region types output by the image recognition unit 14, (ii) the characters output by the character recognition unit 17, and (iii) the characters corrected by the correction unit 18, and adds the corrected data to the learning data 15a. The correcting unit 16 may accumulate correction data 15b, in which at least one of these has been corrected, and periodically add the correction data 15b to the learning data 15a. The correcting unit 16 may also add the correction data 15b to the dictionary data 19a.

修正部１６によって、画像認識モデル及び文字認識モデルいずれかの出力が誤っていた場合に、その誤りを修正したデータを学習用データに追加することができ、画像認識モデル及び文字認識モデルの出力精度をより向上させる学習用データを生成することができる。 With the correcting unit 16, when the output of either the image recognition model or the character recognition model is erroneous, data in which the error has been corrected can be added to the learning data, so that learning data that further improves the output accuracy of the image recognition model and the character recognition model can be generated.

学習部１３は、学習プログラム１３ａ及び学習用データ１５ａを用いて、１又は複数の画像認識モデル及び複数の文字認識モデルの少なくともいずれかの再学習を行ってよい。例えば、学習部１３は、修正部１６によって学習用データ１５ａが追加された場合に、１又は複数の画像認識モデルの再学習を行ってよい。画像認識モデル及び文字認識モデルいずれかの再学習を行うことで、画像認識モデル及び文字認識モデルの出力精度を継続的に向上させることができる。 The learning unit 13 may re-learn at least one of one or a plurality of image recognition models and a plurality of character recognition models using the learning program 13a and the learning data 15a. For example, the learning unit 13 may re-learn one or a plurality of image recognition models when the correction unit 16 adds the learning data 15a. By performing re-learning of either the image recognition model or the character recognition model, the output accuracy of the image recognition model and the character recognition model can be continuously improved.

学習部１３は、１又は複数の画像認識モデル及び文字認識モデルの少なくともいずれかの再学習後の出力精度が再学習前の出力精度より低い場合に、学習プログラム１３ａの所定のパラメータを変更して、１又は複数の画像認識モデル及び文字認識モデルの少なくともいずれかの再学習を実行し直してよい。ここで、所定のパラメータとは、学習率等の確率的勾配降下法のハイパーパラメータであってよい。これにより、仮に画像認識モデル及び文字認識モデルいずれかの再学習によって出力精度が低下した場合に、学習プログラムのパラメータを変更して再学習を実行し直し、画像認識モデル及び文字認識モデルの出力精度が向上するようにすることができる。 When the output accuracy of at least one of the one or plurality of image recognition models and the character recognition models after re-learning is lower than the output accuracy before re-learning, the learning unit 13 may change the predetermined parameters of the learning program 13a and execute the re-learning of at least one of the one or plurality of image recognition models and the character recognition models again. Here, the predetermined parameters may be hyperparameters of stochastic gradient descent, such as the learning rate. Thus, if re-learning lowers the output accuracy of either the image recognition model or the character recognition model, the parameters of the learning program are changed and the re-learning is executed again so that the output accuracy of the image recognition model and the character recognition model improves.
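The retry logic can be sketched as a loop over candidate learning rates. Everything named here is hypothetical: `train` and `evaluate` stand in for the learning program 13a and an accuracy measurement, and trying a fixed list of learning rates in order is just one possible way of "changing the predetermined parameters".

```python
from typing import Callable, Iterable, Optional, Tuple, TypeVar

Model = TypeVar("Model")


def retrain_until_not_worse(
    train: Callable[[float], Model],
    evaluate: Callable[[Model], float],
    baseline_accuracy: float,
    learning_rates: Iterable[float],
) -> Optional[Tuple[Model, float, float]]:
    """Retrain with each learning rate in turn until accuracy is not below the baseline.

    Returns (model, learning_rate, accuracy) for the first acceptable run,
    or None if every run is worse than the model before re-learning.
    """
    for learning_rate in learning_rates:
        model = train(learning_rate)
        accuracy = evaluate(model)
        if accuracy >= baseline_accuracy:
            return model, learning_rate, accuracy
    return None
```

Returning `None` lets the caller keep the model from before re-learning when no setting recovers the previous accuracy.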

図３は、本実施形態に係る文字認識装置１０の物理的構成を示す図である。文字認識装置１０は、演算部に相当するＣＰＵ（Central Processing Unit）１０ａと、記憶部に相当するＲＡＭ（Random Access Memory）１０ｂと、記憶部に相当するＲＯＭ（Read only Memory）１０ｃと、通信部１０ｄと、入力部１０ｅと、表示部１０ｆと、を有する。これらの各構成は、バスを介して相互にデータ送受信可能に接続される。なお、本例では文字認識装置１０が一台のコンピュータで構成される場合について説明するが、文字認識装置１０は、複数のコンピュータが組み合わされて実現されてもよい。また、図３で示す構成は一例であり、文字認識装置１０はこれら以外の構成を有してもよいし、これらの構成のうち一部を有さなくてもよい。 FIG. 3 is a diagram illustrating a physical configuration of the character recognition device 10 according to the present embodiment. The character recognition device 10 includes a CPU (Central Processing Unit) 10a corresponding to an operation unit, a RAM (Random Access Memory) 10b corresponding to a storage unit, a ROM (Read only Memory) 10c corresponding to a storage unit, and a communication unit. 10d, an input unit 10e, and a display unit 10f. These components are connected to each other via a bus so that data can be transmitted and received. In this example, a case will be described in which the character recognition device 10 is configured by one computer, but the character recognition device 10 may be realized by combining a plurality of computers. In addition, the configuration illustrated in FIG. 3 is an example, and the character recognition device 10 may have other configurations, or may not have some of these configurations.

ＣＰＵ１０ａは、ＲＡＭ１０ｂ又はＲＯＭ１０ｃに記憶されたプログラムの実行に関する制御やデータの演算、加工を行う制御部である。ＣＰＵ１０ａは、書類の画像に記載された文字を認識するプログラム（文字認識プログラム）を実行する演算部である。ＣＰＵ１０ａは、入力部１０ｅや通信部１０ｄから種々のデータを受け取り、データの演算結果を表示部１０ｆに表示したり、ＲＡＭ１０ｂやＲＯＭ１０ｃに格納したりする。 The CPU 10a is a control unit that performs control related to execution of a program stored in the RAM 10b or the ROM 10c and calculates and processes data. The CPU 10a is a calculation unit that executes a program (character recognition program) for recognizing characters described in an image of a document. The CPU 10a receives various data from the input unit 10e and the communication unit 10d, and displays a calculation result of the data on the display unit 10f and stores the calculation result in the RAM 10b and the ROM 10c.

ＲＡＭ１０ｂは、記憶部のうちデータの書き換えが可能なものであり、例えば半導体記憶素子で構成されてよい。ＲＡＭ１０ｂは、ＣＰＵ１０ａが実行する文字認識プログラム、書類の画像といったデータを記憶してよい。なお、これらは例示であって、ＲＡＭ１０ｂには、これら以外のデータが記憶されていてもよいし、これらの一部が記憶されていなくてもよい。 The RAM 10b is a storage unit in which data can be rewritten, and may be composed of, for example, a semiconductor storage element. The RAM 10b may store data such as a character recognition program executed by the CPU 10a and images of documents. These are merely examples, and the RAM 10b may store data other than these or some of them may not be stored.

ＲＯＭ１０ｃは、記憶部のうちデータの読み出しが可能なものであり、例えば半導体記憶素子で構成されてよい。ＲＯＭ１０ｃは、例えば文字認識プログラムや、書き換えが行われないデータを記憶してよい。 The ROM 10c is a storage unit from which data can be read, and may be configured by, for example, a semiconductor storage element. The ROM 10c may store, for example, a character recognition program or data that is not rewritten.

通信部１０ｄは、文字認識装置１０を他の機器に接続するインターフェースである。通信部１０ｄは、インターネット等の通信ネットワークＮに接続されてよい。 The communication unit 10d is an interface that connects the character recognition device 10 to another device. The communication unit 10d may be connected to a communication network N such as the Internet.

入力部１０ｅは、ユーザからデータの入力を受け付けるものであり、例えば、キーボード及びタッチパネルを含んでよい。 The input unit 10e accepts data input from a user, and may include, for example, a keyboard and a touch panel.

表示部１０ｆは、ＣＰＵ１０ａによる演算結果を視覚的に表示するものであり、例えば、ＬＣＤ（Liquid Crystal Display）により構成されてよい。表示部１０ｆは、処理対象となる書類の画像や文字認識結果を表示してよい。 The display unit 10f is for visually displaying the calculation result by the CPU 10a, and may be configured by, for example, an LCD (Liquid Crystal Display). The display unit 10f may display an image of a document to be processed or a character recognition result.

文字認識プログラムは、ＲＡＭ１０ｂやＲＯＭ１０ｃ等のコンピュータによって読み取り可能な記憶媒体に記憶されて提供されてもよいし、通信部１０ｄにより接続される通信ネットワークＮを介して提供されてもよい。文字認識装置１０では、ＣＰＵ１０ａが文字認識プログラムを実行することにより、図２を用いて説明した様々な動作が実現される。なお、これらの物理的な構成は例示であって、必ずしも独立した構成でなくてもよい。例えば、文字認識装置１０は、ＣＰＵ１０ａとＲＡＭ１０ｂやＲＯＭ１０ｃが一体化したＬＳＩ（Large-Scale Integration）を備えていてもよい。 The character recognition program may be provided by being stored in a computer-readable storage medium such as the RAM 10b or the ROM 10c, or may be provided via the communication network N connected by the communication unit 10d. In the character recognition device 10, the various operations described with reference to FIG. 2 are realized by the CPU 10a executing the character recognition program. Note that these physical configurations are merely examples, and are not necessarily independent configurations. For example, the character recognition device 10 may include an LSI (Large-Scale Integration) in which the CPU 10a and the RAM 10b or the ROM 10c are integrated.

図４は、本実施形態に係る文字認識装置１０の設定画面ＤＰの一例を示す図である。設定画面ＤＰは、ユーザ端末２０に表示される画面であってよく、書類の画像ＩＭＧ１、ポインタＰＴ、設定モード選択Ｓ１及び領域設定Ｓ２を含む。ユーザは、ポインタＰＴにより設定モード選択Ｓ１のラジオボタンのいずれかと、領域設定Ｓ２のラジオボタンのいずれかを選択し、画像ＩＭＧ１について複数の領域及び複数の副領域を設定する。 FIG. 4 is a diagram illustrating an example of the setting screen DP of the character recognition device 10 according to the present embodiment. The setting screen DP may be a screen displayed on the user terminal 20, and includes a document image IMG1, a pointer PT, a setting mode selection S1, and an area setting S2. The user selects one of the radio buttons of the setting mode selection S1 and one of the radio buttons of the area setting S2 with the pointer PT, and sets a plurality of areas and a plurality of sub-areas for the image IMG1.

領域設定Ｓ２には、「１．ｒｅｄ」という領域の種類を設定するラジオボタンと、「２．ｂｌｕｅ」という領域の種類を設定するラジオボタンと、「３．ｙｅｌｌｏｗ」という領域の種類を設定するラジオボタンと、「４．ｐｉｎｋ」という領域の種類を設定するラジオボタンと、が含まれる。すなわち、本例では、４種類の領域を設定することができる。また、設定モード選択Ｓ１は、複数の領域を設定するモードである「ｂａｓｅ＿ｉｍａｇｅ」のラジオボタンと、「１．ｒｅｄ」の種類の領域について副領域を設定するモードである「ｃｒｏｐ＿ｒｅｄ」のラジオボタンと、「２．ｂｌｕｅ」の種類の領域について副領域を設定するモードである「ｃｒｏｐ＿ｂｌｕｅ」のラジオボタンと、「３．ｙｅｌｌｏｗ」の種類の領域について副領域を設定するモードである「ｃｒｏｐ＿ｙｅｌｌｏｗ」のラジオボタンと、「４．ｐｉｎｋ」の種類の領域について副領域を設定するモードである「ｃｒｏｐ＿ｐｉｎｋ」のラジオボタンと、が含まれる。 In the area setting S2, a radio button for setting an area type “1. red”, a radio button for setting an area type “2. blue”, and an area type “3. yellow” are set. A radio button and a radio button for setting an area type of “4. pink” are included. That is, in this example, four types of regions can be set. The setting mode selection S1 includes a “base_image” radio button for setting a plurality of areas, and a “crop_red” radio button for setting a sub-area for a “1.red” type area. , A radio button of “crop_blue” for setting a sub-area for an area of “2. blue” type, and a radio of “crop_yellow” for setting a sub-area for an area of type “3. yellow” A button and a radio button of “crop_pink”, which is a mode for setting a sub-area for an area of “4. pink” type, are included.

本例では、画像ＩＭＧ１に対して、複数の領域として、実線で示された第１領域Ｒ１、破線で示された第２領域Ｒ２、一点鎖線で示された第３領域Ｒ３及び二点鎖線で示された第４領域Ｒ４が設定されている。ここで、線種は、領域設定Ｓ２により選択された領域の種類に対応する。本例では、実線は「１．ｒｅｄ」に対応し、破線は「２．ｂｌｕｅ」に対応し、一点鎖線は「３．ｙｅｌｌｏｗ」に対応し、二点鎖線は「４．ｐｉｎｋ」に対応する。 In this example, a first region R1 indicated by a solid line, a second region R2 indicated by a broken line, a third region R3 indicated by a one-dot chain line, and a fourth region R4 indicated by a two-dot chain line are set as the plurality of regions for the image IMG1. Here, the line type corresponds to the region type selected in the area setting S2. In this example, the solid line corresponds to "1. red", the broken line to "2. blue", the one-dot chain line to "3. yellow", and the two-dot chain line to "4. pink".

第１領域Ｒ１は、実線で示された第１副領域Ｒ１ａと、破線で示された第２副領域Ｒ１ｂとを含む。このように、領域の内側に複数の副領域を設定することで、仮に画像認識モデルによって４種類の領域しか認識できない場合であっても、５以上の領域を区別した学習用データを生成し、画像認識モデルによって段階的に５以上の領域を識別できるようにすることができる。 The first region R1 includes a first sub-region R1a indicated by a solid line and a second sub-region R1b indicated by a broken line. In this way, by setting a plurality of sub-regions inside the region, even if only four types of regions can be recognized by the image recognition model, learning data that distinguishes five or more regions is generated, It is possible to identify five or more regions stepwise by the image recognition model.
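To make the two-level idea concrete, the sketch below represents regions with nested sub-regions and lists the distinguishable items. The `Region` class, the coordinates, and the "parent/child" path naming are assumptions for illustration; the layout mirrors the example above, where four region types yield five separately identified items.

```python
from dataclasses import dataclass, field
from typing import List, Tuple

Box = Tuple[int, int, int, int]  # (x, y, width, height); sub-region boxes are relative to the parent


@dataclass
class Region:
    kind: str                                   # region type, e.g. "red"
    box: Box
    children: List["Region"] = field(default_factory=list)


def leaf_items(regions: List[Region]) -> List[Tuple[str, Box]]:
    """List each item once: sub-regions as 'parent/child', childless regions by their own type."""
    items: List[Tuple[str, Box]] = []
    for region in regions:
        if region.children:
            items.extend((f"{region.kind}/{child.kind}", child.box) for child in region.children)
        else:
            items.append((region.kind, region.box))
    return items


# Made-up coordinates following the layout of R1 (with R1a, R1b), R2, R3 and R4.
page = [
    Region("red", (10, 10, 200, 80), [Region("red", (5, 5, 90, 30)), Region("blue", (100, 5, 90, 30))]),
    Region("blue", (10, 100, 200, 40)),
    Region("yellow", (10, 150, 200, 40)),
    Region("pink", (10, 200, 200, 40)),
]
```

Because a sub-region is identified by its parent's type together with its own type, the same four labels can be reused at the second level.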

本例では、第１副領域Ｒ１ａに「ＡＢＣ」と記載され、第２副領域Ｒ１ｂに「ａｂｃ」と記載され、第２領域Ｒ２に「ＤＥＦ」と記載され、第３領域Ｒ３に「１２３４」と記載され、第４領域Ｒ４に「Ｇ−５６」と記載されている。文字認識装置１０は、画像認識部１４によって、第１副領域Ｒ１ａ、第２副領域Ｒ１ｂ、第２領域Ｒ２、第３領域Ｒ３及び第４領域Ｒ４を認識し、文字認識部１７によって、それぞれの領域に記載された文字を認識する。また、文字認識装置１０は、補正部１８によって、文字認識部１７により認識された文字を補正する。例えば、第４領域Ｒ４に「Ｇ−」という文字列に続いて２桁の数値が記載されることが予め決まっている場合、補正部１８は、正規表現を用いて「Ｇ−５６」という文字のうち「５６」を抽出することで、文字を補正してもよい。また、例えば、第３領域Ｒ３に数値のみ記載されることが予め決まっている場合、補正部１８は、文字コードの範囲を限定して、例えば「ｌ（エル）」を「１（ｏｎｅ）」に置換して、「１２３４」という文字を補正してよい。 In this example, "ABC" is written in the first sub-region R1a, "abc" in the second sub-region R1b, "DEF" in the second region R2, "1234" in the third region R3, and "G-56" in the fourth region R4. The character recognition device 10 recognizes the first sub-region R1a, the second sub-region R1b, the second region R2, the third region R3, and the fourth region R4 with the image recognition unit 14, and recognizes the characters written in each region with the character recognition unit 17. The character recognition device 10 also corrects the characters recognized by the character recognition unit 17 with the correction unit 18. For example, when it is determined in advance that the fourth region R4 contains the character string "G-" followed by a two-digit number, the correction unit 18 may correct the characters by using a regular expression to extract "56" from "G-56". Likewise, when it is determined in advance that the third region R3 contains only numerals, the correction unit 18 may limit the range of character codes, for example replacing a lowercase "l" (el) with the numeral "1" (one), so that the characters are corrected to "1234".

図５は、本実施形態に係る文字認識装置１０により生成される第１種の学習用データＤ１の一例を示す図である。第１種の学習用データＤ１は、複数の領域及び複数の副領域を区別可能な態様で囲むバウンディングボックスを画像に上書きする第１画像認識モデル１４ａの教師有り学習に用いるデータである。 FIG. 5 is a diagram illustrating an example of the first type of learning data D1 generated by the character recognition device 10 according to the present embodiment. The first type of learning data D1 is data used for supervised learning of the first image recognition model 14a that overwrites an image with a bounding box surrounding a plurality of regions and a plurality of sub-regions in a distinguishable manner.

第１種の学習用データＤ１は、実線で示された第１領域Ｒ１、破線で示された第２領域Ｒ２、一点鎖線で示された第３領域Ｒ３及び二点鎖線で示された第４領域Ｒ４を含み、第１領域Ｒ１は、実線で示された第１副領域Ｒ１ａ及び破線で示された第２副領域Ｒ１ｂを含む。生成部１２は、ユーザ端末２０によって入力された複数の領域及び複数の副領域の輪郭をそのまま用いることで、第１種の学習用データＤ１を生成してもよいし、ユーザ端末２０によって入力された複数の領域及び複数の副領域の輪郭を、複数の方向に僅かに平行移動させることでデータオーグメンテーションを行って第１種の学習用データＤ１を生成してもよい。 The first type of learning data D1 includes a first region R1 indicated by a solid line, a second region R2 indicated by a broken line, a third region R3 indicated by a one-dot chain line, and a fourth region R4 indicated by a two-dot chain line, and the first region R1 includes a first sub-region R1a indicated by a solid line and a second sub-region R1b indicated by a broken line. The generation unit 12 may generate the first type of learning data D1 by using the outlines of the plurality of regions and the plurality of sub-regions input from the user terminal 20 as they are, or may generate it with data augmentation by slightly translating those outlines in a plurality of directions.
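The translation-based augmentation might look like the sketch below. The shift size, the eight shift directions, and the choice to move all outlines of an image together are assumptions; the description only says that the outlines are translated slightly in a plurality of directions. The `Box` class is hypothetical, and no clipping to the image bounds is done.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class Box:
    x: int
    y: int
    width: int
    height: int
    kind: str  # region type, e.g. "red"


def translate_outlines(boxes: List[Box], max_shift: int = 2) -> List[List[Box]]:
    """Return eight shifted copies of the annotation, one per surrounding direction."""
    offsets = [
        (dx, dy)
        for dx in (-max_shift, 0, max_shift)
        for dy in (-max_shift, 0, max_shift)
        if (dx, dy) != (0, 0)
    ]
    return [
        [Box(b.x + dx, b.y + dy, b.width, b.height, b.kind) for b in boxes]
        for dx, dy in offsets
    ]
```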

図６は、本実施形態に係る文字認識装置１０により生成される第２種の学習用データＤ２の一例を示す図である。第２種の学習用データＤ２は、複数の領域及び複数の副領域を区別可能な態様で塗り潰した画像を出力する第２画像認識モデル１４ｂの教師有り学習に用いるデータである。 FIG. 6 is a diagram illustrating an example of the second type of learning data D2 generated by the character recognition device 10 according to the present embodiment. The second type of learning data D2 is data used for supervised learning of the second image recognition model 14b that outputs an image in which a plurality of areas and a plurality of sub-areas are filled in a distinguishable manner.

第２種の学習用データＤ２は、実線で示された第１塗り潰し領域Ｆ１、破線で示された第２塗り潰し領域Ｆ２、一点鎖線で示された第３塗り潰し領域Ｆ３及び二点鎖線で示された第４塗り潰し領域Ｆ４を含み、第１塗り潰し領域Ｆ１は、実線で示された第１副塗り潰し領域Ｆ１ａ及び破線で示された第２副塗り潰し領域Ｆ１ｂを含む。複数の塗り潰し領域及び複数の副塗り潰し領域は、領域の種類に対応した色で塗り潰されていてよい。本例では、色の違いをハッチングの違いによって表現している。生成部１２は、ユーザ端末２０によって入力された複数の領域及び複数の副領域の輪郭の内側を塗り潰すことで、第２種の学習用データＤ２を生成してもよいし、ユーザ端末２０によって入力された複数の領域及び複数の副領域の輪郭を、複数の方向に僅かに平行移動させることでデータオーグメンテーションを行い、その輪郭の内側を塗り潰すことで第２種の学習用データＤ２を生成してもよい。 The second type of learning data D2 includes a first filled region F1 indicated by a solid line, a second filled region F2 indicated by a broken line, a third filled region F3 indicated by a one-dot chain line, and a fourth filled region F4 indicated by a two-dot chain line, and the first filled region F1 includes a first filled sub-region F1a indicated by a solid line and a second filled sub-region F1b indicated by a broken line. The filled regions and the filled sub-regions may be filled with colors corresponding to their region types. In this example, differences in color are represented by differences in hatching. The generation unit 12 may generate the second type of learning data D2 by filling the insides of the outlines of the plurality of regions and the plurality of sub-regions input from the user terminal 20, or may perform data augmentation by slightly translating those outlines in a plurality of directions and then fill the insides of the translated outlines to generate the second type of learning data D2.
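A filled label image of the kind described can be sketched with plain nested lists. The integer labels in place of colors, the box format, and the rule that later boxes overwrite earlier ones (so sub-regions should be listed after their parent) are assumptions of this example, not details given in the description.

```python
from typing import List, Sequence, Tuple

LabeledBox = Tuple[int, int, int, int, int]  # (x, y, width, height, label)


def fill_mask(width: int, height: int, boxes: Sequence[LabeledBox]) -> List[List[int]]:
    """Paint each box into a height x width grid; 0 means background.

    Boxes are painted in order, so a sub-region listed after its parent
    overwrites the parent's label inside its own rectangle.
    """
    mask = [[0] * width for _ in range(height)]
    for x, y, box_width, box_height, label in boxes:
        for row in range(max(y, 0), min(y + box_height, height)):
            for col in range(max(x, 0), min(x + box_width, width)):
                mask[row][col] = label
    return mask
```

A real pipeline would more likely paint colors into an image array, but the painting order and clipping logic would be the same.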

図７は、本実施形態に係る文字認識装置１０により実行される学習用データ生成処理のフローチャートである。はじめに、文字認識装置１０は、入力に基づいて、書類の画像について複数の領域を設定する（Ｓ１０）。また、文字認識装置１０は、入力に基づいて、複数の領域のいずれかに含まれる複数の副領域を設定する（Ｓ１１）。さらに、文字認識装置１０は、入力に基づいて、複数の領域及び複数の副領域に対して、領域の種類を設定する（Ｓ１２）。なお、これらの入力は、ユーザ端末２０からの入力であってよい。 FIG. 7 is a flowchart of a learning data generation process performed by the character recognition device 10 according to the present embodiment. First, the character recognition device 10 sets a plurality of regions for a document image based on the input (S10). Further, the character recognition device 10 sets a plurality of sub-regions included in any of the plurality of regions based on the input (S11). Further, the character recognition device 10 sets the type of the area for the plurality of areas and the plurality of sub areas based on the input (S12). Note that these inputs may be input from the user terminal 20.

その後、文字認識装置１０は、第１種の学習用データを生成し（Ｓ１３）、第２種の学習用データを生成する（Ｓ１４）。なお、本例では２種類の学習用データを生成する場合について説明したが、文字認識装置１０は、画像認識部１４により用いられる複数の画像認識モデルの数に対応した複数種類の学習用データを生成してよい。以上により、学習用データ生成処理が終了する。 After that, the character recognition device 10 generates the first type of learning data (S13) and generates the second type of learning data (S14). Although this example describes the case where two types of learning data are generated, the character recognition device 10 may generate as many types of learning data as there are image recognition models used by the image recognition unit 14. This completes the learning data generation process.

図８は、本実施形態に係る文字認識装置１０により実行される文字認識処理のフローチャートである。はじめに、文字認識装置１０は、書類の種類に基づいて、画像認識モデルを選択する（Ｓ２０）。そして、文字認識装置１０は、新たな画像を、選択した画像認識モデルに入力し、複数の領域、複数の副領域及び領域の種類を出力する（Ｓ２１）。 FIG. 8 is a flowchart of a character recognition process performed by the character recognition device 10 according to the present embodiment. First, the character recognition device 10 selects an image recognition model based on the type of the document (S20). Then, the character recognition device 10 inputs the new image to the selected image recognition model, and outputs a plurality of regions, a plurality of sub-regions, and a type of region (S21).

次に、文字認識装置１０は、複数の領域及び複数の副領域毎に文字認識モデルを選択する（Ｓ２２）。そして、文字認識装置１０は、複数の領域の画像及び複数の副領域の画像を、選択した文字認識モデルに入力し、複数の領域に含まれる文字及び複数の副領域に含まれる文字を出力する（Ｓ２３）。 Next, the character recognition device 10 selects a character recognition model for each of the plurality of regions and the plurality of sub-regions (S22). Then, the character recognition device 10 inputs the images of the plurality of regions and the images of the plurality of sub-regions to the selected character recognition model, and outputs characters included in the plurality of regions and characters included in the plurality of sub-regions. (S23).

その後、文字認識装置１０は、領域の種類に応じて、文字を補正する（Ｓ２４）。文字補正処理については、次図を用いて詳細に説明する。その後、文字認識装置１０は、読み取られた文字を出力する（Ｓ２５）。以上により、文字認識処理が終了する。 Thereafter, the character recognition device 10 corrects the character according to the type of the area (S24). The character correction process will be described in detail with reference to the following diagram. Thereafter, the character recognition device 10 outputs the read character (S25). Thus, the character recognition processing ends.

図９は、本実施形態に係る文字認識装置１０により実行される文字補正処理のフローチャートである。同図では、図８の文字を補正する処理（Ｓ２４）の詳細の一例を示している。 FIG. 9 is a flowchart of the character correction process executed by the character recognition device 10 according to the present embodiment. The figure shows an example of the details of the character correction step (S24) in FIG. 8.

文字認識装置１０は、領域の種類が、数値の抽出に対応するものである場合（Ｓ２４１：ＹＥＳ）、複数の正規表現のいずれかを用いて文字の一部を抽出することで、文字を補正する（Ｓ２４２）。なお、本例では数値を抽出する場合について示しているが、正規表現によって文字の一部を抽出する場合、英字や漢字等の数値以外の文字を抽出してもよいし、数値、英字及び漢字等の組み合わせを抽出してもよい。 When the region type corresponds to numerical value extraction (S241: YES), the character recognition device 10 corrects the characters by extracting a part of them using one of a plurality of regular expressions (S242). Although this example shows the case of extracting a numerical value, when a part of the characters is extracted with a regular expression, characters other than numerals, such as alphabetic characters or kanji, may be extracted, or a combination of numerals, alphabetic characters, kanji, and the like may be extracted.

If the region type does not correspond to numerical extraction (S241: NO) but corresponds to master matching (S243: YES), the character recognition device 10 corrects the characters by replacing the characters recognized by the character recognition unit 17 with one of a plurality of candidate character strings, using the edit distance between the recognized characters and each candidate character string (S244).
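A minimal sketch of the master matching in S244 follows. It assumes that the edit distance is the Levenshtein distance and that the candidate with the smallest distance replaces the recognized string. The description only says that an edit distance is used, so both choices, as well as the names `edit_distance` and `match_master`, are illustrative.

```python
from typing import List


def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance computed row by row with dynamic programming."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(
                prev[j] + 1,         # deletion
                curr[j - 1] + 1,     # insertion
                prev[j - 1] + cost,  # substitution or match
            ))
        prev = curr
    return prev[-1]


def match_master(recognized: str, candidates: List[str]) -> str:
    """Replace the recognized string with the closest master candidate.

    With no candidates, the recognized string is returned unchanged.
    """
    if not candidates:
        return recognized
    return min(candidates, key=lambda c: edit_distance(recognized, c))
```

With candidates such as `["Sigmaxyz Inc", "Uei Corp"]`, a misread string like `"Sigmaxyz lnc"` is one substitution away from the first candidate and is replaced by it. A practical system might also reject matches whose distance exceeds a threshold, but the description does not mention one.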

If the region type corresponds neither to numerical extraction (S241: NO) nor to master matching (S243: NO), but restricts the character code range to numerals only (S245: YES), the character recognition device 10 corrects the characters by limiting the character code range to the range that represents numerals (S246). The case distinctions shown in this example are illustrative: they may be evaluated in any order, and other cases may be included. This completes the character correction process.
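One possible reading of the character code restriction in S246 is sketched below. It assumes that the character recognition model exposes, for each character position, a score for every candidate character, and that the correction keeps the best-scoring candidate whose code point lies in a numeral range. The input format, the chosen ranges (ASCII and full-width digits), and the function names are assumptions and are not taken from the description.

```python
from typing import Dict, List

# Assumed numeral ranges: ASCII digits and full-width digits.
DIGIT_RANGES = [(0x0030, 0x0039), (0xFF10, 0xFF19)]


def in_digit_range(ch: str) -> bool:
    """True if the character's code point falls in an allowed numeral range."""
    code_point = ord(ch)
    return any(lo <= code_point <= hi for lo, hi in DIGIT_RANGES)


def restrict_to_digits(char_scores: List[Dict[str, float]]) -> str:
    """Pick, per position, the best-scoring candidate that is a numeral.

    Positions with no numeral candidate are dropped from the output.
    """
    corrected = []
    for scores in char_scores:
        allowed = {ch: s for ch, s in scores.items() if in_digit_range(ch)}
        if allowed:
            corrected.append(max(allowed, key=allowed.get))
    return "".join(corrected)
```

Under this reading, a position where the letter `O` narrowly outscores the digit `0` is resolved to `0`, because `O` lies outside the permitted code range.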

FIG. 10 is a flowchart of the re-learning process executed by the character recognition device 10 according to the present embodiment. The re-learning process may be performed after the character recognition process when the recognition result contains an error, or when the master database DB2 has been updated.

First, the character recognition device 10 receives correction data input from the user terminal 20 (S30). Based on the input from the user terminal 20, the character recognition device 10 then corrects the output regions, sub-regions, and region types (S31). Likewise, based on the input from the user terminal 20, the character recognition device 10 corrects the output characters and the corrected characters (S32).

Thereafter, the character recognition device 10 adds the corrected data on the regions, sub-regions, and region types to the learning data (S33), and adds the corrected data on the characters and the corrected characters to the dictionary data (S34).

In addition, when the master database has been updated (S35: YES), the character recognition device 10 acquires the latest data and updates the master data 19b (S36).

Thereafter, the character recognition device 10 executes re-learning of the image recognition model and the character recognition model (S37). If the output accuracy of the image recognition model and the character recognition model has decreased as a result of the re-learning (S38: YES), the character recognition device 10 changes the parameters of the learning program 13a (S39) and executes the re-learning again (S37). If, on the other hand, the output accuracy has improved as a result of the re-learning (S38: NO), the character recognition device 10 updates the image recognition model and the character recognition model (S40). This completes the re-learning process. If the output accuracy does not improve even after the parameters are changed, the character recognition device 10 may end the re-learning process without updating the image recognition model and the character recognition model, keeping the previous versions.
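The loop formed by S37 to S40 can be sketched as follows. The sketch assumes that training and evaluation are supplied as functions, that a finite list of parameter settings is tried in order, and that a model is adopted only when its accuracy is strictly higher than that of the current model. The description does not say how equal accuracy is handled or how many parameter changes are attempted. All names are hypothetical.

```python
from typing import Callable, Iterable, TypeVar

Model = TypeVar("Model")
Params = TypeVar("Params")


def retrain(
    train: Callable[[Params], Model],
    evaluate: Callable[[Model], float],
    current_model: Model,
    param_candidates: Iterable[Params],
) -> Model:
    """Re-learn with successive parameter settings until accuracy improves."""
    baseline = evaluate(current_model)
    for params in param_candidates:      # S39: change the parameters
        candidate = train(params)        # S37: execute re-learning
        if evaluate(candidate) > baseline:
            return candidate             # S38: NO, so S40: update the model
    # No setting improved accuracy: keep the previous version.
    return current_model
```

Bounding the number of attempts by the length of `param_candidates` is one way to guarantee that the loop terminates, which corresponds to the option of ending the process with the previous model version.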

The embodiments described above are intended to facilitate understanding of the present invention and are not intended to limit its interpretation. The elements of the embodiments and their arrangement, materials, conditions, shapes, sizes, and the like are not limited to those illustrated and may be changed as appropriate. Configurations shown in different embodiments may also be partially replaced or combined with one another.

10: character recognition device, 10a: CPU, 10b: RAM, 10c: ROM, 10d: communication unit, 10e: input unit, 10f: display unit, 10L: learning processing unit, 10R: recognition processing unit, 11: setting unit, 11a: first setting unit, 11b: second setting unit, 11c: third setting unit, 12: generation unit, 13: learning unit, 13a: learning program, 14: image recognition unit, 14a: first image recognition model, 14b: second image recognition model, 15a: learning data, 15b: correction data, 16: correction unit, 17: character recognition unit, 18: correction unit, 19a: dictionary data, 19b: master data, 20: user terminal, 30: character recognition model, 31: first character recognition model, 32: second character recognition model, DB1: document image database, DB2: master database, 100: character recognition system

Claims

A character recognition device for recognizing characters written in a document, comprising:
a first setting unit that sets, based on input, a plurality of regions for an image of the document;
a second setting unit that sets, based on input, a plurality of sub-regions each included in any of the plurality of regions;
a third setting unit that sets, based on input, a region type for the plurality of regions and the plurality of sub-regions; and
a generation unit that generates learning data used for supervised learning of one or more image recognition models, the learning data including positions of the plurality of regions in the image, positions of the plurality of sub-regions in the plurality of regions, and the region types.

The character recognition device according to claim 1, wherein the generation unit generates a plurality of types of learning data used for supervised learning of the one or more image recognition models.

The character recognition device according to claim 2, wherein the generation unit generates a first type of learning data in which the outlines of the plurality of regions and the plurality of sub-regions are rendered in a different manner for each region type, and a second type of learning data in which the plurality of regions and the plurality of sub-regions are filled in a different manner for each region type.

The character recognition device according to any one of claims 1 to 3, further comprising:
an image recognition unit that inputs a new image to the one or more image recognition models and, based on an output of the one or more image recognition models, outputs a plurality of regions included in the new image, a plurality of sub-regions each included in any of the plurality of regions, and a region type for the plurality of regions and the plurality of sub-regions;
a character recognition unit that inputs images of the plurality of regions and images of the plurality of sub-regions to a character recognition model and, based on an output of the character recognition model, outputs characters included in the plurality of regions and characters included in the plurality of sub-regions; and
a correction unit that selects a correction rule according to the region type and corrects the characters.

The character recognition device according to claim 4, wherein the correction unit corrects the characters by extracting part of the characters using one of a plurality of regular expressions.

The character recognition device according to claim 4, wherein the correction unit corrects the characters by replacing the characters with one of a plurality of candidate character strings, using the edit distance between the characters and the plurality of candidate character strings.

The character recognition device according to claim 4, wherein the correction unit corrects the characters by limiting the range of character codes.

The character recognition device according to claim 4, wherein the character recognition unit selects, from among a plurality of character recognition models, a character recognition model whose output has a high confidence and, based on an output of the selected character recognition model, outputs the characters included in the plurality of regions and the characters included in the plurality of sub-regions.

The character recognition device according to claim 8, wherein the character recognition unit selects a character recognition model for each of the plurality of regions and the plurality of sub-regions.

The character recognition device according to claim 4, wherein the image recognition unit selects, from among the one or more image recognition models, an image recognition model whose output has a high confidence and, based on an output of the selected image recognition model, outputs the plurality of regions, the plurality of sub-regions, and the region types.

The character recognition device according to claim 10, wherein the image recognition unit selects the image recognition model for each type of document.

The character recognition device according to claim 4, further comprising a correction unit that, based on input, corrects at least one of: the plurality of regions, the plurality of sub-regions, and the region types output by the image recognition unit; the characters output by the character recognition unit; and the characters corrected by the correction unit, and that adds the corrected data to the learning data.

The character recognition device according to claim 4, further comprising a learning unit that re-learns at least one of the one or more image recognition models and the plurality of character recognition models, using the learning data and a learning program in which predetermined parameters are set.

The character recognition device according to claim 13, wherein, when the output accuracy of at least one of the one or more image recognition models and the character recognition model after re-learning is lower than the output accuracy before re-learning, the learning unit changes the predetermined parameters and re-executes the re-learning of at least one of the one or more image recognition models and the character recognition model.

A character recognition method for recognizing characters written in a document, the method including:
setting, based on input, a plurality of regions for an image of the document;
setting, based on input, a plurality of sub-regions each included in any of the plurality of regions;
setting, based on input, a region type for the plurality of regions and the plurality of sub-regions; and
generating learning data used for supervised learning of one or more image recognition models, the learning data including positions of the plurality of regions in the image, positions of the plurality of sub-regions in the plurality of regions, and the region types.

A character recognition program that causes a processor provided in a character recognition device for recognizing characters written in a document to function as:
a first setting unit that sets, based on input, a plurality of regions for an image of the document;
a second setting unit that sets, based on input, a plurality of sub-regions each included in any of the plurality of regions;
a third setting unit that sets, based on input, a region type for the plurality of regions and the plurality of sub-regions; and
a generation unit that generates learning data used for supervised learning of one or more image recognition models, the learning data including positions of the plurality of regions in the image, positions of the plurality of sub-regions in the plurality of regions, and the region types.