JP7088661B2

JP7088661B2 - Paper form data conversion system, OCR engine learning image generator and image analyzer

Info

Publication number: JP7088661B2
Application number: JP2017209322A
Authority: JP
Inventors: 隆司小池
Original assignee: 株式会社インフォディオ
Priority date: 2017-10-30
Filing date: 2017-10-30
Publication date: 2022-06-21
Anticipated expiration: 2037-10-30
Also published as: JP2019082814A

Description

The present invention relates to a paper form data conversion system that extracts character information from a target image obtained by scanning a form and outputs it; to an OCR engine learning image generation device that trains an OCR engine, which extracts character information from images, to perform that extraction; and to an image analyzer that extracts character information from a target image.

Companies handle a wide variety of forms, such as invoices issued by other companies and the results of employee health examinations. In many cases, the character information in these forms is not available as electronic data.

Necessary information from such paper forms, whether created in-house or received from outside, is sometimes entered into a computer as electronic data and used for various corporate activities. For entry into the computer, an image of the form may be read with a scanner and then converted into character information by OCR.

Japanese Unexamined Patent Application Publication No. 2015-46027 (JP-A-2015-46027)

The accuracy of extracting character information by OCR from images of forms such as invoices has improved in recent years. However, when the extracted character information contains errors, each one must be corrected by hand. Further improvement in the accuracy of extracting character information from document images is therefore desired.

An object of the present invention is therefore to improve the accuracy of extracting character information from forms such as invoices.

To achieve the above object, the present invention provides a paper form data conversion system that extracts character information from a target image obtained by scanning a form and outputs it to an external database, the system comprising: an OCR unit that extracts character information from the target image using an OCR engine; a structural analysis unit that generates structured data in which the character information extracted by the OCR unit is structured based on the position of that character information on the target image; and a mapping unit that inputs the structured data into the database using a mapping table that indicates the correspondence between the structured data and the structure of the database.
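The role of the mapping table can be illustrated with a minimal sketch. The field names, column names, table name, and the use of SQLite below are all assumptions made for illustration; the source does not specify the form of the structured data or of the database.

```python
import sqlite3

# Hypothetical structured data, as a structural analysis step might emit it.
structured = {"vendor": "Example Corp.", "invoice_date": "2017-10-30", "total": "10800"}

# Hypothetical mapping table: structured-data key -> database column.
MAPPING_TABLE = {"vendor": "supplier_name", "invoice_date": "billed_on", "total": "amount_jpy"}

def map_to_row(structured, mapping_table):
    """Rename structured-data keys to database column names, dropping unmapped keys."""
    return {mapping_table[k]: v for k, v in structured.items() if k in mapping_table}

def insert_row(conn, table, row):
    """Insert one mapped row; values go through placeholders.

    The table and column names are interpolated directly, so they must come
    from a trusted mapping table, never from OCR output.
    """
    columns = ", ".join(row)
    placeholders = ", ".join("?" for _ in row)
    conn.execute(f"INSERT INTO {table} ({columns}) VALUES ({placeholders})", tuple(row.values()))

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE invoices (supplier_name TEXT, billed_on TEXT, amount_jpy TEXT)")
row = map_to_row(structured, MAPPING_TABLE)
insert_row(conn, "invoices", row)
```

Keeping the correspondence in a table rather than in code means a new form layout or a changed database schema only requires a new mapping, not a new program.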

The present invention also provides an OCR engine learning image generation device that trains an OCR engine, which extracts character information from images, to perform that extraction, the device comprising: a learning image generator that applies a learning image conversion to characters of a specific font to generate learning images; and a learning image generation learning unit that, using pairs of a first image containing a recognized character and a second image representing that recognized character in the specific font, causes the learning image generator to learn the learning image conversion, which converts the second image into the first image.

The present invention also provides an image analyzer that extracts character information from a target image, the image analyzer comprising: an OCR engine trained by an OCR engine learning device; and an OCR unit that extracts character information from the target image using the OCR engine. The OCR engine learning device includes a learning image generator that applies a learning image conversion to characters of a specific font to generate learning images; a learning image generation learning unit that, using pairs of a first image containing a recognized character and a second image representing that recognized character in the specific font, causes the learning image generator to learn the learning image conversion, which converts the second image into the first image; and a character recognition learning unit that trains the extraction of characters from images using pairs of a learning image generated by the learning image generator and the character corresponding to that learning image.

The present invention also provides an image analyzer that extracts character information from a target image, the image analyzer comprising: a first learning device that, using pairs of a first pre-processing image in which characters and non-character imagery are mixed and a first post-processing image obtained by removing the non-character imagery from that first pre-processing image, causes a first machine learning device to learn a first conversion that removes non-character imagery from an image in which characters and non-character imagery are mixed; a preprocessing unit that applies preprocessing including the first conversion to the target image; and an OCR unit that extracts character information from the target image.

The present invention also provides an image analyzer that extracts character information from a target image, the image analyzer comprising: a second learning device that, using pairs of a second post-processing image containing a group of characters tabulated with ruled lines and a second pre-processing image obtained by removing the ruled lines from that second post-processing image, causes a second machine learning device to learn a second conversion that converts an image of a group of characters tabulated without ruled lines into an image of a group of characters tabulated with ruled lines; a preprocessing unit that applies preprocessing including the second conversion to the target image; and an OCR unit that extracts character information from the target image.

The present invention also provides an image analyzer that extracts character information from a target image, the image analyzer comprising: a third learning device that, using pairs of a third pre-processing image containing a group of characters and a third post-processing image in which mutually non-overlapping frames have been formed around each character in that third pre-processing image, causes a third machine learning device to learn a third conversion that forms mutually non-overlapping frames around each character in an image containing a group of characters; a preprocessing unit that applies preprocessing including the third conversion to the target image; and an OCR unit that extracts character information from the target image.

According to the present invention, the accuracy of extracting character information from forms such as invoices can be improved.

FIG. 1 is a block diagram of one embodiment of the form data digitization system according to the present invention.
FIG. 2 is a plan view of a form to be analyzed by one embodiment of the form data digitization system according to the present invention.
FIG. 3 is a flowchart showing the flow of analysis in one embodiment of the form data digitization system according to the present invention.
FIG. 4 is a plan view of a preprocessed image in one embodiment of the form data digitization system according to the present invention.
FIG. 5 is an example plan view of a group of characters tabulated without ruled lines in one embodiment of the form data digitization system according to the present invention.
FIG. 6 is an example plan view of a group of characters tabulated with ruled lines in one embodiment of the form data digitization system according to the present invention.
FIG. 7 is an example plan view of a group of characters in one embodiment of the form data digitization system according to the present invention.
FIG. 8 is an example plan view in which each character is enclosed in a frame in one embodiment of the form data digitization system according to the present invention.
FIG. 9 shows an image correctly recognized as a specific character, and an image representing that character in a specific font, in one embodiment of the form data digitization system according to the present invention.
FIG. 10 shows a learning image in one embodiment of the form data digitization system according to the present invention.
FIG. 11 is an example of part of a form in one embodiment of the form data digitization system according to the present invention.
FIG. 12 shows a mapping screen in one embodiment of the form data digitization system according to the present invention.

An embodiment of the image analyzer according to the present invention will be described with reference to the drawings. This embodiment is merely an example, and the present invention is not limited to it. Identical or similar components are given the same reference numerals, and duplicate descriptions are omitted.

FIG. 1 is a block diagram of one embodiment of the paper form data conversion system according to the present invention.

The paper form data conversion system 10 of the present embodiment is an image analyzer that extracts character information from a non-fixed-format document, such as a form printed on paper, and outputs character information structured according to the structure of the document. The paper form data conversion system 10 also inputs the structured character information into a database.

The paper form data conversion system 10 of the present embodiment has an image storage unit 20, an image analysis preprocessing unit 21, a preprocessed image storage unit 22, an OCR unit 23, an image analysis post-processing unit 24, a structural analysis unit 25, a mapping unit 26, an extracted character string storage unit 29, a dictionary 30, an OCR engine learning device 70, a scanner 12, a display 13, a keyboard 14, and a mouse 15. The paper form data conversion system 10 is configured on, for example, a single computer. Alternatively, part of the system, for example some or all of the image storage unit 20, the image analysis preprocessing unit 21, the preprocessed image storage unit 22, the OCR unit 23, the image analysis post-processing unit 24, the structural analysis unit 25, the mapping unit 26, the extracted character string storage unit 29, and the dictionary 30, may be distributed across multiple computers connected to one another by a network. The image storage unit 20, the image analysis preprocessing unit 21, the preprocessed image storage unit 22, the OCR unit 23, the image analysis post-processing unit 24, the structural analysis unit 25, the mapping unit 26, the extracted character string storage unit 29, the dictionary 30, and the OCR engine learning device 70 are each realized by a program that causes a computer to provide the corresponding function.

The image analysis preprocessing unit 21 has a first learning device 51, a second learning device 52, and a third learning device 53. The first learning device 51 includes a first machine learning device 61, the second learning device 52 includes a second machine learning device 62, and the third learning device 53 includes a third machine learning device 63. The first machine learning device 61, the second machine learning device 62, and the third machine learning device 63 are all machine learning devices that include a GAN (Generative Adversarial Network).
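For reference, the original GAN formulation trains a generator $G$ against a discriminator $D$ with the minimax objective below, where $x$ is a real sample and $z$ is the generator's input. The source does not state which GAN variant or loss the machine learning devices use; for an image-to-image conversion such as those described here, $G$ would presumably be conditioned on the input image rather than on random noise alone, which is an assumption.

```latex
\min_G \max_D \;
\mathbb{E}_{x \sim p_{\text{data}}}\!\left[\log D(x)\right]
+ \mathbb{E}_{z \sim p_z}\!\left[\log\left(1 - D(G(z))\right)\right]
```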

The OCR engine learning device 70 has a learning image generation learning unit 71, a learning image generator 72, and a character recognition learning unit 73.

FIG. 2 is a plan view of a form to be analyzed by the form data digitization system of the present embodiment.

The paper form data conversion system 10 of the present embodiment analyzes, for example, an image (document image 90) of a paper form that summarizes the amounts billed for one month. It can analyze not only content printed on paper but also content shown on the display of a smartphone or computer.

Next, the flow of analyzing the document image 90 with the paper form data conversion system 10 of the present embodiment will be described.

FIG. 3 is a flowchart showing the flow of analysis by the image analyzer of the present embodiment.

The document image 90 is, for example, scanned with the scanner 12 and stored in the image storage unit 20 of the paper form data conversion system 10 (S1). Instead of the scanner 12, a camera such as that of a smartphone may be used to capture the image. When a camera is used, corrections such as keystone correction and leveling may be applied to the image. The document image 90 is converted to, for example, grayscale.
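The grayscale step can be sketched as follows. The source does not say how the conversion is done; the ITU-R BT.601 luma weights and the list-of-tuples image representation used here are assumptions chosen to keep the example dependency-free.

```python
def to_grayscale(rgb_image):
    """Convert rows of (R, G, B) tuples (0-255) to rows of 0-255 luminance values.

    Uses the BT.601 luma weights, which is an assumption; the source does not
    name a conversion formula.
    """
    return [
        [round(0.299 * r + 0.587 * g + 0.114 * b) for (r, g, b) in row]
        for row in rgb_image
    ]

# A 2x2 toy image: white, black, pure red, pure blue.
image = [[(255, 255, 255), (0, 0, 0)], [(255, 0, 0), (0, 0, 255)]]
gray = to_grayscale(image)
```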

Next, the image analysis preprocessing unit 21 applies preprocessing to the document image 90 stored in the image storage unit 20 (S2).

Extraction accuracy tends to drop when characters are extracted from an image that also contains hatching, shading, or similar elements. The image analysis preprocessing unit 21 therefore removes non-character imagery from the target image before the characters are extracted. That is, it removes the portions of the image that are not characters by changing them to the color of the paper, for example white.
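What "changing non-character portions to the paper color" means at the pixel level can be shown with a deliberately simple stand-in. The fixed threshold below is not the learned conversion the source describes; it is a hand-written substitute that only works when the shading is lighter than the ink, and the threshold value is an assumption.

```python
PAPER_WHITE = 255

def clear_light_shading(gray_image, ink_threshold=96):
    """Keep pixels at or below ink_threshold (dark ink); set all others to paper white.

    A fixed-threshold stand-in for illustration only, not the trained first conversion.
    """
    return [
        [px if px <= ink_threshold else PAPER_WHITE for px in row]
        for row in gray_image
    ]

# Light gray shading (200) around two dark character pixels (30 and 40).
shaded = [[200, 30, 200], [200, 200, 40]]
cleaned = clear_light_shading(shaded)
```

A threshold fails as soon as hatching is as dark as the text, which is one motivation for learning the conversion from example pairs instead.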

FIG. 4 is a plan view of a preprocessed image in the present embodiment.

The image analysis preprocessing unit 21 learns in advance a first conversion that generates, from a first pre-processing image (document image 90) containing hatching, shading, and the like as shown in FIG. 2, a first post-processing image 54 from which non-character portions such as hatching and shading have been removed, as shown in FIG. 4. The first post-processing image 54 can be generated, for example manually with image retouching software, by removing the non-character portions from a document image 90 that contains them. Multiple pairs of such a document image 90 and the corresponding first post-processing image 54 are prepared, and the first learning device 51 uses these pairs to make the first machine learning device 61 learn the first conversion, which removes non-character imagery from an image in which characters and non-character imagery are mixed. By applying the first conversion to the document image 90 with the first machine learning device 61, the image analysis preprocessing unit 21 can remove non-character imagery from the target image.

Using the first machine learning device 61 in this way removes noise such as shading from the document image 90. When other kinds of noise that reduce character recognition accuracy are known, beyond shading and hatching, the first machine learning device 61 can be trained on an image conversion that removes that noise, so that it too can be removed from the document image 90. In other words, the image analysis preprocessing unit 21 using the first machine learning device 61 functions as a noise removal engine using artificial intelligence (AI noise removal engine).

FIG. 5 is an example plan view of a group of characters tabulated without ruled lines in the present embodiment. FIG. 6 is an example plan view of a group of characters tabulated with ruled lines in the present embodiment.

The image analysis preprocessing unit 21 also learns in advance a second conversion that converts an image of a group of characters tabulated without ruled lines, as shown in FIG. 5, into an image of a group of characters tabulated with ruled lines, as shown in FIG. 6. A second post-processing image 56 containing a group of characters tabulated with ruled lines, as in FIG. 6, can be generated, for example manually with image retouching software, by adding ruled lines to a second pre-processing image 55 such as that in FIG. 5. Alternatively, the second pre-processing image 55 can be generated by removing the ruled lines from the second post-processing image 56, again for example manually with image retouching software. Multiple pairs of a second pre-processing image 55, in which the characters are tabulated without ruled lines, and a second post-processing image 56, in which they are tabulated with ruled lines, are prepared, and the second learning device 52 uses these pairs to make the second machine learning device 62 learn the second conversion, which generates the second post-processing image 56 from the second pre-processing image 55. By applying the second conversion to the document image 90 with the second machine learning device 62, the image analysis preprocessing unit 21 can convert a table laid out without ruled lines in the target image into an image of a table with ruled lines. The ruled lines used for the table are preferably a color different from that of the characters in the document image 90 (for example, blue).

Using the second machine learning device 62 in this way allows a tabular portion of a non-fixed-format document to be treated as a table even when it has no ruled lines. This makes the document easier to structure.

FIG. 7 is an example plan view of a group of characters in the present embodiment. FIG. 8 is an example plan view in which each character is enclosed in a frame in the present embodiment.

The image analysis preprocessing unit 21 also learns in advance a third conversion that converts an image of a group of characters as shown in FIG. 7 (third pre-processing image 57) into an image in which each character is enclosed in a frame that does not overlap any other frame, as shown in FIG. 8 (third post-processing image 58). The third post-processing image 58 can be generated, for example manually with image retouching software, by adding frames to a third pre-processing image 57 such as that in FIG. 7. Multiple pairs of a third pre-processing image 57 containing a group of characters and a third post-processing image 58 in which each character of that group is enclosed in a frame are prepared, and the third learning device 53 uses these pairs to make the third machine learning device 63 learn the third conversion, which generates the third post-processing image 58 from the third pre-processing image 57. By applying the third conversion to the document image 90 with the third machine learning device 63, the image analysis preprocessing unit 21 can convert the target image into an image in which each character is enclosed in a frame. The frames enclosing the characters are preferably a color different from that of the characters in the document image 90 (for example, red).
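The requirement that the frames do not overlap one another can be checked mechanically, for example when preparing third post-processing images by hand. The sketch below assumes frames are axis-aligned rectangles in pixel coordinates with exclusive right and bottom edges; the `Frame` type and both function names are hypothetical.

```python
from itertools import combinations
from typing import NamedTuple

class Frame(NamedTuple):
    left: int
    top: int
    right: int   # exclusive
    bottom: int  # exclusive

def frames_overlap(a, b):
    """True when two axis-aligned frames share at least one pixel."""
    return a.left < b.right and b.left < a.right and a.top < b.bottom and b.top < a.bottom

def all_disjoint(frames):
    """True when no pair of frames overlaps."""
    return not any(frames_overlap(a, b) for a, b in combinations(frames, 2))

# Three frames side by side; touching edges do not count as overlap.
frames = [Frame(0, 0, 10, 12), Frame(10, 0, 20, 12), Frame(20, 0, 31, 12)]
```

Calling `all_disjoint(frames)` on a candidate training target before it is added to the training set would catch frames that were drawn carelessly across two characters.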

In this way, the image analysis preprocessing unit 21 applies preprocessing to the document image 90 stored in the image storage unit 20. The preprocessed image obtained by applying the first, second, and third conversions to the document image 90 is stored in the preprocessed image storage unit 22. The first, second, and third conversions are performed in this order, for example. In regions containing inverted characters, such as white-on-dark characters whose character and background colors are reversed relative to the rest of the image, preprocessing that inverts the colors may also be applied. The ruled lines and frames added by the second and third conversions may be placed on a layer separate from the document image 90. When the image file format supports multiple layers, a layer other than that of the actual document image 90 is created and the ruled lines and frames are placed on it. When the image file format does not support multiple layers, a separate image file is created instead of such a layer, and the ruled lines and frames are placed in it.
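The ordering of the three conversions and the idea of keeping added ruled lines and frames off the document layer can be sketched as below. Everything here is a modelling assumption: the `LayeredImage` type, the function names, and in particular the choice to have the second and third steps return overlay items rather than whole images. The lambdas are stand-ins so that the sketch runs; real ones would be the trained converters.

```python
from dataclasses import dataclass, field

@dataclass
class LayeredImage:
    base: list                                   # pixels of the document image
    overlay: list = field(default_factory=list)  # added ruled lines and frames

def preprocess(base, remove_noise, find_rules, find_frames):
    """Run the three conversions in order, keeping additions off the base layer."""
    cleaned = remove_noise(base)                 # first conversion
    result = LayeredImage(base=cleaned)
    result.overlay.extend(find_rules(cleaned))   # second conversion
    result.overlay.extend(find_frames(cleaned))  # third conversion
    return result

# Hypothetical stand-in callables operating on a 2x2 grayscale image.
out = preprocess(
    [[200, 30], [40, 200]],
    remove_noise=lambda img: [[px if px < 96 else 255 for px in row] for row in img],
    find_rules=lambda img: [("rule", 0, 1, 2, 1)],
    find_frames=lambda img: [("frame", 1, 0, 2, 1), ("frame", 0, 1, 1, 2)],
)
```

Because the overlay is separate, the OCR step can choose to read the base layer alone or to consult the overlay for cell and character boundaries.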

Next, the OCR unit 23 processes the preprocessed image 93 and extracts character information (S3). The OCR unit 23 uses an OCR (Optical Character Recognition) engine. The OCR engine uses a machine learning device whose accuracy improves through learning, for example one that includes a convolutional neural network (CNN) in at least part of it. The OCR engine extracts character information from the target image using a correspondence that converts images into characters. Because the image analysis preprocessing unit 21 has already removed the non-character portions, character recognition accuracy improves. The OCR engine is trained to recognize characters by the method described below; the training is performed by the OCR engine learning device 70.

First, multiple images that have been correctly recognized as specific characters are prepared. Next, an image representing each of those characters in a specific font is created.

FIG. 9 shows an image correctly recognized as a specific character in the present embodiment, and an image representing that character in a specific font.

In FIG. 9, the image on the right is the image correctly recognized as a specific character (first image), and the image on the left is the image representing that character in a specific font (second image).

The learning image generation learning unit 71 makes the learning image generator 72 learn the conversion (learning image conversion) from an image representing a specific character in a specific font (second image) to the original image in which that character was recognized (first image). The learning image generator 72 then applies the learning image conversion to images of some or all of the characters in the specific font to generate learning images. This produces learning images with characteristics, such as the way the ink bleeds, similar to those of the images that were correctly recognized as specific characters.

FIG. 10 shows a learning image in the present embodiment.

In FIG. 10, the left side is an image of a character rendered in the specific font, and the right side is the image obtained by applying the learning image conversion to the image on the left.

By combining the learning image conversion with the character images of a specific font in this way, learning data (images) can be generated even for characters for which no image data exists. The character recognition learning unit 73 uses this learning data to further train the OCR engine on the correspondence that converts images into characters, which improves character recognition accuracy.
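The way labeled learning data falls out of this scheme can be sketched as follows: every character in the font is rendered, passed through the learning image conversion, and paired with its own label. The function `build_training_set` and both stand-in callables are hypothetical; in the system described, the conversion would be the trained learning image generator and the renderer would draw real font glyphs.

```python
def build_training_set(characters, render_glyph, learning_image_conversion):
    """Pair each character with a converted rendering of its font glyph."""
    return [(learning_image_conversion(render_glyph(ch)), ch) for ch in characters]

# Stand-ins so the sketch runs: a 1x3 "glyph" derived from the code point, and a
# conversion that lightens every pixel slightly in place of the trained generator.
def render_glyph(ch):
    return [[0, ord(ch) % 256, 0]]

def fake_conversion(glyph):
    return [[min(255, px + 10) for px in row] for row in glyph]

training_set = build_training_set("AB", render_glyph, fake_conversion)
```

Since the label is known at rendering time, no manual annotation is needed for the generated images, which is what makes it possible to cover characters that never appeared in scanned forms.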

During character extraction by the OCR unit 23, the ruled lines added by the second machine learning device 62 and the frames added by the third machine learning device 63 are unlikely to reduce character recognition accuracy, provided they are given a color different from that of the characters in the image. Moreover, because the third machine learning device 63 adds a frame to each character, the likelihood that parts or all of adjacent characters are recognized as a single character is greatly reduced. Character recognition accuracy therefore improves.

If the ruled lines and frames do degrade character recognition accuracy, they may be removed before character recognition, for example by deleting a specific color. In that case, recognition accuracy can be improved by treating each region delimited by the ruled lines and frames as a single virtual block for character recognition.
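The color-based removal described above can be sketched as follows. This is only a minimal illustration, assuming that the added ruled lines and frames are drawn in one known color (pure red here) and that the image is held as a nested list of RGB tuples. The name `remove_color` and the `tolerance` parameter are hypothetical and do not appear in the original.

```python
from typing import List, Tuple

RGB = Tuple[int, int, int]
WHITE: RGB = (255, 255, 255)
BLACK: RGB = (0, 0, 0)
RED: RGB = (255, 0, 0)  # assumed color of the added ruled lines and frames


def remove_color(image: List[List[RGB]], target: RGB,
                 tolerance: int = 0, background: RGB = WHITE) -> List[List[RGB]]:
    """Return a copy of the image with every pixel close to `target` replaced by `background`."""
    def is_target(pixel: RGB) -> bool:
        # A pixel matches when every channel is within `tolerance` of the target color.
        return all(abs(a - b) <= tolerance for a, b in zip(pixel, target))

    return [[background if is_target(p) else p for p in row] for row in image]


# A 2x2 toy image: red marks stand for added lines, black for character strokes.
image = [[RED, BLACK],
         [WHITE, RED]]
cleaned = remove_color(image, RED)
```

Only the pixels in the assumed line color are cleared, so the black character strokes reach the recognition step unchanged.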

By using the second machine learning device 62, a tabular portion of a non-standard document can be recognized as a table even when it has no ruled lines. As a result, structuring the document becomes easier. In other words, the image analysis preprocessing unit 21 using the second machine learning device 62, together with the OCR unit 23, functions as a natural language analysis and structuring engine using artificial intelligence.

By using the third machine learning device 63, the characters contained in the document image 90 can be identified one at a time, which improves character recognition accuracy. In other words, the image analysis preprocessing unit 21 using the third machine learning device 63, together with the OCR engine, functions as an OCR character recognition engine using artificial intelligence (an AI-OCR character recognition engine).

The character information extracted by the OCR unit 23 is stored in the extracted character string storage unit 29 together with its position on the document image 90.

Next, the image analysis post-processing unit 24 post-processes the extracted character information (S4). In the post-processing, for example, the dictionary 30 is used to check whether the extracted character information was recognized correctly.

FIG. 11 is an example of a portion of a document image in the present embodiment.

For example, consider a case where OCR of the image shown in FIG. 11 yields the character string "入院拾付金日額". Words with the same length of seven characters are extracted from the dictionary 30, and an exclusive OR (XOR) is taken with each of them. More specifically, the XOR is computed between the sequence of binary representations of the characters of "入院拾付金日額" and the sequence of binary representations of the characters of each same-length word in the dictionary 30, so that each character position yields 0 when the two characters are identical and 1 otherwise. If the XOR result is a sequence of seven 0s, the extracted character string is contained in the dictionary 30.

On the other hand, if the dictionary 30 contains the seven-character word "成人病入院特約", its XOR with "入院拾付金日額" is a sequence of seven 1s. When the XOR result contains a 1 in this way, the extracted character string does not match that dictionary word.

The XOR calculation is repeated over the dictionary words of the same length until a word whose XOR result is all 0s is found. If the dictionary 30 contains a word whose XOR result is all 0s, the extracted character string is judged to be correct. If the dictionary 30 contains no such word, the extracted character string is judged to be possibly incorrect.

If the extracted character string is judged to be possibly incorrect and there is a word for which the XOR result contains exactly one 1, it is highly likely that a single character of that word was misrecognized. The extracted character string is therefore corrected to that word, which is taken to be the correct one.

In this way, the recognition accuracy of the character information improves. When a word is short, however, there are likely to be several dictionary words that differ from it by only one character, which makes it difficult to judge correctness. For this reason, the check may be limited, for example, to words of five or more characters.
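The dictionary check described above can be sketched as follows. This is a hedged illustration rather than the implementation of the embodiment: it assumes that "expressed in binary" means the Unicode code point of each character, and the function names, the status labels, and the rule of correcting only when exactly one near-match exists are assumptions. The dictionary entry "入院給付金日額" is also assumed for the example.

```python
from typing import Iterable, List, Tuple


def xor_flags(a: str, b: str) -> List[int]:
    """Per-character XOR result: 0 where the code points are identical, 1 otherwise."""
    if len(a) != len(b):
        raise ValueError("strings must have the same length")
    return [0 if (ord(x) ^ ord(y)) == 0 else 1 for x, y in zip(a, b)]


def check_against_dictionary(extracted: str, dictionary: Iterable[str],
                             min_length: int = 5) -> Tuple[str, str]:
    """Return (status, text); status is 'correct', 'corrected', 'suspect' or 'unchecked'."""
    if len(extracted) < min_length:
        # Short words have too many one-character neighbors, so they are skipped.
        return "unchecked", extracted
    near: List[str] = []
    for word in dictionary:
        if len(word) != len(extracted):
            continue
        ones = sum(xor_flags(extracted, word))
        if ones == 0:
            return "correct", extracted      # all-zero XOR: the word is in the dictionary
        if ones == 1:
            near.append(word)                # exactly one character differs
    if len(near) == 1:
        return "corrected", near[0]          # assumption: correct only when unambiguous
    return "suspect", extracted


dictionary_30 = ["成人病入院特約", "入院給付金日額"]  # second entry is assumed
result = check_against_dictionary("入院拾付金日額", dictionary_30)
```

With these assumed entries, the OCR output differs from "入院給付金日額" in only its third character, so it is replaced by that word, while "成人病入院特約" differs in all seven positions and is ignored.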

The post-processing may also include morphological analysis of the extracted character information. Morphological analysis identifies the part of speech of each character string contained in the extracted character information. The result of the analysis is stored together with the character string. In addition to the part of speech, the analysis may determine whether the string is a proper noun, a date, a quantity, and so on.

Next, the structural analysis unit 25 structures the character information extracted by the OCR unit 23 and, where necessary, corrected by the image analysis post-processing unit 24 (S5). Here, structuring means grouping the character information into sets of related information, and then identifying and representing the hierarchical relationships among those sets. For example, structured character information has the title of the form at the top level, and below it the document creator, the document creation date, and the main content. An item at any level may itself contain multiple levels. For example, if the title of the form is "Invoice", the main content contains the billed items and a total, and each billed item contains a product number, a unit price, a quantity, and a billed amount.
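The invoice hierarchy described above could be represented, for illustration, as a nested structure like the one below. This is a sketch under assumptions: the field names, the sample values, and the helper `walk` are hypothetical, and the original does not specify any particular data format.

```python
from typing import Any, Iterator, Tuple

# Hypothetical structured data for an invoice; all values are made up.
invoice = {
    "title": "Invoice",
    "creator": "Example Co.",
    "created_on": "2017-10-30",
    "main": {
        "items": [
            {"product_number": "A-001", "unit_price": 500, "quantity": 2, "amount": 1000},
            {"product_number": "B-002", "unit_price": 300, "quantity": 1, "amount": 300},
        ],
        "total": 1300,
    },
}


def walk(node: Any, path: Tuple = ()) -> Iterator[Tuple[Tuple, Any]]:
    """Yield (path, value) for every leaf, where path records the hierarchy above it."""
    if isinstance(node, dict):
        for key, value in node.items():
            yield from walk(value, path + (key,))
    elif isinstance(node, list):
        for index, value in enumerate(node):
            yield from walk(value, path + (index,))
    else:
        yield path, node


leaves = dict(walk(invoice))
```

Each leaf path, such as `("main", "items", 1, "quantity")`, makes the position of a value in the hierarchy explicit.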

When building the hierarchy, keys and values may be identified. A key can be identified by searching a database of character strings that can serve as item names to see whether it contains the character string in question. Keys and values are linked using the relative positions of the character strings. For each character string, the character strings located to its right, to its left, above it, and below it are stored as its neighbors. One of the neighbor character strings is taken as the value.
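The neighbor-based linking described above might look like the following sketch. It assumes that each extracted string comes with an axis-aligned bounding box (y increasing downward) and that a value is preferably found to the right of its key and otherwise below it. The class `TextBox`, the function names, and that preference order are assumptions, not details from the original.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional, Set, Tuple


@dataclass
class TextBox:
    text: str
    x: float  # left edge
    y: float  # top edge (y grows downward)
    w: float
    h: float


def find_neighbors(target: TextBox, boxes: List[TextBox]) -> Dict[str, Optional[TextBox]]:
    """Return the nearest box to the right, left, above, and below the target."""
    neighbors: Dict[str, Optional[TextBox]] = {"right": None, "left": None, "above": None, "below": None}
    best = {direction: float("inf") for direction in neighbors}
    for box in boxes:
        if box is target:
            continue
        dx = (box.x + box.w / 2) - (target.x + target.w / 2)
        dy = (box.y + box.h / 2) - (target.y + target.h / 2)
        same_row = box.y < target.y + target.h and target.y < box.y + box.h
        same_col = box.x < target.x + target.w and target.x < box.x + box.w
        if same_row and dx != 0:
            direction, dist = ("right", dx) if dx > 0 else ("left", -dx)
        elif same_col and dy != 0:
            direction, dist = ("below", dy) if dy > 0 else ("above", -dy)
        else:
            continue
        if dist < best[direction]:
            best[direction], neighbors[direction] = dist, box
    return neighbors


def pair_keys_and_values(boxes: List[TextBox], key_names: Set[str],
                         preference: Tuple[str, ...] = ("right", "below")) -> Dict[str, str]:
    """Link each known item name to the first neighbor, in preference order, that is not itself a key."""
    pairs: Dict[str, str] = {}
    for box in boxes:
        if box.text not in key_names:
            continue
        neighbors = find_neighbors(box, boxes)
        for direction in preference:
            candidate = neighbors[direction]
            if candidate is not None and candidate.text not in key_names:
                pairs[box.text] = candidate.text
                break
    return pairs


boxes = [
    TextBox("Invoice date", 0, 0, 50, 10), TextBox("2017-10-30", 60, 0, 80, 10),
    TextBox("Total", 0, 20, 50, 10), TextBox("12,000", 60, 20, 80, 10),
]
pairs = pair_keys_and_values(boxes, {"Invoice date", "Total"})
```

Here `key_names` plays the role of the database of possible item names; in practice the preference order would depend on the layout of the forms being processed.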

A list of candidate values that appear without a written item name may also be stored in advance, together with the unwritten item name, such as "product name". When the extracted character information matches one of the entries in this candidate list, the extracted character information may be stored as the value and the corresponding item name as the key.

If morphological analysis or similar processing was performed in the post-processing (S4), features of the character strings, such as the part of speech, may be used for linking. For example, linking can apply a rule that only a quantity may be the value for a particular key.

Keys and values need not be in a simple one-to-one relationship. For example, multiple values may be combined with one key. Keys may also form a hierarchical (tree) structure in which sub-keys are subordinate to a main key. In that case, there may be three or more levels.

The ruled lines added by the second machine learning device 62 can be used for structuring the document. Therefore, even for a tabular document without ruled lines, the extracted character information can be structured easily.

Next, the mapping unit 26 maps the character strings extracted from the original document image 90 to the database (S6).

FIG. 12 shows a mapping screen in the image analyzer of the present embodiment.

The mapping screen 40 is displayed on the display 13. The mapping screen 40 includes an image display unit 41 and a correspondence display unit 42. The image display unit 41 displays the document image 90. The correspondence display unit 42 displays the structured character information.

The extracted character strings are displayed as structured character information (structured data). The mapping unit 26 also prompts the user to check the key-value correspondences assigned by the structural analysis unit 25 and allows the user to correct any errors.

Specifically, the document image 90 and the extracted character information are first displayed, for example, on the image display unit 41 of the display 13. The extracted character information is displayed as structured character information, for example in the correspondence display unit 42 to the left of the document image 90.

In addition, the regions from which character information was extracted are colored. When the user moves the pointer onto a colored region, for example with the mouse, and clicks it, the character information of the key or value corresponding to that region is displayed in the area showing the key-value pairs so that it can be distinguished from the rest, for example by changing its color. At the same time, on the document image 90, the region from which the character information linked to the key or value corresponding to the selected region was read is also displayed so that it can be distinguished from the rest, for example by changing its color. If the correspondence is correct, the user leaves it as is or enters a confirmation that it is correct. If the correspondence is wrong, the user corrects the key or the value in the area showing the key-value pairs.

When making a correction, a character recognition error is fixed by typing the correct text, for example from the keyboard. If the linking of a key and a value is wrong, the correspondence may be corrected, for example, by clicking on the document image 90 the region where the correct key or value is written.

The result of character recognition is obtained, for example, as multiple candidates together with the confidence of each candidate. That is, recognizing a given character image yields a result such as a confidence of 90% for candidate 1 and 10% for candidate 2. When the document image 90 and the extracted character information are displayed on the image display unit 41 of the display 13, the characters may be color-coded according to their recognition confidence. For example, characters whose confidence is below a predetermined threshold, such as 90%, may be shown in red. Displaying the result this way makes it easier for the person verifying the character recognition (the verifier) to check it. Color coding may be applied per character or per extracted word. Alternatively, the document as a whole may be color-coded according to the proportion of characters whose confidence is below the predetermined threshold. For a document in which a large proportion of characters has low recognition confidence, it may be faster for an operator to enter the text manually than to use the recognition result of this system.
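The confidence-based color coding described above can be sketched as follows. This is a minimal illustration assuming that each recognized character arrives as a list of (candidate, confidence) pairs; the function names and the colors "red" and "black" are hypothetical, and the 0.90 threshold is simply the example value from the text.

```python
from typing import List, Tuple

Candidate = Tuple[str, float]  # (candidate character, confidence between 0 and 1)


def color_for_character(candidates: List[Candidate], threshold: float = 0.90) -> Tuple[str, str]:
    """Pick the most confident candidate and flag it in red when it falls below the threshold."""
    best_char, best_conf = max(candidates, key=lambda c: c[1])
    return best_char, ("red" if best_conf < threshold else "black")


def low_confidence_ratio(document: List[List[Candidate]], threshold: float = 0.90) -> float:
    """Proportion of characters in the document whose best candidate is below the threshold."""
    if not document:
        return 0.0
    low = sum(1 for candidates in document if max(c[1] for c in candidates) < threshold)
    return low / len(document)


document = [
    [("A", 0.95), ("4", 0.05)],
    [("B", 0.50), ("8", 0.50)],
]
colors = [color_for_character(c) for c in document]
ratio = low_confidence_ratio(document)
```

The document-level ratio could be compared against a second, separately chosen threshold to decide whether manual entry is likely to be faster than correcting the recognition result.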

The mapping unit 26 then inputs the structured data, corrected where necessary, into the database using a mapping table. Here, the mapping table is a table that represents the correspondence between the logical structure of the document and the structure of the database. Because the logical structure of the document is expressed as the structure of the structured data, using the mapping table makes it easy to load the character information into the database.
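A mapping table of this kind can be sketched with Python's standard `sqlite3` module. The example is illustrative only: the table name `invoices`, its columns, the structured-data fields, and the helper functions are all assumptions, and the original does not name a database product or schema.

```python
import sqlite3
from typing import Any, Dict, Tuple

# Hypothetical structured data produced by the structural analysis step.
structured = {
    "title": "Invoice",
    "creator": "Example Co.",
    "created_on": "2017-10-30",
    "main": {"total": 12000},
}

# Mapping table: path in the structured data -> column in the database table.
MAPPING_TABLE: Dict[Tuple[str, ...], str] = {
    ("title",): "doc_type",
    ("creator",): "issuer",
    ("created_on",): "issue_date",
    ("main", "total"): "total_amount",
}


def lookup(data: Dict[str, Any], path: Tuple[str, ...]) -> Any:
    """Follow a path of keys down into the nested structured data."""
    node: Any = data
    for key in path:
        node = node[key]
    return node


def insert_document(conn: sqlite3.Connection, data: Dict[str, Any],
                    mapping: Dict[Tuple[str, ...], str], table: str = "invoices") -> None:
    """Insert one document; table and column names come from the trusted mapping table."""
    columns = list(mapping.values())
    values = [lookup(data, path) for path in mapping]
    placeholders = ", ".join("?" for _ in columns)
    conn.execute(f"INSERT INTO {table} ({', '.join(columns)}) VALUES ({placeholders})", values)
    conn.commit()


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE invoices (doc_type TEXT, issuer TEXT, issue_date TEXT, total_amount INTEGER)")
insert_document(conn, structured, MAPPING_TABLE)
row = conn.execute("SELECT doc_type, issuer, issue_date, total_amount FROM invoices").fetchone()
```

Keeping the correspondence in a separate table means that a change in the database schema only requires editing the mapping, not the extraction or structuring steps.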

As described above, in the present embodiment, errors can be reduced by having the user correct the results of character recognition.

The form data digitization system of the present embodiment includes machine learning devices capable of deep learning in the OCR engine and in the preprocessing unit. Character recognition accuracy can therefore be improved as learning progresses.

Furthermore, by learning from the user's corrections, the accuracy of both character recognition and mapping (linking) can be improved. For example, when a particular character is misrecognized as another character, the system can come to recognize it correctly as the number of user corrections of that error grows. Likewise, when the mapping is wrong, the user's corrections of the linking allow keys and values to be combined correctly over time.

As described above, the paper form data conversion system 10 of the present embodiment has a structural analysis unit 25 that links each piece of character information to the character information paired with it, based on its position in the document image 90. This makes it possible to determine the structure of the document represented by the document image 90, which is the image to be analyzed, that is, the correspondence between item names and their values. As a result, the recognition accuracy of the character information can be improved based on the correspondence between item names and their values.

Furthermore, in the present embodiment, the mapping unit 26 displays the sets of character information linked by the structural analysis unit 25 in association with the document image 90, and accepts corrections from the user when a combination is wrong. The correspondence between item names and their values can therefore be determined more accurately. In addition, by learning from the history of user corrections, the accuracy with which this correspondence is determined can be improved.

10 ... Paper form data conversion system, 12 ... Scanner, 13 ... Display, 14 ... Keyboard, 15 ... Mouse, 20 ... Image storage unit, 21 ... Image analysis preprocessing unit, 22 ... Preprocessed image storage unit, 23 ... OCR unit, 24 ... Image analysis post-processing unit, 25 ... Structural analysis unit, 26 ... Mapping unit, 29 ... Extracted character string storage unit, 30 ... Dictionary, 40 ... Mapping screen, 41 ... Image display unit, 42 ... Correspondence display unit, 51 ... First learning device, 52 ... Second learning device, 53 ... Third learning device, 54 ... First post-process image, 55 ... Second pre-process image, 56 ... Second post-process image, 57 ... Third pre-process image, 58 ... Third post-process image, 61 ... First machine learning device, 62 ... Second machine learning device, 63 ... Third machine learning device, 70 ... OCR engine learning device, 71 ... Learning-image generation learning unit, 72 ... Learning image generator, 73 ... Character recognition learning unit, 90 ... Document image, 93 ... Preprocessed image

Claims

A paper form data conversion system that extracts character information from a scanned target image and outputs it to an external database, characterized by having:
an OCR unit that extracts character information from the target image using an OCR engine;
a structural analysis unit that generates structured data by structuring the character information extracted by the OCR unit based on the position of that character information on the target image; and
a mapping unit that inputs the structured data into the database using a mapping table that represents the correspondence between the structured data and the structure of the database.

The paper form data conversion system according to claim 1, characterized in that:
it has a first learning device that causes a first machine learning device to learn a first conversion, which removes non-character images from an image in which characters and non-character images are mixed, using pairs of a first pre-process image in which characters and non-character images are mixed and a first post-process image obtained by removing the non-character images from the first pre-process image; and
the preprocessing includes the first conversion.

The paper form data conversion system according to claim 1 or 2, characterized in that:
it has a second learning device that causes a second machine learning device to learn a second conversion, which converts an image of a character group tabulated without ruled lines into an image of the character group tabulated with ruled lines, using pairs of a second post-process image containing a character group tabulated with ruled lines and a second pre-process image obtained by removing the ruled lines from the second post-process image; and
the preprocessing includes the second conversion.

The paper form data conversion system according to any one of claims 1 to 3, characterized in that:
it has a third learning device that causes a third machine learning device to learn a third conversion, which forms mutually non-overlapping frames around the individual characters contained in an image containing a character group, using pairs of a third pre-process image containing a character group and a third post-process image in which mutually non-overlapping frames are formed around the individual characters contained in the third pre-process image; and
the preprocessing includes the third conversion.

The paper form data conversion system according to any one of claims 1 to 4, characterized by having:
a word database that stores words; and
an image analysis post-processing unit that obtains the exclusive OR of first data, in which each character of a word is expressed in binary, and second data, in which each character of the character information extracted by the OCR unit is expressed in binary, determines that the character information extracted by the OCR unit is correct when all of the exclusive ORs are 0, and corrects the character information to the word when one of the exclusive ORs is 1 and the others are 0.