JP2024020728A

JP2024020728A - Learning model learning apparatus, character string image position prediction apparatus, character position estimation apparatus, method thereof, and program

Info

Publication number: JP2024020728A
Application number: JP2022123127A
Authority: JP
Inventors: Tatsuya Ishii; Toshio Oka; Kosei Kawazu; Hideyuki Akiyama; Tomejiro Osawa
Original assignee: Toppan Holdings Inc
Current assignee: Toppan Holdings Inc
Priority date: 2022-08-02
Filing date: 2022-08-02
Publication date: 2024-02-15

Abstract

To reduce the time required, with the text sequence method, to identify where in the character string image to be recognized a character obtained as a character recognition result is located.
SOLUTION: In character string image recognition, which recognizes characters from a character string image representing a character string, a learning model learning system learns the character string written in the character string image and the positions of partial character strings, each containing one or more characters, in that image. The system includes a learning unit that generates a learning model by learning from three inputs: a character string image containing one line of text that includes two or more partial character strings, the character string written on that image, and the start and end positions of each partial character string in that image.
SELECTED DRAWING: Figure 1

Description

The present invention relates to a learning model learning device, a character string image position prediction device, a character position estimation device, and the corresponding methods and programs, all for handwritten character string image recognition.

Conventional character recognition handles a character string image containing one line of text as follows. It first detects the character region (character frame) of each character and then identifies the character inside each region, one character at a time.

Patent Document 1 discloses a technique that generates learning data consisting of handwritten character string images and ground-truth labels, and uses that data to build a machine learning recognizer based on the above approach. Building a recognizer this way, however, requires drawing a character frame around every character and assigning each one a character label (character code). As a result, creating the learning data is laborious.

In recent years, a different approach has been widely adopted, as in Non-Patent Document 1 (hereinafter, the text sequence method). Instead of labeling each character, it learns from data in which a single text label is assigned to an entire line (hereinafter, text sequence data). Its advantage is that no per-character frames or character labels need to be created.

Patent Document 1: International Publication No. 2020/218512

Non-Patent Document 1: Baoguang Shi, Xiang Bai, and Cong Yao, "An End-to-End Trainable Neural Network for Image-based Sequence Recognition and Its Application to Scene Text Recognition", School of Electronic Information and Communications, Huazhong University of Science and Technology, Wuhan, China, arXiv, 21 Jul 2015.

The text sequence method, however, has a drawback. Determining where in the character string image to be recognized a character obtained as a recognition result is located takes time.

The present invention was made in view of this situation. Its object is to provide a learning model learning device, a character string image position prediction device, a character position estimation device, and methods and programs therefor that reduce the time needed to determine where in the character string image to be recognized a recognized character is located, even when the text sequence method is used.

To solve the above problem, one aspect of the present invention is a learning model learning device for use in character string image recognition, which recognizes characters from a character string image representing a character string. The device learns the character string written in the character string image and the positions, in that image, of partial character strings each containing one or more characters. It comprises a learning unit that generates a learning model by learning from three inputs: a character string image in which one line of text containing two or more of the partial character strings is written, the character string written on that image, and start/end position information for each partial character string in that image.

Another aspect of the present invention is a character string image position prediction device for use in character string image recognition, which recognizes characters from a character string image representing a character string. The device predicts the character string written in the character string image and the positions, in that image, of partial character strings each containing one or more characters. It comprises a prediction unit and an output unit. The prediction unit inputs a character string image to be recognized into a learning model trained on three inputs: a character string image in which one line of text containing two or more of the partial character strings is written, the character string written on that image, and start/end position information for each partial character string in that image. From the model it obtains the character string contained in the image to be recognized and the start/end position information in that image. The output unit outputs the prediction unit's result.

Another aspect of the present invention is a character position estimation device. It estimates the positions, in a character string image, of partial character strings of one or more characters obtained by character string image recognition, which recognizes characters from a character string image representing a character string. The device comprises a division unit, an estimation unit, and an output unit.

The division unit acquires the character string contained in a character string image to be recognized and the start/end position information in that image. Both are obtained by inputting that image into a learning model trained on three inputs: a character string image in which one line of text containing two or more of the partial character strings is written, the character string written on that image, and start/end position information for each partial character string in that image. The division unit then divides the acquired character string into partial character strings, using the same condition that was used to generate the partial character strings from the character string written on the training character string image.

The estimation unit estimates the range, in the character string image to be recognized, of each partial character string produced by the division unit. The output unit outputs each range estimated by the estimation unit together with the partial character string, obtained by the division unit, that falls within that range.
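The division and estimation steps just described can be sketched in Python. This is a minimal illustration under stated assumptions, not the patent's implementation. It assumes the model returns the recognized text plus one (start, end) span per partial string. It also assumes the division condition is supplied as an ordinary function through a hypothetical `divide` parameter. Spans are matched to partial strings by their order along the line.

```python
from __future__ import annotations

from typing import Callable, List, Tuple

Span = Tuple[float, float]  # (start, end) along the line axis


def estimate_partial_ranges(
    predicted_text: str,
    predicted_spans: List[Span],
    divide: Callable[[str], List[str]],
) -> List[Tuple[Span, str]]:
    """Pair each partial string of the recognized text with one predicted range.

    `divide` is a hypothetical parameter: it must apply the same division
    condition that was used to build the partial strings in the training data.
    """
    parts = divide(predicted_text)
    if len(parts) != len(predicted_spans):
        # This sketch does not attempt to repair a count mismatch.
        raise ValueError(
            f"{len(parts)} partial strings but {len(predicted_spans)} ranges"
        )
    # Sort spans along the line so the k-th span is paired with the k-th string.
    ordered = sorted(predicted_spans, key=lambda span: span[0])
    return list(zip(ordered, parts))


if __name__ == "__main__":
    # Whitespace splitting stands in for the real division condition.
    for (start, end), part in estimate_partial_ranges(
        "ABC DE F", [(0, 40), (95, 120), (50, 90)], str.split
    ):
        print(f"{start}-{end}: {part}")
```

Raising an error on a count mismatch keeps the sketch honest. The source does not say how the estimation unit reconciles a different number of recognized partial strings and predicted ranges.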

Another aspect of the present invention is the learning model learning device described above, in which the learning model learned by the learning unit is trained by deep learning using text sequence data.

Another aspect of the present invention is a learning model learning method for use in character string image recognition, which recognizes characters from a character string image representing a character string. The method learns the character string written in the character string image and the positions, in that image, of partial character strings each containing one or more characters. It generates a learning model by learning from three inputs: a character string image in which one line of text containing two or more of the partial character strings is written, the character string written on that image, and start/end position information for each partial character string in that image.

Another aspect of the present invention is a character string image position prediction method for use in character string image recognition, which recognizes characters from a character string image representing a character string. The method predicts the character string written in the character string image and the positions, in that image, of partial character strings each containing one or more characters. It inputs a character string image to be recognized into a learning model trained on three inputs: a character string image in which one line of text containing two or more of the partial character strings is written, the character string written on that image, and start/end position information for each partial character string in that image. It then acquires, as the prediction result, the character string contained in the image to be recognized and the start/end position information in that image.

Another aspect of the present invention is a character position estimation method. It estimates the positions, in a character string image, of partial character strings of one or more characters obtained by character string image recognition, which recognizes characters from a character string image representing a character string. The method comprises four steps:
- acquiring the character string contained in a character string image to be recognized and the start/end position information in that image, both obtained by inputting that image into a learning model trained on three inputs: a character string image in which one line of text containing two or more of the partial character strings is written, the character string written on that image, and start/end position information for each partial character string in that image;
- dividing the acquired character string into partial character strings, using the same condition that was used to generate the partial character strings from the character string written on the training character string image;
- estimating the range, in the character string image to be recognized, of each divided partial character string;
- outputting each estimated range together with the divided partial character string contained in it.

Another aspect of the present invention is a program for a computer used in character string image recognition, which recognizes characters from a character string image representing a character string. The computer learns the character string written in the character string image and the positions, in that image, of partial character strings each containing one or more characters. The program causes the computer to function as a learning unit that generates a learning model by learning from three inputs: a character string image in which one line of text containing two or more of the partial character strings is written, the character string written on that image, and start/end position information for each partial character string in that image.

Another aspect of the present invention is a program for a computer used in character string image recognition, which recognizes characters from a character string image representing a character string. The computer predicts the character string written in the character string image and the positions, in that image, of partial character strings each containing one or more characters. The program causes the computer to function as a prediction unit and an output unit. The prediction unit inputs a character string image to be recognized into a learning model trained on three inputs: a character string image in which one line of text containing two or more of the partial character strings is written, the character string written on that image, and start/end position information for each partial character string in that image. It thereby obtains the character string contained in the image to be recognized and the start/end position information in that image. The output unit outputs the prediction unit's result.

Another aspect of the present invention is a program for a computer that estimates the positions, in a character string image, of partial character strings of one or more characters obtained by character string image recognition, which recognizes characters from a character string image representing a character string. The program causes the computer to function as a division unit, an estimation unit, and an output unit.

The division unit acquires the character string contained in a character string image to be recognized and the start/end position information in that image. Both are obtained by inputting that image into a learning model trained on three inputs: a character string image in which one line of text containing two or more of the partial character strings is written, the character string written on that image, and start/end position information for each partial character string in that image. The division unit then divides the acquired character string into partial character strings, using the same condition that was used to generate the partial character strings from the character string written on the training character string image.

The estimation unit estimates the range, in the character string image to be recognized, of each partial character string produced by the division unit. The output unit outputs each range estimated by the estimation unit together with the partial character string, obtained by the division unit, contained in that range.

As described above, the present invention reduces the time needed to determine where in the character string image to be recognized a character obtained as a character recognition result is located, even when the text sequence method is used.

FIG. 1 is a schematic block diagram showing the configuration of a learning model learning system S according to an embodiment of the present invention.
FIG. 2 is a diagram showing an example of a learning data set stored in the character string image database 101.
FIG. 3 is a diagram illustrating start/end position information of partial character strings on a character string image.
FIG. 4 is a flowchart showing an example of the operation of the deep learning model learning system of the present embodiment.
FIG. 5 is a diagram showing an example of a display screen shown on the display 4.

A learning model learning system according to an embodiment of the present invention is described below with reference to the drawings. FIG. 1 is a schematic block diagram showing the configuration of a learning model learning system S according to an embodiment of the present invention.
The learning model learning system S includes a deep learning model learning device 1, a character string image prediction device 2, a character position estimation device 3, and a display 4.
The deep learning model learning device 1 includes a character string image database 101, a learning unit 102, and a learning model 103. The character string image prediction device 2 includes a prediction image database 201, a prediction unit 202, a prediction model 203, and an output unit 204. The character position estimation device 3 includes a division unit 301, an estimation unit 302, and an output unit 303.

The character string image database 101 stores a plurality of learning data sets. Each data set includes a one-line character string image, the corresponding text sequence data, and start/end position information for the partial character strings into which the text sequence data is divided.
FIG. 2 is a diagram showing an example of a learning data set stored in the character string image database 101.
In this example, the learning data set includes a character string image 110, text sequence data 111, and position information 112.
The character string image 110 is an image that represents a character string of at least two characters. It is, for example, an image containing a handwritten character string. Besides handwritten characters, it may contain typeset characters or characters composed with a word processor and printed. The character string image 110 may also be an image of one line of content extracted from a document: the document is read optically with a scanner and character recognition (OCR, optical character recognition) processing is applied. The line may be written vertically or horizontally. The document may be an official document, a contract, or an application form for any of various services.
Handwritten characters may be written in letterforms peculiar to the writer, in old-form characters, or in historical kana orthography. They may also be written with various instruments such as brushes, pencils, and pens. A person checking the document may therefore need to confirm that the character string obtained by character recognition processing correctly corresponds to the character string written in the document. If the document uses old-form characters or historical kana orthography, a checker who is unfamiliar with them may take a long time to find which part of the character string image a recognized character corresponds to.
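A learning data set of the kind held in the character string image database 101 could be modelled as a small record. The sketch below is hypothetical throughout: the class and field names, storing the image as a file path, and the consistency checks are all assumptions. They mirror the description: a one-line image, its text sequence label, two or more partial strings, and one start/end pair per partial string.

```python
from __future__ import annotations

from dataclasses import dataclass
from typing import Tuple


@dataclass(frozen=True)
class LearningSample:
    image_path: str  # hypothetical: file holding the one-line character string image
    text: str  # text sequence data (ground-truth label for the whole line)
    partial_strings: Tuple[str, ...]  # text divided by the predetermined condition
    positions: Tuple[Tuple[float, float], ...]  # (start, end) per partial string

    def __post_init__(self) -> None:
        if len(self.partial_strings) < 2:
            raise ValueError("a line must contain two or more partial strings")
        if len(self.positions) != len(self.partial_strings):
            raise ValueError("need exactly one (start, end) pair per partial string")
        # Ignoring whitespace, the partial strings must rebuild the full label.
        if "".join(self.partial_strings) != "".join(self.text.split()):
            raise ValueError("partial strings do not match the text sequence data")
        if any(start >= end for start, end in self.positions):
            raise ValueError("each start position must precede its end position")


if __name__ == "__main__":
    # Coordinates are invented pixel offsets, for illustration only.
    sample = LearningSample(
        "line_0001.png",
        "東京都文京区水道○丁目○番地○号 ○○印刷株式会社 ○○花子",
        ("東京都文京区水道", "○丁目○番地○号", "○○印刷株式会社", "○○花子"),
        ((0, 210), (210, 420), (420, 640), (640, 780)),
    )
    print(sample.partial_strings)
```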

The text sequence data 111 is a character string obtained by performing character recognition processing on the character string image 110, for example text data. It is attached to the character string image 110 as that image's ground-truth label.
The text sequence data 111 is divided into partial character strings according to a predetermined division condition. For example, if the character string contains an address and a name, it is divided into the portion from the prefecture through the town name, the portion from the block/lot number onward, the name, and so on. If the character string is a sentence, it is divided into morphemes by morphological analysis.
For example, suppose the text sequence data 111 obtained from the character string image 110 is "東京都文京区水道○丁目○番地○号 ○○印刷株式会社 ○○花子" (Suido, Bunkyo-ku, Tokyo, ○-chome ○-banchi ○-go; ○○ Printing Co., Ltd.; ○○ Hanako). It is divided into four partial character strings:
- partial character string 111a, "東京都文京区水道" (Suido, Bunkyo-ku, Tokyo);
- partial character string 111b, "○丁目○番地○号" (○-chome ○-banchi ○-go);
- partial character string 111c, "○○印刷株式会社" (○○ Printing Co., Ltd.);
- partial character string 111d, "○○花子" (○○ Hanako).
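For illustration only, the address-and-name division in the example above could be approximated by the rule below. The rule is an assumption, not the patent's division condition. Fields are taken to be separated by whitespace. The address field is cut just before the first "<number>丁目" (chome) token, where the number may be ASCII digits, full-width digits, or the ○ placeholder. Dividing sentences into morphemes would instead require a morphological analyzer, which this sketch does not cover.

```python
from __future__ import annotations

import re
from typing import List

# Hypothetical rule: the block/lot part of an address starts at "<number>丁目".
CHOME = re.compile(r"[0-9０-９○]+丁目")


def split_address_line(text: str) -> List[str]:
    """Split an 'address company name' line into partial strings."""
    # str.split() with no argument also splits on the ideographic space U+3000.
    address, *others = text.split()
    match = CHOME.search(address)
    if match is None or match.start() == 0:
        head = [address]  # no separate town-name part to cut off
    else:
        head = [address[: match.start()], address[match.start():]]
    return head + others


if __name__ == "__main__":
    print(split_address_line("東京都文京区水道○丁目○番地○号 ○○印刷株式会社 ○○花子"))
```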

Next, examples of start/end position information are described.
FIG. 3 is a diagram showing an example of start/end position information.
A character string image represents one line of text from its beginning to its end. Within that span, the start/end position information gives the start and end positions of the region in which the text of a given partial character string appears.
A character string image 410 is an example of a character string image registered in the character string image database 101. A character string image 420 shows the character string image 410 with start/end position information marked on it. Four regions are set in the character string image 420: a first region 421, a second region 422, a third region 423, and a fourth region 424. The regions are adjacent, with no gaps between them. The start/end position information of the partial character strings is then given by the coordinates of the region boundaries, measured from the left end of the character string image 410 (the beginning of the character string).
For example, the start coordinate of the character string image 420 is position P1 and its end coordinate is position P5.
The start coordinate of the first region 421 is position P1 and its end coordinate is position P2.
The start coordinate of the second region 422 is position P2 and its end coordinate is position P3.
The start coordinate of the third region 423 is position P3 and its end coordinate is position P4.
The start coordinate of the fourth region 424 is position P4 and its end coordinate is position P5.
Here the left end of the character string image 410 is used as the reference because the image is written horizontally. For a vertically written image, the top end may be used as the reference instead.
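For a gap-free layout such as that of the character string image 420, n + 1 boundary coordinates fully determine n regions; P1 to P5 give four regions. The helper below is a minimal sketch of that conversion. The function names and the pixel values in the example are hypothetical. The `vertical` flag reflects the note that the top end may serve as the reference for vertically written lines.

```python
from __future__ import annotations

from typing import List, Tuple


def line_coordinate(x: float, y: float, vertical: bool = False) -> float:
    """Position along the line: x from the left if horizontal, y from the top if vertical."""
    return y if vertical else x


def boundaries_to_regions(boundaries: List[float]) -> List[Tuple[float, float]]:
    """Turn boundary coordinates [P1, ..., Pn] into n - 1 adjacent (start, end) regions."""
    if len(boundaries) < 2:
        raise ValueError("need at least a start and an end coordinate")
    if any(a >= b for a, b in zip(boundaries, boundaries[1:])):
        raise ValueError("boundaries must be strictly increasing")
    # Each region ends exactly where the next begins, so the line has no gaps.
    return list(zip(boundaries, boundaries[1:]))


if __name__ == "__main__":
    # Hypothetical P1..P5 for a horizontal line image.
    print(boundaries_to_regions([0, 120, 250, 400, 480]))
    # [(0, 120), (120, 250), (250, 400), (400, 480)]
```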

A character string image 430 shows another way of setting start/end position information. Its content is the same as that of the character string image 410, but the start/end position information is set differently.
Four regions are set in the character string image 430: a fifth region 431, a sixth region 432, a seventh region 433, and an eighth region 434. Unlike in the character string image 420, these regions do not cover the whole image without gaps. Blank stretches between partial character strings are excluded, and each region covers only the portion occupied by its own partial character string.
For example, the start coordinate of the character string image 430 is position P10 and its end coordinate is position P50.
The start coordinate of the fifth region 431 is position P11 and its end coordinate is position P12. There is a gap between the start of the character string image 430 and the start of the fifth region 431.
The start coordinate of the sixth region 432 is position P21 and its end coordinate is position P22. There is a gap between the end of the fifth region 431 and the start of the sixth region 432.
The start coordinate of the seventh region 433 is position P31 and its end coordinate is position P32. There is a gap between the end of the sixth region 432 and the start of the seventh region 433.
The start coordinate of the eighth region 434 is position P41 and its end coordinate is position P42. There is a gap between the end of the seventh region 433 and the start of the eighth region 434.
There is also a gap between the end of the eighth region 434 and the end of the character string image 430.
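With a gapped layout such as that of the character string image 430, regions must be stored as explicit (start, end) pairs rather than as shared boundaries. Below is a minimal sketch of a consistency check. It assumes, without support from the source, that regions must lie inside the line and must not overlap. It returns the widths of the blank stretches before, between, and after the regions. The names and numbers are hypothetical.

```python
from __future__ import annotations

from typing import List, Tuple


def gaps_between_regions(
    line_start: float, line_end: float, regions: List[Tuple[float, float]]
) -> List[float]:
    """Return gap widths [before first, between neighbours..., after last]."""
    if any(start >= end for start, end in regions):
        raise ValueError("each region needs start < end")
    # Flatten to line start, region edges in order, line end (P10, P11, P12, ..., P42, P50).
    edges = [line_start] + [c for region in regions for c in region] + [line_end]
    if any(a > b for a, b in zip(edges, edges[1:])):
        raise ValueError("regions must be ordered, non-overlapping and inside the line")
    # Gaps lie between edges 0-1, 2-3, 4-5, ...
    return [edges[i + 1] - edges[i] for i in range(0, len(edges), 2)]


if __name__ == "__main__":
    print(gaps_between_regions(0, 500, [(10, 110), (130, 240), (260, 400), (420, 490)]))
    # [10, 20, 20, 20, 10]
```

A gap-free layout such as that of the character string image 420 simply yields a list of zeros, so the same check covers both styles.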

Here, in FIG. 2, positions Pa, Pb, Pc, Pd, and Pe are defined in the character string image 110. These positions may be set by the person in charge of creating the learning data set, who specifies them via an input device such as a mouse, keyboard, or touch panel.
In the text sequence data 111, the partial character string 111a starts at position Pa and ends at position Pb. The partial character string 111b starts at position Pb and ends at position Pc. The partial character string 111c starts at position Pc and ends at position Pd. The partial character string 111d starts at position Pd and ends at position Pe.
The combination of each partial character string and its start/end position information may likewise be set by the above-mentioned person in charge, who specifies it via the input device.
In addition, in the character string image 110, four areas, namely an area 110a between positions Pa and Pb, an area 110b between positions Pb and Pc, an area 110c between positions Pc and Pd, and an area 110d between positions Pd and Pe, are arranged in order from the beginning of the character string image 110, and the partial character strings 111a, 111b, 111c, and 111d are likewise arranged in order from the beginning of the text sequence data. Therefore, the correspondence between the partial character strings and the areas may be determined according to this order of arrangement.
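The order-based correspondence described above can be sketched as follows. This is a minimal illustration under assumptions not stated in the source: positions are one-dimensional coordinates, the areas are contiguous as in FIG. 2 (so n partial character strings need n + 1 boundaries, Pa to Pe), and the names `LabeledSubstring`, `spans_from_boundaries`, and `label_by_order`, as well as the sample text and coordinates, are hypothetical.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class LabeledSubstring:
    """One partial character string with its start/end positions (hypothetical record)."""
    text: str
    start: float
    end: float


def spans_from_boundaries(boundaries: List[float]) -> List[Tuple[float, float]]:
    """Turn ordered boundaries (Pa, Pb, ..., Pe) into contiguous (start, end) areas."""
    if any(a >= b for a, b in zip(boundaries, boundaries[1:])):
        raise ValueError("boundaries must be strictly increasing")
    return list(zip(boundaries[:-1], boundaries[1:]))


def label_by_order(substrings: List[str], spans: List[Tuple[float, float]]) -> List[LabeledSubstring]:
    """Pair the k-th partial character string with the k-th area from the start of the image."""
    if len(substrings) != len(spans):
        raise ValueError("each partial character string needs exactly one area")
    return [LabeledSubstring(s, a, b) for s, (a, b) in zip(substrings, spans)]


# Hypothetical sample: four partial character strings and boundaries Pa..Pe.
labels = label_by_order(["東京都", "港区", "芝公園", "1丁目"],
                        spans_from_boundaries([0, 30, 55, 90, 120]))
```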

Returning to FIG. 1, the learning unit 102 includes a learning model 103. The learning unit 102 generates the learning model 103 by learning, from the character string image data and text sequence data registered in the character string image database 101 together with the start/end position information of each partial character string, the relationship among a character string image, its partial character strings, and the regions of the character string image in which those partial character strings are located. The learning method used by the learning unit 102 is, for example, deep learning. The learning unit 102 temporarily stores the generated learning model 103 and outputs it to the character string image prediction device 2.

The learning model 103 is a model that has learned the relationship among a character string image, the character string written in that image, and where the start and end positions of each partial character string lie in that image.
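The source states only that deep learning is one example of the learning method, so the following is a hedged sketch of how one training sample might be turned into targets. Normalizing positions by the image width and applying an L1 penalty to the normalized start/end values are assumptions chosen for illustration, and `make_targets` and `span_l1_loss` are hypothetical names. The text target would be handled by whatever sequence loss the chosen recognizer uses.

```python
from typing import Dict, List, Tuple

Span = Tuple[float, float]


def make_targets(text: str, spans: List[Span], image_width: float) -> Dict[str, object]:
    """Encode one sample as a text target plus start/end targets scaled to [0, 1]."""
    if image_width <= 0:
        raise ValueError("image_width must be positive")
    norm = [(s / image_width, e / image_width) for s, e in spans]
    for s, e in norm:
        if not 0.0 <= s < e <= 1.0:
            raise ValueError(f"span {(s, e)} is empty or lies outside the image")
    return {"text": text, "spans": norm}


def span_l1_loss(pred: List[Span], target: List[Span]) -> float:
    """Sum of absolute start and end errors, averaged over partial character strings."""
    if len(pred) != len(target):
        raise ValueError("prediction and target must contain the same number of spans")
    if not target:
        return 0.0
    total = sum(abs(ps - ts) + abs(pe - te) for (ps, pe), (ts, te) in zip(pred, target))
    return total / len(target)
```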

The predicted image database 201 stores character string image data.
The character string image data stored in the predicted image database 201 are character string images extracted from images generated by optically reading documents that are to undergo character recognition.

The prediction unit 202 includes a prediction model 203.
The prediction unit 202 acquires the learning model 103 generated by the learning unit 102 from the deep learning model learning device 1 and stores it.
The prediction unit 202 inputs a character string image registered in the predicted image database 201 to the prediction model 203 and outputs the prediction result from the prediction model 203 to the output unit 204. This prediction result includes the character string (text sequence data) corresponding to the character string image and the start/end position information determined for the character string image. When the text sequence data obtained here includes two or more partial character strings, multiple sets of start/end position information are obtained.

The prediction model 203 is the learning model 103 trained by the deep learning model learning device 1; it receives a character string image as input and outputs a character string and start/end position information.

The output unit 204 receives the output (character string and start/end position information) that the prediction model 203 produces for an image input from the predicted image database 201, and outputs it to the character position estimation device 3 and to the display 4. The display 4 displays the character string and the start/end position information. The output unit 204 may output to the display 4 not only the output of the prediction model 203 but also the character string image that was input to the prediction model 203. In this case, the display 4 can display the character string image, the character string, and the start/end position information.

The dividing unit 301 receives, from the output unit 204, the prediction result (character string and start/end position information) obtained from the prediction model 203. The dividing unit 301 divides the character string (text sequence data) obtained from the output unit 204 into partial character strings, based on the same division conditions as those used when the partial character strings were obtained from the text sequence data corresponding to the character strings registered in the character string image database 101.
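The key point here is that training and inference share one division rule. The source does not specify the division conditions, so the sketch below assumes, purely for illustration, a rule that cuts the text into fixed-length chunks. `divide` and `CHUNK_LEN` are hypothetical names, and any other deterministic rule (for example, splitting at delimiters) could be substituted as long as the same function is used in both places.

```python
from typing import List

# Hypothetical division condition, shared by data set creation and the dividing unit.
CHUNK_LEN = 2


def divide(text: str, chunk_len: int = CHUNK_LEN) -> List[str]:
    """Split text sequence data into partial character strings of up to chunk_len characters."""
    if chunk_len < 1:
        raise ValueError("chunk_len must be at least 1")
    return [text[i:i + chunk_len] for i in range(0, len(text), chunk_len)]


# The same call is used when building the learning data set and when dividing predictions.
training_parts = divide("東京都港区")
predicted_parts = divide("東京都港区")
```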

The estimation unit 302 receives the start/end position information obtained as a prediction result from the output unit 204 and the partial character strings produced by the dividing unit 301, and estimates the start and end positions in the character string image that correspond to each partial character string.
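One simple way to realize this estimation, assumed here rather than taken from the source, is to sort the predicted start/end pairs by their start coordinate and pair them with the partial character strings in reading order. This mirrors the order-based correspondence used when the learning data set is created. Partial character strings left without a predicted pair are reported as `None`; `assign_spans` is a hypothetical name.

```python
from typing import List, Optional, Tuple

Span = Tuple[float, float]


def assign_spans(substrings: List[str], spans: List[Span]) -> List[Tuple[str, Optional[Span]]]:
    """Pair partial character strings with predicted spans in reading order."""
    # Predicted spans may come out in any order, so sort them along the line first.
    ordered = sorted(spans, key=lambda span: span[0])
    return [(sub, ordered[i] if i < len(ordered) else None) for i, sub in enumerate(substrings)]
```

In this sketch, predicted spans beyond the number of partial character strings are simply ignored.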

The output unit 303 receives each partial character string and the corresponding start and end positions estimated by the estimation unit 302, and outputs them to the display 4 for display.

The deep learning model learning device 1, the character string image prediction device 2, and the character position estimation device 3 described above are each a computer. The learning unit 102, the prediction unit 202, the output unit 204, the dividing unit 301, the estimation unit 302, and the output unit 303 may each be configured with a processing device such as a CPU (central processing unit) or with a dedicated electronic circuit.

The character string image database 101 and the predicted image database 201 are configured with a storage medium, for example an HDD (Hard Disk Drive), a flash memory, an EEPROM (Electrically Erasable Programmable Read Only Memory), a RAM (Random Access read/write Memory), a ROM (Read Only Memory), or any combination of these storage media.
For example, nonvolatile memory can be used for the character string image database 101 and the predicted image database 201.

Next, FIG. 4 is a flowchart showing an example of the operation of the deep learning model learning system of this embodiment.
<Step S1>
The learning unit 102 generates the learning model 103 by learning with the character string images, text sequence data, and start/end position information stored in the character string image database 101 as a learning data set.

<Step S2>
The prediction unit 202 receives the learning model 103 generated in step S1 from the deep learning model learning device 1 and uses it as the prediction model 203. The prediction unit 202 inputs a character string image selected from the predicted image database 201 into the prediction model 203 and passes the resulting prediction result (the text sequence data and the start/end position information determined for the character string image) to the output unit 204.
The output unit 204 outputs the prediction result to the character position estimation device 3.

<Step S3>
The dividing unit 301 divides the text sequence data received from the output unit 204 in step S2 into partial character strings based on the division conditions.

<Step S4>
Using the start/end position information received from the output unit 204 in step S2 and the partial character strings produced by the dividing unit 301, the estimation unit 302 estimates which start/end position information each partial character string corresponds to.

<Step S5>
The output unit 303 outputs to the display 4 the entire character string (text sequence data) predicted in step S2, together with each partial character string and its corresponding position information estimated in step S4.

FIG. 5 is a diagram showing an example of a display screen displayed on the display 4. This display screen 500 is an example of a display screen output from the output unit 303 of the character position estimation device 3.
The character string image 510 to be recognized is displayed at the top of the screen. At the bottom of the screen, the text sequence data 520 obtained by performing character recognition on the character string image 510 is displayed.
The person in charge of checking the text sequence data operates an input device to place the operator 511 on the character to be checked in the text sequence data 520. The character position estimation device 3 identifies the area to which the character at the position of the operator 511 belongs, based on the start and end position information of the partial character string containing that character. The character position estimation device 3 then displays the identified area of the character string image 510 in a display mode different from that of the other areas. For example, a color (such as red, yellow, or green) is overlaid semi-transparently on the identified area so that the character string image 510 remains visible underneath. By pointing with the operator 511 at the character to be checked in the text sequence data, the person in charge can thus grasp the range of the character string image to which that character belongs. Because the person in charge only needs to check that range rather than the entire character string image, the checking time can be kept from becoming long.
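The lookup performed when the operator 511 points at a character can be sketched as follows. The sketch computes the cumulative lengths of the partial character strings, uses binary search to find which partial character string contains the character index, and returns that string's start/end positions as the area to highlight. This is an illustrative Python sketch: the function name `region_for_char` and the sample values are hypothetical, and it assumes the partial character strings concatenate exactly to the text sequence data 520.

```python
from bisect import bisect_right
from itertools import accumulate
from typing import List, Tuple

Span = Tuple[float, float]


def region_for_char(substrings: List[str], spans: List[Span], char_index: int) -> Span:
    """Return the start/end positions of the area containing the character at char_index."""
    if len(substrings) != len(spans):
        raise ValueError("substrings and spans must have the same length")
    # Exclusive end offset of each partial character string within the full text.
    ends = list(accumulate(len(s) for s in substrings))
    total = ends[-1] if ends else 0
    if not 0 <= char_index < total:
        raise IndexError("char_index is outside the text sequence data")
    # First partial character string whose end offset is greater than char_index.
    return spans[bisect_right(ends, char_index)]


parts = ["ab", "cde", "f"]
areas = [(0, 30), (35, 80), (85, 100)]
print(region_for_char(parts, areas, 3))  # (35, 80)
```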

According to the embodiment described above, instead of creating a frame or a character label for each character of a character string image, the character string image is divided into two or more partial character strings of one or more characters each. A machine learning recognizer is then trained using the start/end position information of those partial character strings and the text sequence data, and that recognizer is used to estimate the character string and the positions (start and end positions) of the partial character strings in the character string image. This saves the effort of creating a frame or label for each character, and at recognition time the approximate position (area) in the character string image of each character in a partial character string can be estimated. As a result, a person checking the text sequence data, for example, can quickly determine in which block of the character string image a partial character string or an individual character of the text sequence data is located, which keeps the checking work from taking longer.

Note that the deep learning model learning device 1 described above is an example of a learning model learning device.
The learning model learning device learns, for character string image recognition that recognizes characters from a character string image representing a character string, the character string written in the character string image and the positions of partial character strings each including one or more characters in the character string image.
The learning model learning device has a learning unit that generates a learning model by learning using a character string image in which a single line of character string containing two or more partial character strings is written, the character string written on the character string image, and the start/end position information of each partial character string in the character string image. An example of this learning unit is the learning unit 102.

Furthermore, the character string image prediction device 2 described above is an example of a character string image position prediction device.
The character string image position prediction device predicts, in character string image recognition that recognizes characters from a character string image representing a character string, the character string written in the character string image and the positions of partial character strings each including one or more characters in the character string image.
The character string image position prediction device includes a prediction unit and an output unit. The prediction unit inputs a character string image to be recognized into a learning model generated by learning using a character string image in which a single line of character string containing two or more partial character strings is written, the character string written on that image, and the start/end position information of each partial character string in that image, and thereby obtains the character string included in the character string image to be recognized and the start/end position information in that image. The output unit outputs the prediction result of the prediction unit. An example of this prediction unit is the prediction unit 202, and an example of the output unit is the output unit 204.

Further, the character position estimation device 3 described above is an example of a character position estimation device.
The character position estimation device estimates the position, in a character string image, of a partial character string of one or more characters obtained by performing character string image recognition that recognizes characters from the character string image representing a character string.
The character position estimation device includes a dividing unit, an estimation unit, and an output unit. The dividing unit acquires the character string included in a character string image to be recognized and the start/end position information in that image, both obtained by inputting the image into a learning model generated by learning using a character string image in which a single line of character string containing two or more partial character strings is written, the character string written on that image, and the start/end position information of each partial character string in that image. The dividing unit then divides the acquired character string into partial character strings based on the same conditions as those under which the partial character strings were generated from the character string written on the training character string image. The estimation unit estimates the range of each divided partial character string in the character string image to be recognized. The output unit outputs each range estimated by the estimation unit together with the partial character string obtained by the dividing unit that falls within that range. An example of this dividing unit is the dividing unit 301, an example of the estimation unit is the estimation unit 302, and an example of the output unit is the output unit 303.

The deep learning model learning device 1, the character string image prediction device 2, and the character position estimation device 3 in the embodiment described above may be realized by a computer. In that case, a program for realizing these functions may be recorded on a computer-readable recording medium, and the functions may be realized by causing a computer system to read and execute the program recorded on the recording medium. Note that the "computer system" here includes an OS and hardware such as peripheral devices. The term "computer-readable recording medium" refers to a portable medium such as a flexible disk, a magneto-optical disk, a ROM, or a CD-ROM, or to a storage device such as a hard disk built into a computer system. The "computer-readable recording medium" may further include something that dynamically holds a program for a short period of time, such as a communication line used when the program is transmitted via a network such as the Internet or a communication line such as a telephone line, and something that holds a program for a certain period of time, such as a volatile memory inside a computer system serving as a server or a client in that case.
The program may realize only some of the functions described above, may realize the functions described above in combination with a program already recorded in the computer system, or may be realized using a programmable logic device such as an FPGA (Field Programmable Gate Array).

Although the embodiment of the present invention has been described above in detail with reference to the drawings, the specific configuration is not limited to this embodiment and includes design changes and the like within a scope that does not depart from the gist of the present invention.

DESCRIPTION OF SYMBOLS 1...deep learning model learning device, 2...character string image prediction device, 3...character position estimation device, 4...display, 101...character string image database, 102...learning unit, 103...learning model, 110...character string image, 110a...area, 110b...area, 110c...area, 110d...area, 111...text sequence data, 111a...partial character string, 111b...partial character string, 111c...partial character string, 111d...partial character string, 112...position information, 201...predicted image database, 202...prediction unit, 203...prediction model, 204...output unit, 301...dividing unit, 302...estimation unit, 303...output unit, 410...character string image, 420...character string image, 421...first region, 422...second region, 423...third region, 424...fourth region, 430...character string image, 431...fifth region, 432...sixth region, 433...seventh region, 434...eighth region, 500...display screen, 510...character string image, 511...operator, 520...text sequence data, S...learning model learning system

Claims

A learning model learning device that learns, in character string image recognition that recognizes characters from a character string image representing a character string, the character string written in the character string image and the position of a partial character string containing one or more characters in the character string image, the device
comprising a learning unit that generates a learning model by learning using a character string image in which a single line of character string containing two or more of the partial character strings is written, the character string written on the character string image, and start/end position information of each partial character string in the character string image.

A character string image position prediction device that predicts, in character string image recognition that recognizes characters from a character string image representing a character string, the character string written in the character string image and the position of a partial character string containing one or more characters in the character string image, the device comprising:
a prediction unit that inputs a character string image to be recognized into a learning model generated by learning using a character string image in which a single line of character string containing two or more of the partial character strings is written, the character string written on the character string image, and start/end position information of each partial character string in the character string image, and thereby obtains the character string included in the character string image to be recognized and start/end position information in the character string image to be recognized; and
an output unit that outputs the prediction result of the prediction unit.

A character position estimation device that estimates the position, in a character string image, of a partial character string of one or more characters obtained by performing character string image recognition that recognizes characters from the character string image representing a character string, the device comprising:
a dividing unit that acquires the character string included in a character string image to be recognized and start/end position information in the character string image to be recognized, which are obtained by inputting the character string image to be recognized into a learning model generated by learning using a character string image in which a single line of character string containing two or more of the partial character strings is written, the character string written on the character string image, and start/end position information of each partial character string in the character string image, and that divides the acquired character string into partial character strings based on the same conditions as those used to generate the partial character strings from the character string written on the character string image;
an estimation unit that estimates a range of each divided partial character string in the character string image to be recognized, which includes the partial character strings divided by the dividing unit; and
an output unit that outputs each range estimated by the estimation unit and the partial character string obtained by the dividing unit that is included within that range.

The learning model learning device according to claim 1,
wherein the learning model learned by the learning unit is learned by deep learning using text sequence data.

A learning model learning method for learning, in character string image recognition that recognizes characters from a character string image representing a character string, the character string written in the character string image and the position of a partial character string containing one or more characters in the character string image, the method
comprising generating a learning model by learning using a character string image in which a single line of character string containing two or more of the partial character strings is written, the character string written on the character string image, and start/end position information of each partial character string in the character string image.

A character string image position prediction method for predicting, in character string image recognition that recognizes characters from a character string image representing a character string, the character string written in the character string image and the position of a partial character string containing one or more characters in the character string image, the method
comprising acquiring, as a prediction result, the character string included in a character string image to be recognized and start/end position information in the character string image to be recognized, by inputting the character string image to be recognized into a learning model generated by learning using a character string image in which a single line of character string containing two or more of the partial character strings is written, the character string written on the character string image, and start/end position information of each partial character string in the character string image.

A character position estimation method for estimating the position, in a character string image, of a partial character string of one or more characters obtained by performing character string image recognition that recognizes characters from the character string image representing a character string, the method comprising:
acquiring the character string included in a character string image to be recognized and start/end position information in the character string image to be recognized, which are obtained by inputting the character string image to be recognized into a learning model generated by learning using a character string image in which a single line of character string containing two or more of the partial character strings is written, the character string written on the character string image, and start/end position information of each partial character string in the character string image, and dividing the acquired character string into partial character strings based on the same conditions as those used to generate the partial character strings from the character string written on the character string image;
estimating a range of each divided partial character string in the character string image to be recognized, which includes the divided partial character strings; and
outputting each estimated range and the divided partial character string included within that range.

A program for causing a computer that learns, in character string image recognition that recognizes characters from a character string image representing a character string, the character string written in the character string image and the position of a partial character string containing one or more characters in the character string image,
to function as a learning unit that generates a learning model by learning using a character string image in which a single line of character string containing two or more of the partial character strings is written, the character string written on the character string image, and start/end position information of each partial character string in the character string image.

A program for causing a computer that predicts, in character string image recognition that recognizes characters from a character string image representing a character string, the character string written in the character string image and the position of a partial character string containing one or more characters in the character string image, to function as:
a prediction unit that inputs a character string image to be recognized into a learning model generated by learning using a character string image in which a single line of character string containing two or more of the partial character strings is written, the character string written on the character string image, and start/end position information of each partial character string in the character string image, and thereby obtains the character string included in the character string image to be recognized and start/end position information in the character string image to be recognized; and
an output unit that outputs the prediction result of the prediction unit.

A program for causing a computer that estimates the position, in a character string image, of a partial character string of one or more characters obtained by performing character string image recognition that recognizes characters from the character string image representing a character string, to function as:
a dividing unit that acquires the character string included in a character string image to be recognized and start/end position information in the character string image to be recognized, which are obtained by inputting the character string image to be recognized into a learning model generated by learning using a character string image in which a single line of character string containing two or more of the partial character strings is written, the character string written on the character string image, and start/end position information of each partial character string in the character string image, and that divides the acquired character string into partial character strings based on the same conditions as those used to generate the partial character strings from the character string written on the character string image;
an estimation unit that estimates a range of each divided partial character string in the character string image to be recognized, which includes the partial character strings divided by the dividing unit; and
an output unit that outputs each range estimated by the estimation unit and the partial character string obtained by the dividing unit that is included within that range.