JP2021005164A

JP2021005164A - Character recognition device, imaging device, character recognition method, and character recognition program

Info

Publication number: JP2021005164A
Application number: JP2019117702A
Authority: JP
Inventors: 大資玉城; Daisuke Tamaki; 健太郎須藤; Kentaro Sudo
Original assignee: Exa Wizards Inc
Current assignee: Exa Wizards Inc
Priority date: 2019-06-25
Filing date: 2019-06-25
Publication date: 2021-01-14
Anticipated expiration: 2039-06-25
Also published as: JP6779491B1

Abstract

PROBLEM TO BE SOLVED: To read characters accurately even when their arrangement and attributes are not uniform within an image. SOLUTION: A character recognition device (1) comprises: a first model, which is a convolutional neural network that outputs a feature quantity of a character image; a second model, which is a recurrent neural network that receives the feature quantity output from the first model one or more times and outputs character information; and a character processing unit that uses the two models to output character information indicating the characters contained in the character image. SELECTED DRAWING: Figure 1

Description

The present invention relates to a character recognition device, an imaging device, a character recognition method, and a character recognition program.

Techniques that use AI (Artificial Intelligence) to read characters from images are known. For example, Non-Patent Document 1 discloses a technique in which an image containing a single line of characters is input to a machine learning model, and the characters are recognized while the resulting feature matrix (feature map) is cut out one column at a time in the horizontal direction.

Non-Patent Document 1: Palaiahnakote Shivakumara and 5 others, "CNN-RNN based method for license plate recognition", [online], CAAI Trans. Intell. Technol., 2018, Vol. 3, Iss. 3, pp. 169-175, [retrieved June 5, 2019 (Reiwa 1)], Internet <URL: https://core.ac.uk/download/pdf/161769815.pdf>

However, the prior art described above has the problem that reading accuracy drops when the character arrangement and character attributes, such as size and font, are not uniform. For example, when an image containing several lines of characters is processed, a single column cut out of the feature matrix contains the features of several vertically stacked characters, so that column cannot be recognized as a single character.
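A toy way to see the problem: label each cell of a feature map with the character that dominates it, then slice the map into columns as the column-by-column approach does. The grids below are hypothetical stand-ins; real feature maps hold real-valued vectors, not labels.

```python
from typing import List

# Hypothetical toy "feature maps": each cell is labelled with the character
# whose features dominate it (illustration only, not real CNN output).
single_row_map = [["A", "B", "1", "2", "3", "4", "5", "6"]]
two_row_map = [
    ["A", "B", "1", "2"],  # upper row of characters
    ["3", "4", "5", "6"],  # lower row of characters
]

def columns(feature_map: List[List[str]]) -> List[List[str]]:
    """Slice a 2-D grid into its columns, left to right."""
    return [[row[c] for row in feature_map] for c in range(len(feature_map[0]))]

def chars_per_column(feature_map: List[List[str]]) -> List[int]:
    """Number of distinct characters contributing to each column."""
    return [len(set(col)) for col in columns(feature_map)]

print(chars_per_column(single_row_map))  # [1, 1, 1, 1, 1, 1, 1, 1]
print(chars_per_column(two_row_map))     # [2, 2, 2, 2]
```

With one line of text every column maps to one character; with two rows every column mixes two, which is exactly the case the column-wise RNN cannot resolve.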

An object of one aspect of the present invention is to provide a technique that can read characters accurately even when the arrangement and attributes of the characters in an image are not uniform.

To solve the above problem, a character recognition device according to one aspect of the present invention includes: a first model, which is a convolutional neural network that receives a character image containing one or more characters and outputs a feature quantity of the character image; a second model, which is a recurrent neural network that receives the feature quantity output by the first model one or more times and outputs character information indicating the characters contained in the character image; and a character processing unit that uses the first model and the second model to output character information indicating the characters contained in the character image.

A character recognition method according to one aspect of the present invention uses a first model and a second model. The first model is a convolutional neural network that receives a character image containing one or more characters and outputs a feature quantity of the character image. The second model is a recurrent neural network that receives the feature quantity output by the first model one or more times and outputs character information indicating the characters contained in the character image. The method includes a character processing step of outputting, using the first model and the second model, character information indicating the characters contained in the character image.

According to one aspect of the present invention, a technique can be provided that reads characters accurately even when the arrangement and attributes of the characters in an image are not uniform.

FIG. 1 is a block diagram showing the configuration of a character recognition device according to an embodiment of the present invention.
FIG. 2 is a diagram showing examples of images processed by the character recognition device according to the embodiment.
FIG. 3 is a flowchart showing the processing performed by the character recognition device according to the embodiment.
FIG. 4 is a flowchart showing the process of outputting characters from a character partial image according to the embodiment.
FIG. 5 is a diagram showing examples of license plates according to the embodiment.
FIG. 6 is a diagram showing the inputs and outputs of the models according to the embodiment.
FIG. 7 is a block diagram showing the hardware configuration of the character recognition device according to the embodiment.

[Embodiment]
An embodiment of the present invention is described in detail below.

(Configuration of character recognition device 1)
FIG. 1 is a block diagram showing the configuration of the character recognition device 1 according to the present embodiment. The character recognition device 1 includes an input/output unit 11, a character recognition unit 12, and a learning unit 13. The character recognition unit 12 includes a character processing unit 14, a third model MA, a first model MB, and a second model MC.

A camera (imaging device) 2 and a display device 3 are connected to the character recognition device 1. The camera 2 captures a predetermined scene and outputs the captured image to the character recognition device 1. The camera 2 may be, for example, a camera used for public purposes (such as a traffic safety monitoring system) or a camera used by an individual. The display device 3 displays the character recognition result for the captured image output from the character recognition device 1.

In the character recognition unit 12, the third model MA is a convolutional neural network (CNN) that receives an image and outputs region information indicating the region of the image in which characters are present. In this specification, the term "character" includes symbols, numerals, and characters of various languages. The region information includes the coordinates of the four corners of the region in which characters are present. The third model MA is trained using teacher data that includes images containing character regions and the coordinates of the four corners of those regions.
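The text does not say how the four corner coordinates are turned into the character image. One simple possibility, assumed here, is to crop the axis-aligned bounding box of the four corners; a perspective warp would handle tilted plates better. The image is a plain nested list of pixel rows, and the function name `crop_region` is hypothetical.

```python
from typing import List, Sequence, Tuple

Point = Tuple[int, int]  # (x, y) in pixel coordinates

def crop_region(image: List[List[int]], corners: Sequence[Point]) -> List[List[int]]:
    """Crop the axis-aligned bounding box of four corner points (assumed method).

    `image` is a list of pixel rows (image[y][x]); coordinates are clamped
    to the image so slightly out-of-range predictions do not fail.
    """
    if len(corners) != 4:
        raise ValueError("expected exactly four corner points")
    height, width = len(image), len(image[0])
    xs = [x for x, _ in corners]
    ys = [y for _, y in corners]
    x0, x1 = max(min(xs), 0), min(max(xs), width - 1)
    y0, y1 = max(min(ys), 0), min(max(ys), height - 1)
    # Slices are end-exclusive, so add 1 to keep the far corner pixel.
    return [row[x0:x1 + 1] for row in image[y0:y1 + 1]]

# A 5 x 6 test image whose pixel value encodes its position (10*y + x).
img = [[10 * y + x for x in range(6)] for y in range(5)]
print(crop_region(img, [(1, 1), (3, 1), (3, 2), (1, 2)]))  # [[11, 12, 13], [21, 22, 23]]
```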

The first model MB is a convolutional neural network that receives a character image containing one or more characters and outputs a feature quantity of the character image. The second model MC is a recurrent neural network (RNN) that receives the feature quantity output by the first model one or more times and outputs character information indicating the characters contained in the character image. The recurrent neural network may be a long short-term memory (LSTM) network. The first model MB and the second model MC are trained using teacher data that includes character images containing one or more characters and character information indicating the characters contained in those images.

The input/output unit 11 extracts an image containing a plurality of characters from the image captured by the camera 2 and outputs it to the character processing unit 14. The input/output unit 11 also causes the display device 3 to display the character information output from the character processing unit 14. The character information output by the input/output unit 11 may also be provided to another information processing device, in addition to or instead of the display device 3. Such a device may, for example, compare the character information with other character information stored in a database, or with character information obtained from images captured by another camera.

The character processing unit 14 uses the third model MA, the first model MB, and the second model MC to output character information indicating the characters contained in the character image. The character processing unit 14 inputs the image acquired from the input/output unit 11 into the third model MA, cuts out of that image the region indicated by the region information output by the third model MA, and inputs the cut-out region into the first model MB as the character image. Alternatively, the character processing unit 14 may perform processing without the third model MA, in which case it inputs the image acquired from the input/output unit 11 directly into the first model MB as the character image.

The character processing unit 14 then inputs the feature quantity output by the first model MB into the second model MC one or more times and concatenates the one-character pieces of character information output by the second model MC, thereby obtaining the characters contained in the character image. In one aspect, the character processing unit 14 inputs into the second model MC, together with the feature quantity of the character image, character information indicating the character that the second model MC output at the previous step.

The learning unit 13 trains the first model MB and the second model MC using teacher data. Specifically, the learning unit 13 trains each of the first model MB and the second model MC using a loss function calculated from the output of the second model MC. The learning unit 13 also trains the third model MA using teacher data.
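The text does not name the loss function. A common choice for a step-by-step decoder like the second model MC, assumed here, is per-step cross-entropy against the target string followed by an end token. In a framework with automatic differentiation, the gradient of this single loss would flow back through MC into MB, which is one way both models can be trained from MC's output. The dictionaries below stand in for softmax outputs, and the token name `<end>` is hypothetical.

```python
import math
from typing import Dict, List

END = "<end>"  # hypothetical end-of-sequence token

def sequence_cross_entropy(step_probs: List[Dict[str, float]], target: str) -> float:
    """Sum of -log p(correct symbol) over decoding steps (assumed loss).

    step_probs[i] is the second model's output distribution at step i;
    the target is the label string followed by the end token.
    """
    symbols = list(target) + [END]
    if len(step_probs) != len(symbols):
        raise ValueError("need one distribution per target symbol")
    eps = 1e-12  # guard against log(0) for symbols given zero probability
    return -sum(math.log(max(p.get(s, 0.0), eps)) for p, s in zip(step_probs, symbols))

probs = [{"A": 0.5, "B": 0.5}, {END: 0.5, "A": 0.5}]
print(sequence_cross_entropy(probs, "A"))  # 2 * ln 2, about 1.386
```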

(Example of images)
FIG. 2 is a diagram showing examples of images processed by the character recognition device 1 according to the present embodiment. Reference numeral 201 denotes an example of a captured image of a road including a vehicle body, 202 an example of an image of the vehicle body (hereinafter, a vehicle body image), and 203 an example of an image of the license plate (hereinafter, a license plate image).

(Processing of character recognition device 1)
FIG. 3 is a flowchart showing the processing of the character recognition device 1 according to the present embodiment. The following describes processing in which the character recognition device 1 reads (recognizes) the characters on a license plate from an image including a vehicle body (that is, a configuration in which the character image is a captured image including the license plate, or a partial image of that captured image). This processing is performed in real time. Instead of acquiring images from the camera 2, the character recognition device 1 may acquire them from a recording medium in the device itself, or via a local or global network.

(Step S301)
In the character recognition device 1, the input/output unit 11 acquires from the camera 2 a captured image of a road including a vehicle body (corresponding to 201 in FIG. 2).

(Step S302)
Next, the input/output unit 11 detects the vehicle body in the captured image acquired in step S301 and cuts out the vehicle body image (corresponding to 202 in FIG. 2). Any object detection algorithm can be used to detect the vehicle body; for example, "SqueezeDet", whose processing is comparatively light, may be used. This enables fast processing and hence real-time operation.

(Step S303)
The input/output unit 11 then detects the license plate in the vehicle body image cut out in step S302 and cuts out the license plate image (corresponding to 203 in FIG. 2). Any object detection algorithm can be used here as well; for example, SqueezeDet may be used as in step S302, again enabling fast, real-time processing. Cutting the license plate image out of the vehicle body image also allows the license plate image and the vehicle body image to be managed in association with each other.
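Because the plate is detected inside the vehicle body image, its box is naturally expressed in body-image coordinates. A small record that keeps both boxes, and can shift the plate box back into full-frame coordinates, is one way to manage the two images in association. The detector itself (SqueezeDet in the text) is out of scope here; the class name, field names and box values are hypothetical.

```python
from dataclasses import dataclass
from typing import Tuple

Box = Tuple[int, int, int, int]  # (x0, y0, x1, y1), end-exclusive

@dataclass
class VehicleRecord:
    """Hypothetical record associating a body crop with the plate inside it."""
    frame_id: int
    body_box: Box   # in full-frame coordinates (output of step S302)
    plate_box: Box  # in body-image coordinates (output of step S303)

    def plate_box_in_frame(self) -> Box:
        """Shift the plate box by the origin of the body box."""
        bx, by = self.body_box[0], self.body_box[1]
        x0, y0, x1, y1 = self.plate_box
        return (x0 + bx, y0 + by, x1 + bx, y1 + by)

rec = VehicleRecord(frame_id=7, body_box=(100, 50, 400, 300), plate_box=(120, 180, 200, 220))
print(rec.plate_box_in_frame())  # (220, 230, 300, 270)
```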

(Step S304)
Subsequently, the input/output unit 11 inputs the license plate image cut out in step S303 into the third model MA and cuts a character partial image out of the license plate image. As described later, performing character recognition on the character partial image improves recognition accuracy compared with performing it on the whole license plate image.

The learning unit 13 trains the third model MA in advance to locate the character portion (for example, a rectangular region) of a license plate image. The teacher data for this consists of license plate images annotated with the coordinates of the vertices (four corners) of the rectangular region containing the characters.

The processing of step S304 is not essential and may be omitted.

(Step S305)
The character processing unit 14 then acquires the character partial image from the third model MA, inputs it into the first model MB, and causes the second model MC to output characters.

FIG. 5 is a diagram showing examples of license plates according to the present embodiment. For example, when the license plate is 503 in FIG. 5, the second model MC outputs "TN77J8285". The first model MB and the second model MC can also be trained to output "-". Step S305 is described in detail below with reference to FIG. 4.

(How to create teacher data)
An example of how teacher data for the models MA, MB, and MC can be created is shown below.
i) Acquire a large number of license plate images.
ii) For each license plate image, manually annotate the coordinates of the four corners of the character region.
iii) For each license plate image, manually read the character string. At this stage, conventions such as the reading order of characters arranged in multiple rows and whether "-" is read are unified.
iv) Use the license plate images from i) and the four-corner annotation data from ii) as teacher data for the third model MA, which cuts out character partial images.
v) Generate character partial images from the license plate images from i) and the four-corner annotation data from ii).
vi) Use the character partial images from v) and the character strings from iii) as teacher data for the first model MB and the second model MC.
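Steps iii to vi can be sketched as follows. The record layout (`image`, `corners`, `rows`), the top-row-first reading order and the drop-hyphens default are assumptions chosen for illustration; the text only requires that whichever convention is chosen be applied consistently. The crop in step v uses the axis-aligned box of the four annotated corners, which is also an assumption.

```python
from typing import Dict, List, Sequence, Tuple

def normalize_label(rows: Sequence[str], keep_hyphen: bool = False) -> str:
    """Step iii: read multi-row text top row first, left to right, and apply
    one hyphen policy to every sample (assumed convention)."""
    text = "".join(row.replace(" ", "") for row in rows)
    return text if keep_hyphen else text.replace("-", "")

def build_teacher_data(records: List[Dict], keep_hyphen: bool = False) -> Tuple[list, list]:
    """Steps iv-vi: (plate image, corners) pairs for MA and
    (character partial image, label) pairs for MB/MC."""
    ma_data, mbmc_data = [], []
    for rec in records:
        image, corners = rec["image"], rec["corners"]
        ma_data.append((image, corners))                                # step iv
        xs = [x for x, _ in corners]
        ys = [y for _, y in corners]
        crop = [row[min(xs):max(xs) + 1] for row in image[min(ys):max(ys) + 1]]  # step v
        mbmc_data.append((crop, normalize_label(rec["rows"], keep_hyphen)))      # step vi
    return ma_data, mbmc_data

img = [[0, 1, 2, 3], [4, 5, 6, 7], [8, 9, 10, 11]]
recs = [{"image": img, "corners": [(1, 0), (2, 0), (2, 1), (1, 1)], "rows": ["AB 12", "C-345"]}]
ma, mbmc = build_teacher_data(recs)
print(mbmc)  # [([[1, 2], [5, 6]], 'AB12C345')]
```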

Once training has progressed to some extent, the character recognition device 1 itself can perform vehicle body detection, license plate detection, license plate cropping, and character reading on new data. A human then only has to correct the places where characters were misread, which allows annotation data to be generated semi-automatically.
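The semi-automatic step can be pictured as merging the device's readings with a sparse set of human fixes. This is a sketch under assumptions: the dictionary layout, image ids and the function name `merge_corrections` are hypothetical.

```python
from typing import Dict, List, Tuple

def merge_corrections(predictions: Dict[str, str],
                      corrections: Dict[str, str]) -> Tuple[Dict[str, str], List[str]]:
    """Turn model readings into annotation labels (hypothetical helper).

    `predictions` maps an image id to the string the device read;
    `corrections` holds only the entries a human reviewed. Returns the
    final labels and the ids whose reading actually had to change.
    """
    labels = dict(predictions)
    fixed = []
    for image_id, text in corrections.items():
        if image_id not in labels:
            raise KeyError(f"correction for unknown image {image_id!r}")
        if labels[image_id] != text:
            fixed.append(image_id)
        labels[image_id] = text
    return labels, sorted(fixed)

labels, fixed = merge_corrections({"img1": "AB12", "img2": "CD3A"}, {"img2": "CD34"})
print(labels)  # {'img1': 'AB12', 'img2': 'CD34'}
print(fixed)   # ['img2']
```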

(Process of outputting characters from the character partial image)
FIG. 4 is a flowchart showing the process of outputting characters from a character partial image according to the present embodiment. This process details step S305 in FIG. 3. FIG. 6 is a diagram showing the inputs and outputs of the first model MB and the second model MC according to the present embodiment.

The first model MB is, for example, a convolutional neural network with ten or fewer layers, which gives it real-time responsiveness. As one example, the first model MB is a six-layer convolutional neural network. In one aspect, the first model MB receives the character partial image of the license plate and outputs a feature matrix of that image.
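The text gives only the depth of the first model MB. To see what the feature matrix looks like in terms of size, the sketch applies the standard convolution output-size formula, out = floor((in + 2 * padding - kernel) / stride) + 1, to a hypothetical six-layer stack. The 64 x 128 input and every kernel, stride and padding value are assumptions, not values from the text.

```python
from typing import List, Tuple

Layer = Tuple[int, int, int]  # (kernel, stride, padding), same on both axes

def conv_out(size: int, kernel: int, stride: int, padding: int) -> int:
    """Output length of a convolution along one axis."""
    return (size + 2 * padding - kernel) // stride + 1

def feature_map_size(height: int, width: int, layers: List[Layer]) -> Tuple[int, int]:
    """Spatial size of the feature matrix after a stack of conv layers."""
    for kernel, stride, padding in layers:
        height = conv_out(height, kernel, stride, padding)
        width = conv_out(width, kernel, stride, padding)
    return height, width

# Hypothetical six-layer stack: stride-1 layers keep the size,
# stride-2 layers roughly halve it.
SIX_LAYERS = [(3, 1, 1), (3, 2, 1)] * 3
print(feature_map_size(64, 128, SIX_LAYERS))  # (8, 16)
```

Under these assumed settings, the second model MC would receive the whole 8 x 16 grid of feature vectors at every step, whereas a column-by-column approach would feed it 16 columns of 8 vectors each.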

The second model MC is a recurrent neural network that receives the feature matrix of the character partial image one or more times and, at each step, outputs character information for one character of the license plate.

(Step S401)
In the character recognition device 1, the character processing unit 14 inputs the character partial image acquired from the third model MA into the first model MB, which outputs a feature matrix. This corresponds to the input and output of the first model MB in the (first step) of FIG. 6.

(Step S402: character processing step)
Next, the character processing unit 14 inputs the feature matrix output by the first model MB, together with a special character indicating the start, into the second model MC, which outputs one character. Here the character processing unit 14 inputs the feature matrix output by the first model MB into the second model MC as is. This corresponds to the input and output of the second model MC in the (first step) of FIG. 6.

(Step S403)
The character processing unit 14 then determines whether the character output by the second model MC is a special character indicating the end. If it is (YES in step S403), the character processing unit 14 ends the processing; this is the post-processing after the input and output of the second model MC in the (last step) of FIG. 6. At this point, the character recognition device 1 may, for example, concatenate the characters output in steps S402 and S404 and output them.

If the character is not the special character indicating the end (NO in step S403), the character processing unit 14 executes step S404.

(Step S404: character processing step)
The character processing unit 14 inputs the character previously output by the second model MC, together with the feature matrix output by the first model MB, into the second model MC, which outputs the next character. Within the second model MC, the previously output character can be interpreted as being used to decide where in the character partial image to look next. This corresponds to the input and output of the second model MC in the (i-th step) of FIG. 6. Processing then returns to step S403.

In one aspect, the character processing unit 14 may be configured not to input the previously output character into the second model MC in step S404. In that case, the internal state of the second model MC can be interpreted as determining where in the character partial image to look next. However, a configuration in which the character processing unit 14 does input the previously output character into the second model MC in step S404 is preferable.
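The loop in steps S402 to S404 can be sketched as follows. The step function stands in for the second model MC; the token names `<start>`/`<end>`, the `max_len` guard and the toy `scripted_step` are assumptions, since the text only says special characters mark the start and the end. `feed_previous=False` models the variant in which only MC's internal state carries the position.

```python
from typing import Callable, List, Optional, Tuple

START, END = "<start>", "<end>"  # hypothetical special characters

# Stand-in for MC: (feature matrix, input symbol or None, hidden state)
# -> (next symbol, new hidden state).
StepFn = Callable[[object, Optional[str], object], Tuple[str, object]]

def decode(features: object, step: StepFn, feed_previous: bool = True,
           max_len: int = 16) -> str:
    """Call the decoder with the whole feature matrix until it emits END."""
    out: List[str] = []
    prev: str = START
    state: object = None
    for i in range(max_len):  # safety limit; not part of the described flow
        # S402 always passes the start symbol; S404 passes the previous
        # character only when feed_previous is True.
        token = prev if (i == 0 or feed_previous) else None
        symbol, state = step(features, token, state)
        if symbol == END:       # S403 YES: stop
            break
        out.append(symbol)      # one character per step
        prev = symbol
    return "".join(out)         # concatenate the characters

def scripted_step(features, prev, state):
    """Toy MC: emits the characters of `features` (here a plain string)
    in order, keeping its position in the hidden state."""
    pos = 0 if state is None else state
    if pos >= len(features):
        return END, pos
    return features[pos], pos + 1

print(decode("AB12", scripted_step))                       # AB12
print(decode("AB12", scripted_step, feed_previous=False))  # AB12
```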

(Effect of this embodiment)
In the configuration of Non-Patent Document 1, the feature matrix of an image containing characters is split into single columns, and the features of each column are input into an RNN in turn. When an image with characters arranged in multiple rows is processed, a single column can contain the features of several characters, so recognition accuracy drops.

The second model MC according to the present embodiment instead receives the entire feature matrix of the character image and outputs character information for one character at a time. By training it on images in which the characters are arranged in multiple rows, recognition accuracy can therefore be maintained for such images as well. In this way, the present embodiment provides a technique that reads characters accurately even when the arrangement and attributes of the characters in an image are not uniform.

[Additional notes]
The above embodiment describes reading the characters on a vehicle's license plate, but the present invention is also applicable to other uses, described below.

(1) The character recognition device 1 may perform real-time character recognition on any image other than a license plate in which the arrangement, size, and font of the characters are not uniform (for example, business cards, signboards, and flyers).

(2) The character recognition device 1 may read characters arranged in multiple rows. That is, the character image may contain a plurality of characters arranged in a plurality of rows.

(3) Immediately before inputting an image into a model, the character recognition device 1 may perform super-resolution processing, blur correction using a plurality of frames, and the like. The character recognition device 1 may also correct the tilt of the character image (license plate image).
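The text does not say how the tilt is measured. One simple assumption, when the four corners from the third model MA are available, is to take the angle of the plate's top edge; rotating the image by the negative of that angle would then level it. The corner order and the function name `tilt_degrees` are hypothetical.

```python
import math
from typing import Sequence, Tuple

def tilt_degrees(corners: Sequence[Tuple[float, float]]) -> float:
    """Angle of the top edge (top-left -> top-right) relative to horizontal.

    Corners are assumed to be ordered top-left, top-right, bottom-right,
    bottom-left in image coordinates (y grows downward), so a positive
    angle means the right end of the plate sits lower than the left end.
    """
    (x0, y0), (x1, y1) = corners[0], corners[1]
    return math.degrees(math.atan2(y1 - y0, x1 - x0))

print(round(tilt_degrees([(0, 0), (100, 0), (100, 30), (0, 30)]), 2))   # 0.0
print(round(tilt_degrees([(0, 0), (100, 10), (97, 40), (-3, 30)]), 2))  # 5.71
```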

(4) The image captured by the camera 2 may be an infrared image.

(5) The second model MC may output two or more characters at a time instead of one. The number of characters the second model MC outputs at a time can be adjusted through the teacher data used for training. The second model MC may also output, instead of the characters themselves, numbers that replace the characters (character information indicating the characters).
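Both options in item (5) come down to how the training targets are encoded. A sketch, assuming a fixed alphabet, reserved ids 0 and 1 for the start and end markers, and tuples of ids as per-step targets (all hypothetical conventions):

```python
from typing import Dict, List, Tuple

def build_vocab(alphabet: str) -> Dict[str, int]:
    """Map each symbol to an integer id; 0 and 1 are reserved for the
    start and end markers (assumed convention)."""
    vocab = {"<start>": 0, "<end>": 1}
    for ch in alphabet:
        vocab.setdefault(ch, len(vocab))
    return vocab

def to_targets(text: str, vocab: Dict[str, int], chars_per_step: int = 1) -> List[Tuple[int, ...]]:
    """Split `text` into groups of `chars_per_step` characters, encode each
    group as a tuple of ids, and finish with the end marker."""
    if chars_per_step < 1:
        raise ValueError("chars_per_step must be at least 1")
    groups = [text[i:i + chars_per_step] for i in range(0, len(text), chars_per_step)]
    return [tuple(vocab[c] for c in g) for g in groups] + [(vocab["<end>"],)]

vocab = build_vocab("0123456789ABCDEFGHIJKLMNOPQRSTUVWXYZ")
print(to_targets("AB12", vocab))     # [(12,), (13,), (3,), (4,), (1,)]
print(to_targets("AB12", vocab, 2))  # [(12, 13), (3, 4), (1,)]
```

Changing `chars_per_step` changes only the targets, which matches the statement that the number of characters per step is set through the teacher data.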

(6) When the device is applied to Japanese license plates, it is advisable to prepare a dictionary of place names.
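Item (6) only says a place-name dictionary should be prepared. One way to use it, assumed here, is to snap the recognized place-name part of the plate to the closest dictionary entry by string similarity, using Python's standard `difflib.get_close_matches`. The four-entry list and the 0.5 cutoff are hypothetical.

```python
import difflib
from typing import Optional, Sequence

# Hypothetical excerpt of a place-name dictionary; a real one would list
# every region name that can appear on a Japanese plate.
PLACE_NAMES = ["品川", "練馬", "横浜", "札幌"]

def snap_place_name(read: str, names: Sequence[str] = PLACE_NAMES,
                    cutoff: float = 0.5) -> Optional[str]:
    """Replace a possibly misread place name with the closest dictionary
    entry, or return None when nothing is similar enough."""
    matches = difflib.get_close_matches(read, names, n=1, cutoff=cutoff)
    return matches[0] if matches else None

print(snap_place_name("品州"))  # 品川
print(snap_place_name("XYZ"))   # None
```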

(7) The camera 2 may include at least part of the character recognition device 1. In this case, the camera 2 may include an imaging unit and the character recognition device 1, and the character processing unit 14 may output character information using, as the character image, an image captured by the imaging unit or a partial image of it. Alternatively, the camera 2 may include an imaging unit, the input/output unit 11, and the character recognition unit 12, with the character processing unit 14 likewise using the captured image or a partial image of it as the character image. In this case, the function of the learning unit 13 may be provided by a server connected to the camera 2 via a network.

[Example of implementation by software]
The control blocks of the character recognition device 1 (in particular, the input/output unit 11, the character recognition unit 12, and the learning unit 13) may be implemented by logic circuits (hardware) formed on an integrated circuit (IC chip) or the like, or by software.

In the latter case, the character recognition device 1 includes a computer that executes the instructions of a program, which is software implementing each function. The computer includes, for example, one or more processors and a computer-readable recording medium storing the program. The object of the present invention is achieved when the processor reads the program from the recording medium and executes it. A CPU (Central Processing Unit), for example, can be used as the processor. As the recording medium, a "non-transitory tangible medium" can be used, such as a ROM (Read Only Memory), a tape, a disk, a card, a semiconductor memory, or a programmable logic circuit. The computer may further include a RAM (Random Access Memory) into which the program is loaded. The program may be supplied to the computer via any transmission medium capable of transmitting it (such as a communication network or broadcast waves). One aspect of the present invention can also be implemented in the form of a data signal embedded in a carrier wave, in which the program is embodied by electronic transmission.

FIG. 7 is a block diagram showing a specific example of the hardware configuration of the character recognition device 1 according to the present embodiment. The character recognition device 1 includes an input/output unit 11, an arithmetic unit 15, a main storage device 16, and an auxiliary storage device 17, which are connected to one another via a bus 18. A camera 2 and a display device 3 are connected to the input/output unit 11. The arithmetic unit 15, the main storage device 16, and the auxiliary storage device 17 may be, for example, a processor (such as a CPU (Central Processing Unit) or a GPU (Graphics Processing Unit)), a RAM (Random Access Memory), and a hard disk drive, respectively. As an example, the arithmetic unit 15 includes the learning unit 13 and the character processing unit 14 shown in FIG. 1. As an example, the main storage device 16 and the auxiliary storage device 17 store the third model MA, the first model MB, and the second model MC shown in FIG. 1.

(Summary)
The character recognition device according to aspect 1 of the present invention includes: a first model, which is a convolutional neural network that receives as input a character image containing one or more characters and outputs a feature amount of the character image; a second model, which is a recurrent neural network that receives the feature amount output by the first model one or more times and outputs character information indicating the characters included in the character image; and a character processing unit that uses the first model and the second model to output character information indicating the characters included in the character image.
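As a rough illustration of the structure in aspect 1, the following is a minimal, untrained sketch in plain Python, so that it runs without a deep-learning library. Everything concrete in it is an assumption for illustration, not taken from the patent: the filter count, the use of ReLU and global average pooling to turn the convolution output into a feature vector, the Elman-style recurrent cell, the character set `VOCAB`, the `<eos>` end marker, greedy decoding, and the `max_len` limit. The point it shows is the data flow: the CNN (first model) runs once per character image, and its feature is fed to the RNN (second model) at every step until the RNN signals the end.

```python
import math
import random

# Hypothetical character set; the last entry marks "no more characters".
VOCAB = list("0123456789ABCDEFGHIJKLMNOPQRSTUVWXYZ") + ["<eos>"]
EOS = len(VOCAB) - 1


def conv2d(image, kernel):
    """Valid 2-D cross-correlation of a grayscale image (list of rows)."""
    kh, kw = len(kernel), len(kernel[0])
    h, w = len(image), len(image[0])
    return [[sum(image[i + a][j + b] * kernel[a][b]
                 for a in range(kh) for b in range(kw))
             for j in range(w - kw + 1)]
            for i in range(h - kh + 1)]


class FirstModel:
    """Toy CNN: 3x3 filters, ReLU, global average pooling -> feature vector."""

    def __init__(self, n_filters=8, seed=0):
        rng = random.Random(seed)
        self.kernels = [[[rng.uniform(-1, 1) for _ in range(3)] for _ in range(3)]
                        for _ in range(n_filters)]

    def __call__(self, image):
        feats = []
        for k in self.kernels:
            vals = [max(0.0, v) for row in conv2d(image, k) for v in row]  # ReLU
            feats.append(sum(vals) / len(vals))  # global average pooling
        return feats


class SecondModel:
    """Toy Elman RNN that receives the same feature vector at every step."""

    def __init__(self, n_in, n_hidden=16, n_out=len(VOCAB), seed=1):
        rng = random.Random(seed)

        def mat(rows, cols):
            return [[rng.uniform(-0.5, 0.5) for _ in range(cols)] for _ in range(rows)]

        self.w_in = mat(n_hidden, n_in)
        self.w_h = mat(n_hidden, n_hidden)
        self.w_out = mat(n_out, n_hidden)
        self.n_hidden = n_hidden

    def step(self, x, h):
        h_new = [math.tanh(sum(w * xi for w, xi in zip(self.w_in[r], x)) +
                           sum(w * hj for w, hj in zip(self.w_h[r], h)))
                 for r in range(self.n_hidden)]
        logits = [sum(w * hj for w, hj in zip(row, h_new)) for row in self.w_out]
        return logits, h_new


def recognize(image, first, second, max_len=10):
    """Character processing unit: run the CNN once, then the RNN until <eos>."""
    feature = first(image)
    h = [0.0] * second.n_hidden
    chars = []
    for _ in range(max_len):
        logits, h = second.step(feature, h)
        idx = max(range(len(logits)), key=logits.__getitem__)  # greedy choice
        if idx == EOS:
            break
        chars.append(VOCAB[idx])
    return "".join(chars)


image = [[(i * j) % 7 / 6 for j in range(20)] for i in range(8)]
first = FirstModel()
second = SecondModel(n_in=len(first.kernels))
print(recognize(image, first, second))  # weights are untrained, so the string is arbitrary
```

Because the RNN is not tied to a column of the feature map, the number of output characters is decided by the RNN itself (here, by when it emits `<eos>`), which is what lets aspect 1 avoid assuming that characters are laid out in a single row.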

The character recognition device according to aspect 2 of the present invention may be configured such that, in aspect 1, the character processing unit inputs to the second model, together with the feature amount, character information indicating the character that the second model output in the previous step.
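The feedback in aspect 2 can be sketched as follows. This is a minimal sketch under assumptions not stated in the patent: the previous character is encoded as a one-hot vector and concatenated to the feature, a hypothetical start token `START` is fed at the first step, and a hypothetical `EOS` token ends decoding. `toy_step` is a stand-in for a trained second model; it just reads back the fed-in character to show that the feedback path works.

```python
from typing import Callable, List, Optional, Sequence, Tuple


def one_hot(index: int, size: int) -> List[float]:
    v = [0.0] * size
    v[index] = 1.0
    return v


def decode_with_feedback(
    feature: Sequence[float],
    step: Callable[[List[float], object], Tuple[int, object]],
    vocab_size: int,
    start_id: int,
    end_id: int,
    max_len: int = 10,
    state: Optional[object] = None,
) -> List[int]:
    """At every step, feed [feature ++ one_hot(previous output)] to the RNN step."""
    prev = start_id  # no character has been output yet
    out: List[int] = []
    for _ in range(max_len):
        x = list(feature) + one_hot(prev, vocab_size)
        idx, state = step(x, state)
        if idx == end_id:
            break
        out.append(idx)
        prev = idx  # the character just output becomes the next step's extra input
    return out


# Hypothetical ids: 0-9 are digits, 10 is the start token, 11 ends the string.
FEATURE = [0.2, 0.7, 0.1]
START, EOS, V = 10, 11, 12


def toy_step(x, state):
    prev = x[len(FEATURE):].index(1.0)  # recover the fed-back character
    if prev == START:
        return 1, state
    return (prev + 1, state) if prev < 3 else (EOS, state)


print(decode_with_feedback(FEATURE, toy_step, V, START, EOS))  # [1, 2, 3]
```

Feeding back the previous output gives the second model an explicit record of what it has already read, which helps it avoid repeating or skipping a character when the same feature amount is input at every step.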

The character recognition device according to aspect 3 of the present invention may be configured such that, in aspect 1 or 2, the first model and the second model have been trained using teacher data that includes a character image containing one or more characters and character information indicating the characters included in that character image.

The character recognition device according to aspect 4 of the present invention may, in aspect 3, further include a learning unit that trains the first model and the second model using the teacher data.
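The patent states only that the two models are trained on pairs of a character image and its character information. One common way to score such a pair, shown here as an assumption rather than the patent's method, is a per-step cross-entropy between the second model's outputs and the label string with an end marker appended. The sketch computes only that objective; in practice a learning unit would backpropagate it jointly through both models with a deep-learning framework. Under teacher forcing (also an assumption), `step_logits[t]` would be produced while feeding the ground-truth previous character rather than the model's own guess.

```python
import math
from typing import List, Sequence

# Hypothetical label alphabet with an end marker.
VOCAB = list("0123456789") + ["<eos>"]


def encode_label(text: str, vocab: Sequence[str], eos_id: int) -> List[int]:
    """Turn the teacher string into target ids, ending with the end marker."""
    return [vocab.index(c) for c in text] + [eos_id]


def softmax(logits: Sequence[float]) -> List[float]:
    m = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(z - m) for z in logits]
    s = sum(exps)
    return [e / s for e in exps]


def sequence_loss(step_logits: Sequence[Sequence[float]],
                  target_ids: Sequence[int]) -> float:
    """Mean cross-entropy between per-step predictions and the label sequence."""
    if len(step_logits) != len(target_ids):
        raise ValueError("one prediction per target character is required")
    total = 0.0
    for logits, t in zip(step_logits, target_ids):
        total -= math.log(softmax(logits)[t])
    return total / len(target_ids)


targets = encode_label("42", VOCAB, VOCAB.index("<eos>"))
print(targets)  # [4, 2, 10]
```

A uniform prediction over four classes costs ln 4 per step, and the loss falls toward zero as the model puts more probability on the correct character at each step.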

The character recognition device according to aspect 5 of the present invention may, in any of aspects 1 to 4, further include a third model, which is a convolutional neural network that receives an image as input and outputs area information indicating the area in the image in which characters exist. The character processing unit may then cut out, from the image, the area indicated by the area information output by the third model and input the cut-out area to the first model as the character image.
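The two-stage flow of aspect 5 (detect, cut out, recognize) can be sketched as below. The area information is assumed here to be an axis-aligned box `(top, left, bottom, right)` with exclusive bottom and right edges; the patent does not fix this format. `third_model` and `recognize` are placeholders for the trained third model and for the first and second models together.

```python
from typing import Callable, List, Tuple

Image = List[List[float]]
Box = Tuple[int, int, int, int]  # (top, left, bottom, right); an assumed format


def crop(image: Image, box: Box) -> Image:
    """Cut the box out of the image, clamping it to the image bounds."""
    top, left, bottom, right = box
    h, w = len(image), len(image[0])
    top, left = max(0, top), max(0, left)
    bottom, right = min(h, bottom), min(w, right)
    return [row[left:right] for row in image[top:bottom]]


def read_characters(image: Image,
                    third_model: Callable[[Image], Box],
                    recognize: Callable[[Image], str]) -> str:
    """Find the character area, cut it out, and recognize the cut-out area."""
    box = third_model(image)
    character_image = crop(image, box)
    return recognize(character_image)


# Stand-ins: a fixed "detection" and a "recognizer" that reports the crop size.
frame = [[0.0] * 8 for _ in range(6)]
print(read_characters(frame, lambda im: (1, 2, 4, 7),
                      lambda c: f"{len(c)}x{len(c[0])}"))  # 3x5
```

Cropping first means the first model sees mostly characters rather than the surrounding scene, which is presumably why the third model is placed in front of it.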

The character recognition device according to aspect 6 of the present invention may be configured such that, in aspect 5, the third model has been trained using teacher data that includes an image containing an area in which characters exist and the coordinates of the four corners of that area in the image.
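For aspect 6, one plausible way (an assumption, not stated in the patent) to use a four-corner annotation is to flatten it into an 8-value regression target normalized by the image size, and to derive the smallest enclosing box when the area is cut out. The corner order (clockwise from top-left) is also assumed. Note that an enclosing box keeps some background around a tilted plate; a perspective warp from the four corners would be an alternative.

```python
import math
from typing import List, Sequence, Tuple

Point = Tuple[float, float]  # (x, y) in pixels


def corners_to_target(corners: Sequence[Point], width: int, height: int) -> List[float]:
    """Flatten four corner points into an 8-value target scaled to [0, 1]."""
    if len(corners) != 4:
        raise ValueError("exactly four corners are expected")
    target: List[float] = []
    for x, y in corners:
        target.extend([x / width, y / height])
    return target


def corners_to_box(corners: Sequence[Point]) -> Tuple[int, int, int, int]:
    """Smallest axis-aligned (top, left, bottom, right) box enclosing the corners."""
    xs = [x for x, _ in corners]
    ys = [y for _, y in corners]
    return (math.floor(min(ys)), math.floor(min(xs)),
            math.ceil(max(ys)), math.ceil(max(xs)))


# A slightly tilted plate in a 200x100 image, corners clockwise from top-left.
plate = [(10, 20), (110, 25), (108, 75), (8, 70)]
print(corners_to_target(plate, 200, 100))
print(corners_to_box(plate))  # (20, 8, 75, 110)
```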

The character recognition device according to aspect 7 of the present invention may be configured such that, in any of aspects 1 to 6, the character image includes a plurality of characters arranged in a plurality of rows.
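Why multi-row text matters can be seen with a toy grid of characters. The background describes recognizing characters by cutting the feature map out column by column; the sketch below is a deliberately simplified illustration of that idea (it is not a model of the cited method) and shows how a left-to-right column scan interleaves two rows, while the intended reading order is row by row. A second model that receives the whole feature amount is not bound to either scan order.

```python
from typing import List


def column_scan(rows: List[str]) -> str:
    """Read a character grid column by column (top to bottom within a column)."""
    width = max(len(r) for r in rows)
    padded = [r.ljust(width) for r in rows]
    return "".join(padded[i][j]
                   for j in range(width)
                   for i in range(len(padded))).replace(" ", "")


def row_scan(rows: List[str]) -> str:
    """Read the grid row by row, as a person reads a two-row plate."""
    return "".join(r.replace(" ", "") for r in rows)


two_rows = ["AB", "1234"]
print(column_scan(two_rows))  # A1B234
print(row_scan(two_rows))     # AB1234
```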

The character recognition device according to aspect 8 of the present invention may be configured such that, in any of aspects 1 to 7, the character image is a photographed image that includes a license plate, or a partial image of such a photographed image.

The photographing device according to aspect 9 of the present invention includes a photographing unit and the character recognition device according to any one of aspects 1 to 8. The character processing unit of the character recognition device outputs the character information using, as the character image, a photographed image taken by the photographing unit or a partial image of that photographed image.

The character recognition method according to aspect 10 of the present invention is a character recognition method using a first model and a second model. The first model is a convolutional neural network that receives as input a character image containing one or more characters and outputs a feature amount of the character image. The second model is a recurrent neural network that receives the feature amount output by the first model one or more times and outputs character information indicating the characters included in the character image. The method includes a character processing step of outputting character information indicating the characters included in the character image using the first model and the second model.

The character recognition device according to each aspect of the present invention may be realized by a computer. In this case, a character recognition program that realizes the character recognition device on a computer by causing the computer to operate as each unit (software element) of the character recognition device, and a computer-readable recording medium on which the program is recorded, also fall within the scope of the present invention.

The present invention is not limited to the embodiments described above; various modifications are possible within the scope of the claims, and embodiments obtained by appropriately combining technical means disclosed in different embodiments are also included in the technical scope of the present invention.

1 Character recognition device
2 Camera (photographing device)
11 Input/output unit
12 Character recognition unit
13 Learning unit
14 Character processing unit
MA Third model
MB First model
MC Second model

Claims

A character recognition device comprising:
a first model, which is a convolutional neural network that receives as input a character image containing one or more characters and outputs a feature amount of the character image;
a second model, which is a recurrent neural network that receives the feature amount output by the first model one or more times and outputs character information indicating the characters included in the character image; and
a character processing unit that uses the first model and the second model to output character information indicating the characters included in the character image.

The character recognition device according to claim 1, wherein the character processing unit inputs to the second model, together with the feature amount, character information indicating the character that the second model output in the previous step.

The character recognition device according to claim 1 or 2, wherein the first model and the second model have been trained using teacher data that includes a character image containing one or more characters and character information indicating the characters included in that character image.

The character recognition device according to claim 3, further comprising a learning unit that trains the first model and the second model using the teacher data.

The character recognition device according to any one of claims 1 to 4, further comprising a third model, which is a convolutional neural network that receives an image as input and outputs area information indicating the area in the image in which characters exist,
wherein the character processing unit cuts out, from the image, the area indicated by the area information output by the third model and inputs the cut-out area to the first model as the character image.

The character recognition device according to claim 5, wherein the third model has been trained using teacher data that includes an image containing an area in which characters exist and the coordinates of the four corners of that area in the image.

The character recognition device according to any one of claims 1 to 6, wherein the character image includes a plurality of characters arranged in a plurality of rows.

The character recognition device according to any one of claims 1 to 7, wherein the character image is a photographed image including a license plate or a partial image of the photographed image.

A photographing device comprising:
a photographing unit; and
the character recognition device according to any one of claims 1 to 8,
wherein the character processing unit of the character recognition device outputs the character information using, as the character image, a photographed image taken by the photographing unit or a partial image of that photographed image.

A character recognition method using a first model and a second model, wherein:
the first model is a convolutional neural network that receives as input a character image containing one or more characters and outputs a feature amount of the character image;
the second model is a recurrent neural network that receives the feature amount output by the first model one or more times and outputs character information indicating the characters included in the character image; and
the method comprises a character processing step of outputting character information indicating the characters included in the character image using the first model and the second model.

A character recognition program for causing a computer to function as the character recognition device according to claim 1, the program causing the computer to function as the first model, the second model, and the character processing unit.