JP6829575B2 - Image processing apparatus, image processing system, and image processing method

Info

Publication number: JP6829575B2
Application number: JP2016195984A
Authority: JP
Inventors: 良介石田; 素子黒岩; 雅典後藤
Original assignee: Glory Ltd
Current assignee: Glory Ltd
Priority date: 2016-10-03
Filing date: 2016-10-03
Publication date: 2021-02-10
Anticipated expiration: 2036-10-03
Also published as: JP2018060296A

Description

The present invention relates to an image processing apparatus, an image processing system, and an image processing method capable of efficiently and accurately converting an image of a display board captured in various different situations into an image equivalent to one captured from the front (hereinafter referred to as a "front image").

Conventionally, techniques are known for capturing and reading display boards such as license plates attached to vehicles and road signs installed along roads. For example, devices are installed that capture an image of the license plate attached to a vehicle and read the vehicle number shown in the image data containing the captured license plate. Such devices read the vehicle numbers of vehicles traveling on roads that form the backbone road network at the national, regional, or city level (hereinafter referred to as "arterial roads"), vehicles passing through toll gates on toll roads, vehicles entering and leaving parking lots, and the like.

For example, Patent Document 1 discloses a technique in which coordinate transformation parameters are stored in advance according to the position of the license plate in the image. Once the position of the license plate in the image is detected, the image of the license plate portion is coordinate-transformed using the parameters corresponding to that position to obtain a front image of the license plate. Performing recognition processing after converting the license plate portion into a front image in this way improves the accuracy of the recognition processing.

Patent Document 2 discloses a technique for recognizing the number on a license plate by rotating the captured image of the license plate according to the traveling direction of the vehicle relative to the imaging device and applying enlargement correction. Furthermore, Patent Document 3 discloses a technique that detects four points, including the two plate fasteners and the rightmost digit, in the captured image of a license plate and performs a coordinate transformation using these four points to obtain an image of the license plate as seen from the front.

Patent Document 4 mentions adding a learning function using a neural network and training it so that road signs can be extracted correctly despite missing or deformed graphic components, for cases where the image was not captured from the front or where part of the road sign is hidden by trees or the like and graphic components are missing.

Patent Document 1: Japanese Unexamined Patent Publication No. H07-114688
Patent Document 2: Japanese Patent No. 4670721
Patent Document 3: Japanese Unexamined Patent Publication No. 2015-32087
Patent Document 4: Japanese Patent No. 4762026

However, the technique of Patent Document 1 is limited in the approach direction of the vehicle and cannot be applied when the approach direction changes. The technique of Patent Document 2, unlike that of Patent Document 1, can handle multiple approach directions, but it requires two images captured at a time interval and is therefore unsuitable for high-speed processing.

Furthermore, the technique of Patent Document 3 cannot handle cases where the four points needed for conversion to a front image are not contained in the image. Patent Document 4 mentions training a neural network so that a road sign can be extracted correctly even when its graphic components are deformed or missing, as when the road sign is photographed at an angle or is partly hidden by trees or the like. However, because no information for correcting to a front image is available, the extracted road sign image cannot be converted into a front image.

For these reasons, each of the prior-art techniques represented by Patent Documents 1 to 4 has situations in which the image of a display board cannot be converted into a front image. In a real environment, therefore, either the range of application is limited or the accuracy of recognition processing decreases.

Consequently, when converting an image containing a license plate captured by an imaging device into a front image, how to handle various different situations efficiently is an important problem. This problem arises not only for images containing license plates but equally for images containing road signs and the like.

The present invention has been made to solve the above problems of the prior art, and its object is to provide an image processing apparatus, an image processing system, and an image processing method capable of efficiently and accurately converting images captured in various different situations into front images.

To solve the above problems, the present invention is characterized by comprising: a learning image generation unit that generates a learning image by deleting a region containing some of the reference points from a reference image that contains a display board and has four reference points; a multilayer neural network that has undergone supervised learning using the learning image and the four reference points in the learning image; an input image reception unit that acquires an input image containing a display board; a corresponding point identification unit that feeds an image based on the input image into the multilayer neural network and identifies information on four corresponding points in the input image that correspond to the reference points; and a front image generation unit that generates a front image corresponding to the input image by performing a projective transformation based on information on at least three of the four corresponding points identified by the corresponding point identification unit.

Further, the present invention is characterized in that the above invention further comprises a character recognition unit that recognizes characters contained in the front image generated by the front image generation unit.

Further, the present invention is characterized in that, in the above invention, the multilayer neural network performs supervised learning using, as input information, a learning image obtained by applying a projective transformation to a first learning image in which part of the information of a real image of the display board captured by a predetermined imaging device has been deleted, together with the four reference points in that learning image.

Further, the present invention is characterized in that the above invention further comprises a type identification unit that identifies the type of the display board.

Further, the present invention is characterized in that, in the above invention, the corresponding point identification unit feeds into the multilayer neural network an analysis image, which is a grayscale image generated from the input image acquired by the input image reception unit.

Further, the present invention is characterized in that, in the above invention, the display board is a license plate showing a vehicle's registration number, or a road sign.

Further, the present invention is an image processing system that processes an image containing a display board, characterized by comprising: a learning image generation unit that generates a learning image by deleting a region containing some of the reference points from a reference image that contains the display board and has four reference points; a multilayer neural network that has undergone supervised learning using the learning image and the four reference points in the learning image; an input image reception unit that acquires an input image containing a display board; a corresponding point identification unit that feeds an image based on the input image into the multilayer neural network and identifies information on four corresponding points in the input image that correspond to the reference points; and a front image generation unit that generates a front image corresponding to the input image by performing a projective transformation based on information on at least three of the four corresponding points identified by the corresponding point identification unit.

Further, the present invention is characterized by including: a learning image generation step of generating a learning image by deleting a region containing some of the reference points from a reference image that contains a display board and has four reference points; a step of causing a multilayer neural network to perform supervised learning using the learning image and the four reference points in the learning image; an input image acquisition step of acquiring an input image containing a display board; a corresponding point identification step of feeding an image based on the input image into the multilayer neural network and identifying information on four corresponding points in the input image that correspond to the reference points; and a front image generation step of generating a front image corresponding to the input image by performing a projective transformation based on information on at least three of the four corresponding points identified in the corresponding point identification step.

According to the present invention, a front image of a display board can be obtained from an image containing the display board captured in various different situations.

FIG. 1 is an explanatory diagram for explaining the concept of the image processing apparatus according to Embodiment 1.
FIG. 2 is a diagram showing the system configuration of the image processing system according to Embodiment 1.
FIG. 3 is a functional block diagram showing the learning-stage configuration of the image processing apparatus shown in FIG. 2.
FIG. 4 is a diagram for explaining the creation of learning images.
FIG. 5 is a functional block diagram showing the recognition-stage configuration of the image processing apparatus shown in FIG. 2.
FIG. 6 is a flowchart showing the processing procedure in the learning stage of the image processing apparatus shown in FIG. 2.
FIG. 7 is a flowchart showing the processing procedure in the recognition stage of the image processing apparatus shown in FIG. 2.
FIG. 8 shows examples of the images described in the flowchart of FIG. 7.
FIG. 9 is an explanatory diagram for explaining the concept of the image processing apparatus according to Embodiment 2.
FIG. 10 is a diagram showing the system configuration of the image processing system according to Embodiment 2.
FIG. 11 is a functional block diagram showing the learning-stage configuration of the image processing apparatus shown in FIG. 10.
FIG. 12 is a diagram explaining the reference points of the signs processed by the image processing apparatus shown in FIG. 10.
FIG. 13 is a functional block diagram showing the recognition-stage configuration of the image processing apparatus shown in FIG. 10.
FIG. 14 is a flowchart showing the processing procedure in the recognition stage of the image processing apparatus shown in FIG. 10.
FIG. 15 shows examples of the images described in the flowchart of FIG. 14.

The image processing apparatus, image processing system, and image processing method according to Embodiment 1 are described below with reference to the accompanying drawings. Embodiment 1 focuses on the case of reading the letters and digits on a license plate, which is one kind of display board and displays a vehicle's registration number.

<Concept of the image processing apparatus according to Embodiment 1>
First, the concept of the image processing apparatus according to Embodiment 1 is described. FIG. 1 is an explanatory diagram for explaining this concept. It is assumed here that a multi-gradation image of a license plate used for learning (for example, a 256-level monochrome image or a 24-bit color image; hereinafter referred to as a "reference image") and learning images generated from this reference image have already been acquired.

In the learning stage, the image processing apparatus 3 according to Embodiment 1 has the multilayer neural network 36 and the network model update processing unit 37 perform supervised learning in advance, and stores the weights of the synapses connecting the nodes of the multilayer neural network 36, and the like, as network model parameters G. In the recognition stage, the multilayer neural network 36 is reconstructed using the network model parameters G, and a front image F is generated using the position information of the four corners in the image obtained with this multilayer neural network 36. Character recognition is then performed on the front image F of the license plate, and the letters and digits contained in the license plate are output.

Specifically, in Embodiment 1, the functional unit that updates the network model (the network model update processing unit 37) is separated from the multilayer neural network 36. When supervised learning is performed in advance, this network model update processing unit 37 generates the network model parameters G, which constitute the learned data (synapse weights and the like). As a result, the network model update processing unit 37 is unnecessary in the recognition stage, and many separate multilayer neural networks 36 can be reconstructed from the network model parameters G.
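The separation described above can be sketched as follows. This is only an illustration under stated assumptions: the class `PointRegressor`, the helper names, and the JSON storage format are hypothetical, and a single linear layer stands in for the multilayer neural network 36. The point shown is that the stored parameters G are enough to rebuild any number of inference-only networks without the update logic.

```python
import json

class PointRegressor:
    """Hypothetical stand-in for the multilayer neural network 36 (one linear layer)."""

    def __init__(self, parameters):
        # Copy the parameters so that each rebuilt network is independent.
        self.weights = [list(row) for row in parameters["weights"]]
        self.biases = list(parameters["biases"])

    def forward(self, features):
        # Inference only: no update logic lives in this class.
        return [sum(w * f for w, f in zip(row, features)) + b
                for row, b in zip(self.weights, self.biases)]

def save_parameters(parameters):
    """Serialize the network model parameters G (the storage format is an assumption)."""
    return json.dumps(parameters)

def rebuild_network(serialized):
    """Reconstruct an inference-only network from stored parameters G."""
    return PointRegressor(json.loads(serialized))

# Parameters G as they might look after training (values are made up).
G = {"weights": [[0.5, -0.25], [1.0, 2.0]], "biases": [0.1, -0.2]}
stored = save_parameters(G)

# Two independent networks rebuilt from the same stored parameters.
net_a = rebuild_network(stored)
net_b = rebuild_network(stored)
```

Both rebuilt networks produce identical outputs for the same input, which is the property the recognition stage relies on.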

Here, the multilayer neural network 36 is a multilayered neural network of the kind used in deep learning; this embodiment uses a convolutional neural network. A convolutional neural network is a type of feedforward artificial neural network in which individual neurons are arranged in a manner corresponding to the visual cortex. It is biologically inspired and is a variant of the multilayer perceptron designed to require little preprocessing. Since the convolutional neural network itself is a known technique, a detailed description is omitted here.

The multilayer neural network 36 and the network model update processing unit 37 are made to perform supervised learning using the learning image B and the four reference points. Specifically, the network model update processing unit 37 updates (learns) the network model parameters G so that the four corresponding points output from the multilayer neural network 36 match the position coordinates of the four reference points. The multilayer neural network 36 is reconstructed with the updated network model parameters G, and learning is performed in the same way using the four corresponding points output from the reconstructed network. This learning process is repeated until the network model parameters G are fixed. The network model parameters G can be adjusted using, for example, the error backpropagation algorithm.
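The update loop can be illustrated with a deliberately small sketch. The assumptions are significant: a single linear layer replaces the convolutional network, the input is a short hypothetical feature vector rather than an image, and plain gradient descent on the squared error stands in for backpropagation through many layers. The function names `predict`, `update`, and `train` are illustrative and do not come from the patent.

```python
def predict(G, features):
    """Output 8 values (x, y for four corresponding points) from parameters G."""
    return [sum(w * f for w, f in zip(row[:-1], features)) + row[-1] for row in G]

def update(G, features, target, lr=0.1):
    """One update step: move G so that the outputs approach the reference points."""
    errors = [o - t for o, t in zip(predict(G, features), target)]
    for row, e in zip(G, errors):
        for j, f in enumerate(features):
            row[j] -= lr * e * f   # gradient of the squared error w.r.t. a weight
        row[-1] -= lr * e          # gradient w.r.t. the bias
    return sum(e * e for e in errors) / len(errors)

def train(samples, n_features, epochs=200, lr=0.1):
    """Repeat the update over all (features, reference points) pairs."""
    G = [[0.0] * (n_features + 1) for _ in range(8)]
    for _ in range(epochs):
        for features, target in samples:
            update(G, features, target, lr)
    return G

# Two made-up training samples: a feature vector and the normalized
# coordinates of the four reference points (teacher data).
samples = [
    ([1.0, 0.0], [0.10, 0.10, 0.90, 0.10, 0.10, 0.90, 0.90, 0.90]),
    ([0.0, 1.0], [0.20, 0.15, 0.80, 0.20, 0.15, 0.85, 0.85, 0.90]),
]
G = train(samples, n_features=2)
```

After training, `predict(G, features)` closely reproduces the reference points of each sample, which mirrors the stopping condition of fixing G once the outputs match the teacher data.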

As shown in FIG. 1(a), in the learning stage, the learning image B is input to the multilayer neural network 36, and learning data such as the four reference points, which serve as teacher data, are input to the network model update processing unit 37 (step S11). The multilayer neural network 36 is then trained on the learning image B and the four reference points, and the network model update processing unit 37 updates the network model parameters G (step S12). Since supervised learning by the multilayer neural network 36 and the network model update processing unit 37 is itself an existing technique, a detailed description is omitted here.

The learning image B is image data generated from a reference image, which is a real image of a license plate attached to a vehicle. Specifically, the learning image B is one of the following: the reference image A, which is a real image of the license plate captured by the imaging device 1; an image in which part of the information of the reference image A has been deleted, assuming for example that part of the license plate is hidden (first learning image); an image obtained by applying a projective transformation to the reference image A, assuming a relative angle between the vehicle and the imaging device 1 (second learning image); or an image obtained by applying a projective transformation to an image in which part of the information of the reference image A has been deleted (third learning image). To make processing efficient, the learning image B is preferably cropped so that it contains only the license plate and its surroundings.
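As a minimal sketch of how a first learning image might be produced, the code below overwrites a rectangular region of a reference image to simulate a partly hidden plate. The function name `occlude`, the nested-list image representation, and the fill value are assumptions made for illustration; the patent does not specify how the information is deleted. Note that the teacher data (the four reference points) stay unchanged, even when the hidden region covers one of them.

```python
def occlude(image, top, left, height, width, fill=0):
    """Return a copy of `image` (a list of pixel rows) with a rectangle overwritten."""
    result = [list(row) for row in image]
    # Clip the rectangle to the image bounds before overwriting.
    for y in range(max(top, 0), min(top + height, len(result))):
        row = result[y]
        for x in range(max(left, 0), min(left + width, len(row))):
            row[x] = fill
    return result

# A 4x6 all-white stand-in for the reference image A.
reference = [[255] * 6 for _ in range(4)]

# Hide the upper-right corner, as if another object covered that part of the plate.
learning = occlude(reference, top=0, left=4, height=2, width=2)
```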

The four reference points are the correct-answer data (teacher data) used to update the network model parameters G and to have the multilayer neural network 36 perform supervised learning. Specifically, the position coordinates of the four corners of the license plate contained in the learning image B are designated as the reference points.

The multilayer neural network 36 according to Embodiment 1 is trained with the learning image B and the four reference points so that an undistorted front image F can be obtained even when, in a real environment, the input image C carries various distortions caused by differences in the imaging direction. To explain this concretely: when the imaging device 1 captures the license plate of a vehicle, the relative angle between the vehicle and the imaging device 1 produces various distortions in the license plate portion of the input image C. For example, although a rectangular license plate was captured, it may appear in the image as a parallelogram or a trapezoid. Part of the license plate may also be hidden by other vehicles, trees, or the like. As a result, situations can arise in which the four reference points required for the three-axis projective transformation cannot be identified. For this reason, learning images B with various distortions and the four reference points that serve as their correct-answer data (teacher data) are input in advance to train the multilayer neural network 36, and the updated network model parameters G are stored. The four reference points can be specified by an operator pointing to and entering points on the learning image B displayed on the display unit 32. Alternatively, they can be obtained by applying to four operator-designated reference points the same projective transformation as that applied to the image.
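The last option, carrying operator-designated reference points along with the image, amounts to applying the same 3x3 projective transformation to each point. A short sketch follows; the matrix values and the helper name `apply_homography` are illustrative assumptions rather than values from the patent.

```python
def apply_homography(H, point):
    """Map a 2D point through a 3x3 projective transformation H."""
    x, y = point
    u = H[0][0] * x + H[0][1] * y + H[0][2]
    v = H[1][0] * x + H[1][1] * y + H[1][2]
    w = H[2][0] * x + H[2][1] * y + H[2][2]
    return (u / w, v / w)   # divide by w to return to image coordinates

# An example transformation (made-up values) used to distort a reference image.
H = [[1.0, 0.2, 5.0],
     [0.0, 1.0, 3.0],
     [0.0, 0.001, 1.0]]

# Four reference points designated on the undistorted image:
# upper left, upper right, lower left, lower right.
corners = [(0.0, 0.0), (100.0, 0.0), (0.0, 50.0), (100.0, 50.0)]

# Reference points for the distorted learning image.
warped = [apply_homography(H, p) for p in corners]
```

Because the points are transformed with the same matrix as the image, the teacher data remain aligned with the plate corners in every generated learning image.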

Next, the processing in the recognition stage is described. It is assumed here that, as preprocessing, a recognition image D containing the license plate and its surroundings has been generated from the input image C in which the vehicle's license plate was captured. Edge detection, smoothing, and the like can also be performed in this preprocessing.

As shown in FIG. 1(b), an analysis image E is generated from the recognition image D. When the analysis image E is input to the multilayer neural network 36 (step S21), the multilayer neural network 36 outputs information on the positions of four corresponding points, each corresponding to one of the four reference points (step S22). This "information on the positions of the corresponding points" may be the position coordinates of the corresponding points themselves or information from which the corresponding points can be identified. For example, it may consist of the position coordinates of one corresponding point and the vectors from that point to the other corresponding points.
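The two representations mentioned above are interchangeable, as the following sketch shows. The function names `encode` and `decode` and the sample coordinates are hypothetical; the patent does not prescribe a particular encoding.

```python
def encode(points):
    """Represent four points as one anchor point plus vectors to the other three."""
    ax, ay = points[0]
    return (ax, ay), [(x - ax, y - ay) for x, y in points[1:]]

def decode(anchor, vectors):
    """Recover the position coordinates of all four points."""
    ax, ay = anchor
    return [(ax, ay)] + [(ax + dx, ay + dy) for dx, dy in vectors]

# Made-up corresponding points: upper left, upper right, lower left, lower right.
points = [(10, 20), (110, 25), (12, 70), (112, 78)]
anchor, vectors = encode(points)
restored = decode(anchor, vectors)
```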

The recognition image D is then projectively transformed using the four corresponding points (step S23) to obtain the front image F of the license plate. Next, character recognition is performed on the characters (including digits) contained in the front image F (step S24), and the characters on the license plate are output. Well-known techniques are used for the projective transformation and the character recognition.
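For readers unfamiliar with the well-known technique referred to in step S23, the sketch below estimates a projective transformation from four point pairs by solving an 8x8 linear system and then resamples the image with nearest-neighbor lookup. It is one common approach, not necessarily the one used in the patent; all function names and sample coordinates are assumptions.

```python
def solve(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    M = [list(row) + [rhs] for row, rhs in zip(A, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            factor = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= factor * M[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        s = sum(M[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (M[r][n] - s) / M[r][r]
    return x

def homography(src, dst):
    """3x3 matrix (last entry fixed to 1) mapping each src point onto its dst point."""
    A, b = [], []
    for (x, y), (u, v) in zip(src, dst):
        A.append([x, y, 1, 0, 0, 0, -u * x, -u * y]); b.append(u)
        A.append([0, 0, 0, x, y, 1, -v * x, -v * y]); b.append(v)
    h = solve(A, b)
    return [h[0:3], h[3:6], h[6:8] + [1.0]]

def warp_point(H, point):
    """Map a 2D point through H, including the perspective division."""
    x, y = point
    w = H[2][0] * x + H[2][1] * y + H[2][2]
    return ((H[0][0] * x + H[0][1] * y + H[0][2]) / w,
            (H[1][0] * x + H[1][1] * y + H[1][2]) / w)

def warp_image(image, H_out_to_in, out_h, out_w, fill=0):
    """Build the output image by looking up each output pixel in the input image."""
    out = []
    for y in range(out_h):
        row = []
        for x in range(out_w):
            u, v = warp_point(H_out_to_in, (x, y))
            sx, sy = int(round(u)), int(round(v))
            inside = 0 <= sy < len(image) and 0 <= sx < len(image[0])
            row.append(image[sy][sx] if inside else fill)
        out.append(row)
    return out

# Made-up corresponding points of a skewed plate and the front-view rectangle.
corresponding = [(12, 8), (95, 15), (10, 40), (98, 52)]
front = [(0, 0), (100, 0), (0, 50), (100, 50)]
H = homography(corresponding, front)
```

To resample an image, the mapping is estimated in the opposite direction, for example `warp_image(recognition_image, homography(front, corresponding), 51, 101)`, so that every pixel of the front image is filled from the recognition image.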

By performing the above series of processes, even when the input image C shows a vehicle's license plate captured at any of various angles, the input image C can be efficiently converted into the front image F and the characters on the license plate can be extracted accurately. In particular, even when part of the information on the license plate in the input image C is lost and extracting all four reference points would be difficult with ordinary processing, learning has been performed in preparation for this case. The corresponding points for the four reference points can therefore be identified, and the three-axis projective transformation can be performed.

<System configuration of the image processing system>
Next, the system configuration of the image processing system according to Embodiment 1 is described. FIG. 2 is a diagram showing this system configuration. As shown in the figure, the image processing system has a configuration in which an imaging device 1 and an image processing apparatus 3 are connected via a network 2.

The imaging device 1 consists of an imaging device such as a CCD camera capable of capturing 256-level monochrome images, and transmits image data to the image processing apparatus 3 via the network 2. The imaging device 1 is, for example, a surveillance camera that captures the license plates of vehicles traveling on a road. Although the case described here is one in which the imaging device 1 captures still images and transmits them to the image processing apparatus 3, the invention can also be applied when the imaging device 1 captures video, transmits it to the image processing apparatus 3, and the image processing apparatus 3 extracts images from the video. For convenience of explanation only one imaging device 1 is illustrated, but a plurality of imaging devices 1 may be provided. The imaging device 1 may also use an imaging device that captures color images rather than 256-level monochrome images, or one that captures images using infrared light or the like rather than visible light.

The network 2 is formed by a wired network such as Ethernet (registered trademark) and is used to transfer images. The network 2 is not limited to Ethernet; it may be a direct connection over a single communication cable carrying an image signal, or a wireless network.

The image processing apparatus 3 is an apparatus that outputs the characters on the license plate in the input image C received from the imaging device 1. Specifically, it identifies four corresponding points using the multilayer neural network 36 reconstructed from the network model parameters G, applies a projective transformation to the recognition image D using these four corresponding points to obtain the front image F, performs character recognition on the front image F, and outputs the characters on the license plate. The image processing apparatus 3 is described in detail later.

<Configuration of the image processing device 3 for learning processing>
Next, the configuration of the image processing device 3 shown in FIG. 2 when it performs learning processing will be described. FIG. 3 is a functional block diagram showing that configuration. As shown in the figure, the image processing device 3 includes an input unit 31, a display unit 32, a communication interface unit 33, a storage unit 34, and a learning processing control unit 35.

The input unit 31 is an input device such as a keyboard and a mouse, and the display unit 32 is a display device such as a liquid crystal panel. The communication interface unit 33 is a communication device that uses Ethernet communication or the like to receive images captured by the imaging device 1.

The storage unit 34 is a storage device consisting of a hard disk or a non-volatile memory such as flash memory, and stores the reference image A, the learning image B, the network model parameter G, and the like. The reference image A is the base image used when training the multi-layer neural network 36. It is a front image in which the license plate faces the camera and from which at least the four reference points of the license plate (upper left, upper right, lower left, and lower right) can be identified.

The learning processing control unit 35 is a control unit that performs learning processing using the multi-layer neural network 36 and the network model update processing unit 37 and controls the generation of the network model parameter G. It has a reference image generation unit 38, a learning image generation unit 39, and a learning processing unit 40.

The reference image generation unit 38 is a processing unit that generates the reference image A by associating a front image of the license plate to be processed (assumed to be normalized to a predetermined size) with the four reference points entered by the operator through the input unit 31. The case described here generates the reference image A from a front image received via the communication interface unit 33, but the reference image A can also be generated from a front image stored on a portable recording medium such as a USB memory.

The learning image generation unit 39 is a processing unit that, taking into account the forms that the input image C captured by the imaging device 1 may take, applies various geometric transformations and the like to the reference image A to generate the learning image B, and stores the generated learning image B in the storage unit 34. The learning image B is a training image generated from the reference image A. Examples include an image obtained by projectively transforming the reference image A so as to distort it, and an image in which part of the license plate information (for example, the upper-right portion) has been removed.

Specifically, the learning image generation unit 39 generates one or more distorted learning images B by geometrically deforming the reference image A received from the reference image generation unit 38. For example, it applies a projective transformation to the reference image A or to a learning image B using the four reference points, producing a new learning image B. When generating a plurality of learning images B, it applies a different geometric transformation to the reference image A for each one. Because the four reference points also move to transformed positions, those positions are included in the learning image B. The learning image generation unit 39 can also accept a learning image B that the operator deformed by hand, or one that the operator generated by applying geometric-transformation image processing to the reference image A.
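As an illustration of how several differently distorted learning images B might be derived from one reference image A, the following Python sketch perturbs the four reference points at random to obtain target quadrilaterals, each of which defines one projective transformation. The random-shift strategy, the 128x64 image size, and all function names are assumptions made for this example and are not taken from the embodiment.

```python
import random

def jitter_quad(ref_points, max_shift, rng):
    """Shift each reference point independently by at most max_shift pixels per axis."""
    return [(x + rng.uniform(-max_shift, max_shift),
             y + rng.uniform(-max_shift, max_shift))
            for x, y in ref_points]

def make_target_quads(ref_points, count, max_shift=8.0, seed=0):
    """Return `count` quadrilaterals; mapping ref_points onto each one
    defines a different projective distortion of the reference image."""
    rng = random.Random(seed)
    return [jitter_quad(ref_points, max_shift, rng) for _ in range(count)]

# Four reference points (upper left, upper right, lower left, lower right)
# of a hypothetical 128x64 normalized front image.
REF_POINTS = [(0.0, 0.0), (127.0, 0.0), (0.0, 63.0), (127.0, 63.0)]
target_quads = make_target_quads(REF_POINTS, count=5)
```

Each quadrilateral in `target_quads` already holds the transformed reference point positions, so they can be stored with the distorted image without any manual annotation.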

When the operator uses the input unit 31 to point at positions on the learning image B displayed on the display unit 32, the coordinates of the indicated positions become the reference points. Here, the device accepts the designation of the four corner points of the license plate (upper left, upper right, lower left, and lower right) and uses these four points as the reference points. The four corner points of the license plate can also be detected automatically using an image processing technique such as template matching. When part of the image has been lost and some of the four corner points (for example, the upper right) cannot be detected automatically, the operator can specify the missing reference points. Specific examples of the learning image B are described later.

The learning processing unit 40 is a processing unit that handles the learning-related processing of the image processing device 3, and has a corresponding point learning reception unit 41, the multi-layer neural network 36, and the network model update processing unit 37. In practice, the program and data corresponding to the learning processing unit 40 are loaded into a CPU and executed, thereby carrying out the functions of the corresponding point learning reception unit 41, the multi-layer neural network 36, and the network model update processing unit 37. As the learning processing unit 40 learns, the network model parameter G in the storage unit 34 is updated.

The corresponding point learning reception unit 41 is a processing unit that receives a learning image B to be learned, generates an analysis image E from it and outputs the analysis image E to the multi-layer neural network 36, and outputs the four reference points of that learning image B to the network model update processing unit 37. The analysis image E may be a grayscale image such as a 256-level monochrome image, an RGB luminance image, or image data such as a differential image or an HSV image.

The multi-layer neural network 36, with the synaptic weights between its nodes reconstructed according to the network model parameter G, operates on the analysis image E it receives and outputs four corresponding points.

The network model update processing unit 37 is a processing unit that calculates the difference between the four corresponding points output by the multi-layer neural network 36 and the four reference points supplied by the corresponding point learning reception unit 41, and updates the network model parameter G so that the difference decreases. The updated network model parameter G is applied to the multi-layer neural network 36 and learning is repeated.
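The description does not state how the difference between the corresponding points and the reference points is measured. One plausible choice, shown here only as an assumption, is the mean Euclidean distance over the four point pairs; the function name is hypothetical.

```python
import math

def point_difference(predicted, reference):
    """Mean Euclidean distance between predicted corresponding points
    and the reference points they should match (same order in both lists)."""
    if len(predicted) != len(reference):
        raise ValueError("point lists must have the same length")
    return sum(math.dist(p, r) for p, r in zip(predicted, reference)) / len(reference)

reference_points = [(0, 0), (100, 0), (0, 50), (100, 50)]
predicted_points = [(3, 4), (100, 0), (0, 50), (100, 50)]
difference = point_difference(predicted_points, reference_points)  # one point is off by 5 px
```

A squared-distance loss would serve equally well for gradient-based updates; the distance form is used here because it reads directly in pixels.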

<Example of the learning image B>
Next, four types of learning image will be described as examples of the learning image B. FIG. 4 shows examples of the learning image B. As shown in FIG. 4(a), the reference image A, an actual captured image of a license plate, contains four reference points P1 to P4 and is itself used as a learning image B. As shown in FIG. 4(b), an image in which part of the information in the reference image A has been removed serves as a learning image B (first learning image). In FIG. 4 the kana character, which identifies business-use vehicles, is not shown so that individual vehicles cannot be identified; in actual processing the image is processed with the kana character included.

Further, as shown in FIG. 4(c), an image obtained by projectively transforming the reference image A using its four reference points P1 to P4 (the transformed image indicated by the dashed arrows in the figure) is used as a learning image B (second learning image). An image obtained by projectively transforming the first learning image, in which part of the information in the reference image A has been removed, can also be used as a new learning image B (third learning image). Of the reference image A and the first to third learning images, a single type may be used as the learning image B, or several types may be used. Although not shown in the figure, if the position coordinates of the transformed reference points P1' to P4' are included in the learning image B, learning can proceed automatically.
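The first learning image (part of the information removed, reference points unchanged) can be sketched as follows. The image is represented as a list of rows of grey levels, and the function names, the fill value, and the dictionary layout are assumptions made for this example.

```python
def erase_region(image, top, left, height, width, fill=0):
    """Return a copy of a grayscale image with one rectangle overwritten by `fill`."""
    out = [row[:] for row in image]
    for y in range(max(top, 0), min(top + height, len(out))):
        for x in range(max(left, 0), min(left + width, len(out[y]))):
            out[y][x] = fill
    return out

def make_first_learning_image(image, ref_points, region):
    """Remove part of the plate but keep all four reference points as teacher data,
    even if one of them lies inside the erased region."""
    top, left, height, width = region
    return {"pixels": erase_region(image, top, left, height, width),
            "ref_points": list(ref_points)}

plate = [[255] * 6 for _ in range(4)]            # tiny stand-in for reference image A
ref_points = [(0, 0), (5, 0), (0, 3), (5, 3)]    # UL, UR, LL, LR
sample = make_first_learning_image(plate, ref_points, region=(0, 4, 2, 2))  # erase upper right
```

Keeping the reference point inside the erased area is what lets the network learn to estimate a corner it cannot see.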

<Configuration of the image processing device 3 for recognition processing>
Next, the configuration of the image processing device 3 shown in FIG. 2 when it performs recognition processing will be described. FIG. 5 is a functional block diagram of the image processing device 3 shown in FIG. 2 when it performs recognition processing.

The recognition processing control unit 44 is a control unit that, using the multi-layer neural network 36 to which the trained network model parameter G has been applied, identifies the four corresponding points in the recognition image D that correspond to the four reference points, generates the front image F, and performs character recognition based on the front image F. It consists of an input image reception unit 45, which receives an input image C including background from the external imaging device 1 or the like via the communication interface unit 33, and a recognition processing unit 46, which includes the multi-layer neural network 36. Elements in FIG. 5 that bear the same reference numerals as in FIG. 3 are as described for FIG. 3, and their description is omitted.

The input image reception unit 45 receives the input image C and generates the recognition image D by preprocessing it: cutting out the license plate portion, removing noise, and scaling the image to a predetermined size. Here, the input image C is the image of the license plate to be processed, as captured by the imaging device 1, and it contains not only the license plate but also part of the vehicle body. The recognition image D differs from the learning image B shown in FIG. 3 in that the learning image B carries information on the four reference points, whereas the recognition image D carries no information on reference points or their corresponding points.
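The cut-out and scaling steps of this preprocessing can be sketched as below. The sketch assumes the plate region has already been located by some detector (not shown), uses nearest-neighbour scaling for brevity, and omits noise removal; the function names are hypothetical.

```python
def crop(image, top, left, height, width):
    """Cut a rectangular region out of an image stored as a list of rows."""
    return [row[left:left + width] for row in image[top:top + height]]

def resize_nearest(image, out_h, out_w):
    """Scale an image to out_h x out_w by nearest-neighbour sampling."""
    in_h, in_w = len(image), len(image[0])
    return [[image[y * in_h // out_h][x * in_w // out_w] for x in range(out_w)]
            for y in range(out_h)]

input_image = [[10 * y + x for x in range(4)] for y in range(4)]  # stand-in for input image C
plate_region = crop(input_image, top=1, left=1, height=2, width=2)
recognition_image = resize_nearest(plate_region, out_h=4, out_w=4)  # stand-in for image D
```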

The recognition processing unit 46 is a processing unit that outputs four corresponding points for the license plate in the recognition image D using the multi-layer neural network 36 to which the trained network model parameter G has been applied, projectively transforms the recognition image D based on those four corresponding points to generate the front image F, and performs character recognition to output the characters (including digits) on the license plate. It has a corresponding point identification unit 47, the multi-layer neural network 36, a front image generation unit 48, and a character recognition unit 49.

The corresponding point identification unit 47 receives the recognition image D, generates the analysis image E, and outputs the recognition image D to the front image generation unit 48. The analysis image E may be a grayscale image or an RGB luminance image, or an image such as a differential image or an HSV image. The corresponding point identification unit 47 differs from the corresponding point learning reception unit 41 in that it outputs the recognition image D to the front image generation unit 48.

The front image generation unit 48 projectively transforms the recognition image D using the four corresponding points output by the multi-layer neural network 36 to generate the front image F. A projective transformation maps one plane onto another, and is used here to convert the recognition image D, viewed obliquely, into the front image F. Because projective transformation is a well-known technique, a detailed description is omitted here.
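For readers less familiar with the technique, the following pure-Python sketch shows one standard way to carry out such a projective transformation: solve the eight unknowns of a 3x3 matrix from four point pairs, then fill each pixel of the front image by looking up the matching position in the recognition image. It is a generic textbook formulation rather than the specific implementation of the embodiment, and the function names and nearest-neighbour sampling are assumptions.

```python
def solve_linear(a, b):
    """Solve a x = b by Gauss-Jordan elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        if abs(m[pivot][col]) < 1e-12:
            raise ValueError("degenerate point configuration")
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(n):
            if r != col:
                f = m[r][col] / m[col][col]
                for c in range(col, n + 1):
                    m[r][c] -= f * m[col][c]
    return [m[i][n] / m[i][i] for i in range(n)]

def homography(src, dst):
    """3x3 matrix (bottom-right entry fixed to 1) mapping each src point to its dst point."""
    a, b = [], []
    for (x, y), (u, v) in zip(src, dst):
        a.append([x, y, 1, 0, 0, 0, -u * x, -u * y]); b.append(u)
        a.append([0, 0, 0, x, y, 1, -v * x, -v * y]); b.append(v)
    h = solve_linear(a, b)
    return [h[0:3], h[3:6], [h[6], h[7], 1.0]]

def apply_h(h, x, y):
    """Map one point through the matrix using homogeneous coordinates."""
    w = h[2][0] * x + h[2][1] * y + h[2][2]
    return ((h[0][0] * x + h[0][1] * y + h[0][2]) / w,
            (h[1][0] * x + h[1][1] * y + h[1][2]) / w)

def warp_to_front(image, corners, out_w, out_h, fill=0):
    """Build the front image; `corners` are the four corresponding points
    (UL, UR, LL, LR) found in `image`."""
    front = [(0, 0), (out_w - 1, 0), (0, out_h - 1), (out_w - 1, out_h - 1)]
    h = homography(front, corners)  # front-image coordinates -> recognition-image coordinates
    out = []
    for y in range(out_h):
        row = []
        for x in range(out_w):
            sx, sy = apply_h(h, x, y)
            ix, iy = round(sx), round(sy)
            inside = 0 <= iy < len(image) and 0 <= ix < len(image[0])
            row.append(image[iy][ix] if inside else fill)
        out.append(row)
    return out
```

Mapping from the front image back into the recognition image (inverse mapping) ensures that every output pixel receives a value, which avoids the holes a forward mapping would leave.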

The character recognition unit 49 recognizes and outputs the characters (including digits) contained in the license plate portion of the front image F. General character recognition can be performed by, for example, normalization, feature extraction, matching, and knowledge processing. Because character recognition is a well-known technique, a detailed description is omitted here.

The storage unit 50 used during recognition processing is implemented on the same device as the storage unit 34 used during learning. It temporarily stores the input image C and also stores the trained network model parameter G and the character recognition dictionary H that the character recognition unit 49 uses during character recognition.

<Learning processing procedure>
Next, the processing procedure in the learning stage of the image processing device 3 shown in FIG. 2 will be described. FIG. 6 is a flowchart showing that procedure. It is assumed here that the reference image A is already stored in the storage unit 34.

As shown in FIG. 6, the image processing device 3 accepts the designation of four reference points in the learning images B, which include the reference image A (step S101). These four reference points serve as the teacher data when the multi-layer neural network 36 performs supervised learning.

After that, the learning image B is generated from the reference image A (step S102). Specifically, the reference image A is geometrically transformed to generate a distorted learning image B. The learning images B include the reference image A itself (the license plate imaged from the front), an image in which part of the information in the reference image A has been removed, an image obtained by projectively transforming the reference image A, and an image obtained by projectively transforming an image in which part of the information in the reference image A has been removed. The same geometric deformation is also applied to the reference point coordinates of the reference image A to obtain the new reference point coordinates in the learning image B. Even when the pixel information near a reference point has been removed in the reference image A, generation of the learning image B can be automated by using, as the position of that reference point in the learning image B, the coordinates obtained through the same geometric transformation as the image. Each learning image B thus holds both image data and the position coordinates of the four reference points (hereinafter, "reference point information").
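Applying the same geometric deformation to the reference point coordinates amounts to passing each point through the transformation matrix in homogeneous coordinates. A minimal sketch follows; the matrix values and the function name are purely illustrative assumptions.

```python
def transform_points(h, points):
    """Apply a 3x3 projective matrix to (x, y) points using homogeneous coordinates."""
    out = []
    for x, y in points:
        w = h[2][0] * x + h[2][1] * y + h[2][2]
        out.append(((h[0][0] * x + h[0][1] * y + h[0][2]) / w,
                    (h[1][0] * x + h[1][1] * y + h[1][2]) / w))
    return out

# Hypothetical matrix: slight shear, a shift of (5, 3), and a small perspective term.
H = [[1.0, 0.2, 5.0],
     [0.0, 1.0, 3.0],
     [0.0, 0.001, 1.0]]

ref_points_a = [(0, 0), (100, 0), (0, 50), (100, 50)]  # reference image A (UL, UR, LL, LR)
ref_points_b = transform_points(H, ref_points_a)       # teacher data for learning image B
```

Because the coordinates are computed rather than detected, they remain available even when the pixels around a corner have been erased.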

Next, the analysis image E is generated by converting the learning image B into a 256-level grayscale image, an RGB luminance image, or image data such as a differential image or an HSV image (step S103). In this step, however, no geometric transformation is performed that would break the relationship between the image data and the reference point information in the learning image B.
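Two conversions of this kind are sketched below: a luminance conversion and a simple horizontal differential image. Both keep the image dimensions unchanged, so the reference point coordinates stay valid. The ITU-R BT.601 luminance weights and the particular difference operator are assumptions chosen for the example; the description does not specify them.

```python
def to_luminance(rgb_image):
    """Convert rows of (r, g, b) tuples to grey levels 0-255 using BT.601 weights."""
    return [[round(0.299 * r + 0.587 * g + 0.114 * b) for r, g, b in row]
            for row in rgb_image]

def horizontal_difference(gray):
    """Differential image: absolute difference between horizontal neighbours,
    padded with 0 in the last column so the width is preserved."""
    return [[abs(row[x + 1] - row[x]) for x in range(len(row) - 1)] + [0]
            for row in gray]

rgb = [[(255, 255, 255), (0, 0, 0), (255, 0, 0)]]
gray = to_luminance(rgb)
edges = horizontal_difference(gray)
```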

Next, the initial value of the network model parameter G, a plurality of analysis images E, and the reference point information corresponding to each analysis image E are input to the multi-layer neural network 36 (step S104), and the multi-layer neural network 36 calculates four corresponding points (step S105).

Next, the difference between each calculated corresponding point and its reference point is calculated (step S106). If the difference is equal to or less than a predetermined value (step S107; No), the network model parameter G at that point is regarded as fully updated and is stored in the storage unit 34.

If the difference exceeds the predetermined value in step S107 (step S107; Yes) and the number of iterations has reached a predetermined count (step S108; No), processing ends; this is treated as an abnormal termination. If the number of iterations is still below the predetermined count (step S108; Yes), the network model parameter G is updated so that the difference decreases (step S109), and step S105 onward is repeated using the multi-layer neural network 36 with the updated network model parameter G applied.
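The control flow of steps S105 to S109 can be summarized in a short sketch. The callables `difference` and `step` stand in for the network evaluation and the parameter update, which are not specified here; their names, and the toy one-parameter example, are assumptions.

```python
def train(params, step, difference, tolerance, max_iterations):
    """Repeat S105-S109 and return (params, converged).

    difference(params) -> current error between corresponding and reference points
    step(params)       -> updated parameters intended to reduce that error
    """
    iterations = 0
    while True:
        error = difference(params)            # S105-S106
        if error <= tolerance:                # S107; No: update is complete
            return params, True
        if iterations >= max_iterations:      # S108; No: abnormal termination
            return params, False
        params = step(params)                 # S109
        iterations += 1

# Toy stand-in: a single parameter that should approach 4.0.
toy_difference = lambda g: abs(g - 4.0)
toy_step = lambda g: g - 0.5 * (g - 4.0)

g_ok, converged = train(0.0, toy_step, toy_difference, tolerance=0.1, max_iterations=100)
g_short, converged_short = train(0.0, toy_step, toy_difference, tolerance=0.1, max_iterations=3)
```

Returning the convergence flag alongside the parameters lets the caller decide whether to store G or report the abnormal termination.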

<Recognition processing procedure>
Next, the processing procedure in the recognition stage of the image processing device 3 shown in FIG. 2 will be described. FIG. 7 is a flowchart showing that procedure. The description also refers to FIGS. 8(a) to 8(d).

As shown in FIG. 7, the image processing device 3 receives the input image C (FIG. 8(a)) (step S201) and preprocesses the input image C, which contains the license plate, to generate the recognition image D (step S202). It then generates the analysis image E, the image that is input to the multi-layer neural network 36, by converting the recognition image D into a 256-level grayscale image, an RGB luminance image, or the like (step S203).

The trained network model parameter G is read from the storage unit 50 and supplied to the multi-layer neural network 36 to configure it, and the analysis image E is input to the configured network (step S204). The multi-layer neural network 36 calculates the corresponding points (step S205). The result, as shown in FIG. 8(b), is the recognition image D annotated with the corresponding points.

The recognition image D is then projectively transformed using these four corresponding points (step S206) to obtain the front image F shown in FIG. 8(c). Character recognition is performed on the front image F, and the license plate characters shown in FIG. 8(d) are obtained and output (step S207).

As described above, in the first embodiment, the analysis image E, obtained by applying predetermined processing to the recognition image D that results from preprocessing the input image C, is input to the multi-layer neural network 36 whose network model parameter G has been updated through supervised learning using the learning images B and reference point information, and the four corresponding points on the recognition image D that correspond to the four reference points are identified. The recognition image D is then projectively transformed using these four corresponding points to obtain the front image F, and character recognition is performed on the front image F to output the characters on the license plate. As a result, an input image C captured under widely varying imaging conditions such as distance and angle can be converted into the front image F efficiently and accurately, and the character recognition result for the license plate can be output. The characters on a display board contained in an image can therefore be recognized efficiently.

Although the first embodiment describes the use of an image processing device that is a single computer, the present invention is not limited to this and can also be applied where distributed computing is performed across a plurality of computers, or where processing is performed in the cloud.

Although the first embodiment describes applying the present invention to character recognition of license plates, the present invention is not limited to this and can also be applied where an image processing device mounted on a vehicle recognizes the characters on display boards such as road signs. Display boards such as road signs specifically means guide signs, warning signs, regulatory signs, instruction signs, auxiliary signs, and the like; road signs that display characters are the particular target of processing. The second embodiment therefore describes the case in which an image processing device mounted on a vehicle recognizes the characters contained in road signs and the like. Detailed description of parts that are the same as in the first embodiment is omitted.

<Concept of the image processing device according to the second embodiment>
First, the concept of the image processing device according to the second embodiment will be described. FIG. 9 is an explanatory diagram of that concept. The difference from the first embodiment is that, whereas the target of supervised learning there (that is, the reference image A) was of a single type, the license plate, the second embodiment handles road signs that come in several shape types. In the second embodiment the learning processing performed after the type of the reference image A has been identified is the same as that of the learning processing unit 40 described in the first embodiment, so most of its details are omitted. Learning is performed for each type of reference image A, and a network model parameter G is created for each type. The type of the reference image A here means the type of shape, such as a rectangular license plate or a circular or polygonal road sign.

The image processing device 4 according to the second embodiment trains the multi-layer neural network 36 in advance in the learning stage and, in the recognition stage, obtains four corresponding points using the multi-layer neural network 36 reconstructed with the trained network model parameter G that matches the shape type of the road sign. It then performs character recognition on the front image F obtained by projective transformation using those four corresponding points. As in the first embodiment, the multi-layer neural network 36 is a convolutional neural network of the kind used in deep learning.

As shown in FIG. 9(a), in the learning stage, learning data consisting of learning images B and four reference points are input to the multi-layer neural network 36 (step S31), the multi-layer neural network 36 performs supervised learning based on these learning images B and reference points, and the network model update processing unit 37 updates the network model parameter G (step S32). This learning is carried out for each shape type of road sign, and a network model parameter G is generated for each type of object (step S33).

The learning image B is image data generated from a reference image A obtained by imaging a road sign. Specifically, image data distorted by geometrically deforming the reference image is generated using well-known image processing techniques and used as the learning image B. The concept is the same as described for the first embodiment, so its description is omitted.

The four reference points are the correct-answer data (teacher data) for supervised learning by the multi-layer neural network 36. Specifically, the position coordinates of four points that identify the object in the learning image B are designated as the reference points. This supervised learning is performed separately for each shape type of road sign, and a network model parameter G is prepared for each type.

In the second embodiment, the manually designated points and the true reference points do not necessarily have to coincide. With round or triangular road signs, it is difficult to visually input four points corresponding to the four corners of a license plate, so more characteristic points are chosen as the manually designated points, and the four reference points are then obtained by the multilayer neural network 36. For example, the three vertices of a triangle are designated manually, the four corners of the circumscribed rectangle of the triangle are calculated from the three designated points, and those four corners are used as the reference points.
This improves the accuracy of the reference points and makes it possible to handle display boards of various shapes.
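The calculation of the circumscribed rectangle from three designated vertices can be sketched as follows. This is only an illustrative sketch: the description does not state how the rectangle is oriented, so an axis-aligned rectangle is assumed here, and the function name `circumscribed_rectangle` and the sample coordinates are hypothetical.

```python
from typing import List, Sequence, Tuple

Point = Tuple[float, float]


def circumscribed_rectangle(points: Sequence[Point]) -> List[Point]:
    """Return the four corners of the axis-aligned rectangle enclosing the points.

    Corners are ordered top-left, top-right, bottom-right, bottom-left,
    with the y axis pointing downward as in image coordinates.
    """
    xs = [p[0] for p in points]
    ys = [p[1] for p in points]
    left, right = min(xs), max(xs)
    top, bottom = min(ys), max(ys)
    return [(left, top), (right, top), (right, bottom), (left, bottom)]


# Hypothetical manually designated vertices of a triangular sign.
triangle = [(50.0, 10.0), (10.0, 80.0), (90.0, 80.0)]
reference_points = circumscribed_rectangle(triangle)
```

The four returned corners would then play the role of the four reference points attached to the learning image.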

Next, the processing at the recognition stage will be described. As shown in FIG. 9(b), when an input image C capturing a road sign is input, preprocessing is performed and a recognition image D is output (step S41). The type of the road sign shape is identified on the basis of this recognition image D (step S42). After that, an analysis image E is input to the multilayer neural network 36 to which the network model parameters G for the identified type have been applied (step S43), and the network outputs information on the positions of the four corresponding points that correspond to the four reference points (step S44). Next, the recognition image D is subjected to a projective transformation using the four corresponding points (step S45) to generate a front image F. Character recognition is then performed on the front image F (step S46). At this time, the character recognition dictionary H for the identified type is used, because the fonts differ from one kind of road sign to another. The processing of steps S43 to S46 in FIG. 9 is the same as that of steps S21 to S24 shown in FIG. 1, so duplicate description is omitted.

By performing the series of processes described above, even when an input image C capturing a road sign from any of various angles is input, the input image C can be efficiently converted into a front image F and the characters of the road sign can be output with high accuracy. In particular, even when part of the information on the road sign contained in the input image C has been lost and it would be difficult to extract all four reference points by ordinary processing, learning has been performed in preparation for this case, so the four corresponding points can be specified and the three-axis projective transformation can be performed. As a result, characters on road signs of various shapes can be recognized.

<System configuration of the image processing system>
Next, the system configuration of the image processing system according to the second embodiment will be described. FIG. 10 is a diagram showing the system configuration of the image processing system according to the second embodiment. As shown in the figure, this image processing system has a configuration in which an imaging device 1 mounted on a vehicle is connected to an image processing device 4. The imaging device 1 is the same as that shown in the first embodiment, so its description is omitted here.

The image processing device 4 is a device that identifies the type of shape of the road sign in the input image C received from the imaging device 1 and recognizes the characters contained in that road sign. Specifically, it identifies the type of the road sign shape from the recognition image D, specifies the four corresponding points with the multilayer neural network 36 using the network model parameters G obtained by supervised learning for that type, performs a projective transformation of the recognition image D using these four corresponding points to obtain a front image F, performs character recognition on the front image F using the character recognition dictionary H prepared for that type, and outputs the characters contained in the road sign.

<Configuration of the image processing device 4>
Next, the configuration of the image processing device 4 shown in FIG. 10 will be described. FIG. 11 is a functional block diagram showing the configuration of the image processing device 4 shown in FIG. 10 at the time of learning. As shown in the figure, the image processing device 4 has an input unit 31, a display unit 32, a communication interface unit 33, a storage unit 34, and a learning processing control unit 35. Parts that are the same as those of the image processing device 3 of the first embodiment are given the same reference numerals, and their detailed description is omitted. The input unit 31 is used by the operator to input the type of the reference image.

The storage unit 34 is a storage device consisting of a non-volatile memory such as a flash memory, or a hard disk, and stores the reference image A, the learning image B, the network model parameters G (multiple sets), and the like. The reference image A is the base image used when the multilayer neural network 36 and the network model update processing unit 37 perform supervised learning to update the network model parameters G, and is a front image from which at least the four reference points that specify the road sign can be identified.

The learning image B is an image for learning generated on the basis of the reference image A, and includes images obtained by geometrically transforming the reference image A so as to give it distortion, images in which part of the information on the road sign has been removed, and the like. Various geometric transformations and other operations are applied to the reference image A in consideration of the forms the input image C captured by the imaging device 1 may take. The principle for creating the learning image B is the same as that described in the first embodiment. In the second embodiment as well, the learning image B has reference point information, which is the position information of the four reference points.
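As a rough illustration of a learning image in which part of the sign information has been removed, the sketch below blanks a rectangular region of a reference image while keeping all four reference points as the label. The list-of-lists image representation, the function name `make_occluded_sample`, the fill value, and the sample sizes are assumptions made only for this example; the description does not specify how the information is removed.

```python
from typing import List, Sequence, Tuple

Image = List[List[int]]
Point = Tuple[int, int]


def make_occluded_sample(
    reference: Image,
    ref_points: Sequence[Point],
    box: Tuple[int, int, int, int],
    fill: int = 0,
) -> Tuple[Image, List[Point]]:
    """Blank the region box = (x0, y0, x1, y1), exclusive of x1 and y1.

    The reference point label is returned unchanged, so a network trained
    on such samples is still asked to output all four points even when
    one of them is hidden.
    """
    x0, y0, x1, y1 = box
    image = [row[:] for row in reference]  # copy; leave the reference intact
    for y in range(max(0, y0), min(len(image), y1)):
        for x in range(max(0, x0), min(len(image[0]), x1)):
            image[y][x] = fill
    return image, list(ref_points)


# Hypothetical 6x4 all-white reference image with its four corners as reference points.
reference_a = [[255] * 6 for _ in range(4)]
corners = [(0, 0), (5, 0), (5, 3), (0, 3)]
learning_b, label = make_occluded_sample(reference_a, corners, box=(0, 0, 3, 2))
```

Geometric distortion could be applied in the same spirit by transforming both the pixels and the reference point coordinates with the same mapping.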

Here, the recognition image D is an image of the road sign portion obtained by applying preprocessing such as trimming, noise removal, and size adjustment to the image captured by the imaging device 1.

The learning processing unit 40 of the second embodiment shown in FIG. 11 has the same configuration as the learning processing unit 40 of the first embodiment shown in FIG. 3. The second embodiment differs, however, in that there are multiple types of reference image A and learning is performed individually for every type of reference image A.

The reference points used when the object to be processed is a road sign will now be described with reference to FIG. 12. FIG. 12(a) shows a direction and distance sign. Its shape is a quadrangle, the same as a license plate, so its four corners serve as the reference points. FIG. 12(b) shows non-quadrangular signs, one circular and one polygonal. In this case, at the start of learning, the four corner points (P1 to P4) of a quadrangle enclosing the sign are input as reference points, and the multilayer neural network 36 obtains points on the outer periphery, or corner points on the outer periphery, P1' to P4'.

Next, the case where the image processing device 4 of FIG. 10 performs recognition processing will be described. FIG. 13 is a functional block diagram for the case of performing recognition processing. The recognition processing control unit 51 is composed of an input image reception unit 45 that receives an input image C including the background from the imaging device 1 via the communication interface unit 33, a type identification unit 52 that identifies the type of the road sign shape, and a recognition processing unit 46 that includes the multilayer neural network 36 reconfigured with the network model parameters G learned for each type. Elements in FIG. 13 that have the same reference numerals as in FIG. 5 duplicate the description of FIG. 5, so their description is omitted.

The input image C is an image of the road sign to be processed. Specifically, in the recognition processing unit 46, the corresponding point specifying unit 47 obtains the four corresponding points, or approximate corresponding points, on the recognition image D that correspond to the four reference points. It extracts these four corresponding points of the road sign in the recognition image D using the supervised-trained multilayer neural network 36 reconfigured with the network model parameters G learned for the type of the road sign shape. The recognition image D is then subjected to a projective transformation based on the position information of the four corresponding points to obtain the front image F, character recognition is performed on the front image F using the character recognition dictionary H prepared for that type, and the characters of the road sign are output.

The type identification unit 52 identifies the type of the road sign shape using the recognition image D generated by the input image reception unit 45. To do so, it applies templates prepared in advance, one for each type to be identified, and takes the type of the closest template as the type of the shape of the object to be processed.
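The selection of the closest template can be sketched as below. The description does not state which distance measure is used, so the sum of squared pixel differences is assumed here, and the 3x3 binary templates, their names, and the function names are hypothetical.

```python
from typing import Dict, List

Image = List[List[int]]


def squared_difference(a: Image, b: Image) -> int:
    """Sum of squared pixel differences between two images of equal size."""
    return sum((pa - pb) ** 2 for ra, rb in zip(a, b) for pa, pb in zip(ra, rb))


def identify_type(image: Image, templates: Dict[str, Image]) -> str:
    """Return the name of the template closest to the image."""
    return min(templates, key=lambda name: squared_difference(image, templates[name]))


# Hypothetical tiny shape masks, one per type to be identified.
TEMPLATES: Dict[str, Image] = {
    "rectangle": [[1, 1, 1], [1, 1, 1], [1, 1, 1]],
    "triangle": [[0, 1, 0], [1, 1, 1], [1, 1, 1]],
    "circle": [[0, 1, 0], [1, 1, 1], [0, 1, 0]],
}

# A noisy triangular mask: one pixel differs from the triangle template.
observed = [[0, 1, 0], [0, 1, 1], [1, 1, 1]]
sign_type = identify_type(observed, TEMPLATES)
```

The identified type would then select the set of network model parameters G and the character recognition dictionary H to use.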

The recognition processing unit 46 is implemented in hardware (boards), one for each type, denoted by reference numerals 46(1) to 46(n), and the recognition processing unit 46(i) corresponding to the identified type is selected (i denotes any one of 1 to n). In this respect, the recognition processing control unit 51 of the second embodiment differs from the recognition processing control unit 44 of the first embodiment.

<Recognition processing procedure>
Next, the processing procedure at the recognition stage of the image processing device 4 shown in FIG. 13 will be described. FIG. 14 is a flowchart showing the processing procedure at the recognition stage of the image processing device 4 shown in FIG. 10. Briefly, the input image reception unit 45 receives the input image C (step S301) and generates the recognition image D (step S302), and the type identification unit 52 determines the type of the road sign shape from this recognition image D (step S303). The recognition processing unit 46(i) corresponding to the type is then selected (step S304). Next, an analysis image E like that described in the first embodiment is generated (step S305), and the generated analysis image E is input to the multilayer neural network 36 reconfigured with the network model parameters G of the type identified by the type identification unit 52 (step S306). The multilayer neural network 36 calculates the corresponding points (step S307).

The recognition image D is subjected to a projective transformation using the position coordinates of the obtained corresponding points, and the front image F is generated (step S308). Character recognition is then performed on the front image F using the character recognition dictionary H for the type, and the characters of the road sign being processed are obtained (step S309).
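How a projective transformation can be determined from four corresponding points is sketched below. This is a generic textbook formulation, not necessarily the computation used in the embodiment: the eight unknown coefficients are found by solving an 8x8 linear system with Gauss-Jordan elimination. The function names and the sample coordinates are hypothetical.

```python
from typing import List, Sequence, Tuple

Point = Tuple[float, float]


def solve_linear(a: List[List[float]], b: List[float]) -> List[float]:
    """Solve a x = b by Gauss-Jordan elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        if abs(m[pivot][col]) < 1e-12:
            raise ValueError("degenerate point configuration")
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(n):
            if r != col:
                f = m[r][col] / m[col][col]
                for c in range(col, n + 1):
                    m[r][c] -= f * m[col][c]
    return [m[i][n] / m[i][i] for i in range(n)]


def homography_from_points(src: Sequence[Point], dst: Sequence[Point]) -> List[float]:
    """Return h0..h8 (with h8 = 1) mapping each of four src points to its dst point."""
    rows: List[List[float]] = []
    rhs: List[float] = []
    for (x, y), (u, v) in zip(src, dst):
        rows.append([x, y, 1.0, 0.0, 0.0, 0.0, -u * x, -u * y])
        rhs.append(u)
        rows.append([0.0, 0.0, 0.0, x, y, 1.0, -v * x, -v * y])
        rhs.append(v)
    return solve_linear(rows, rhs) + [1.0]


def apply_homography(h: Sequence[float], p: Point) -> Point:
    """Map a point through the projective transformation h."""
    x, y = p
    w = h[6] * x + h[7] * y + h[8]
    return ((h[0] * x + h[1] * y + h[2]) / w, (h[3] * x + h[4] * y + h[5]) / w)


# Hypothetical corresponding points found in the recognition image D ...
corresponding = [(12.0, 8.0), (95.0, 20.0), (88.0, 70.0), (5.0, 52.0)]
# ... and the corners of a 100 x 50 front image F they should map to.
front_corners = [(0.0, 0.0), (100.0, 0.0), (100.0, 50.0), (0.0, 50.0)]
h = homography_from_points(corresponding, front_corners)
```

To produce the pixels of the front image, each output pixel is typically looked up in the source image through the inverse mapping; that resampling step is left out of this sketch.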

The images handled in the second embodiment will now be described. FIG. 15 uses a triangular national highway number sign as an example. FIG. 15(a) is the input image C, an image captured from diagonally above. FIG. 15(b) is the recognition image D, shown with the corresponding point information. FIG. 15(c) shows the front image F after the projective transformation. FIG. 15(d) is the national highway number obtained by character recognition.

In the second embodiment, multiple recognition processing units 46 are prepared according to the types of shape of the road signs to be processed. However, if, for example, an FPGA (Field-Programmable Gate Array) is used and the hardware is configured each time a type is identified, there is no need to prepare multiple hardware boards.

Although the second embodiment describes the case of specifying the type of a road sign, the present invention is not limited to this and can be applied in the same way to outputting the characters contained in a display board such as a signboard. The first and second embodiments can also be combined.

In the first and second embodiments above, corresponding points for four reference points are used, but corresponding points for five or more reference points may be set and four of them used for the projective transformation. Furthermore, when the distortion of the display board in the input image can be approximated by a parallelogram, three corresponding points can be detected by a multilayer neural network 36 trained to output three corresponding points, and the conversion to the front image can be performed using those three corresponding points by an affine transformation instead of the three-axis projective transformation.
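The affine alternative with three corresponding points can be sketched as follows. The six affine coefficients are obtained here with Cramer's rule, which is one of several equivalent ways to solve the two 3x3 systems; the function names and sample coordinates are hypothetical.

```python
from typing import List, Sequence, Tuple

Point = Tuple[float, float]


def det3(m: Sequence[Sequence[float]]) -> float:
    """Determinant of a 3x3 matrix."""
    return (
        m[0][0] * (m[1][1] * m[2][2] - m[1][2] * m[2][1])
        - m[0][1] * (m[1][0] * m[2][2] - m[1][2] * m[2][0])
        + m[0][2] * (m[1][0] * m[2][1] - m[1][1] * m[2][0])
    )


def affine_from_points(src: Sequence[Point], dst: Sequence[Point]) -> List[List[float]]:
    """Return [[a, b, c], [d, e, f]] with u = a x + b y + c and v = d x + e y + f."""
    base = [[x, y, 1.0] for x, y in src]
    d = det3(base)
    if abs(d) < 1e-12:
        raise ValueError("the three source points are collinear")

    def solve(rhs: Sequence[float]) -> List[float]:
        coeffs = []
        for k in range(3):
            mk = [row[:] for row in base]
            for i in range(3):
                mk[i][k] = rhs[i]  # replace column k with the right-hand side
            coeffs.append(det3(mk) / d)
        return coeffs

    return [solve([u for u, _ in dst]), solve([v for _, v in dst])]


def apply_affine(t: Sequence[Sequence[float]], p: Point) -> Point:
    """Map a point through the affine transformation t."""
    x, y = p
    return (t[0][0] * x + t[0][1] * y + t[0][2], t[1][0] * x + t[1][1] * y + t[1][2])


# Hypothetical three corresponding points on a parallelogram-shaped plate image
# (top-left, top-right, bottom-left) and their positions in a 100 x 50 front image.
src3 = [(10.0, 10.0), (90.0, 20.0), (20.0, 60.0)]
dst3 = [(0.0, 0.0), (100.0, 0.0), (0.0, 50.0)]
t = affine_from_points(src3, dst3)
fourth = apply_affine(t, (100.0, 70.0))  # remaining corner of the parallelogram
```

Because an affine transformation preserves parallelism, the fourth corner of the parallelogram lands on the fourth corner of the front rectangle without having been supplied.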

In the first and second embodiments above, learning is not performed at the recognition stage, but the device may also be configured to perform unsupervised learning at the recognition stage. The first and second embodiments describe the case of processing black-and-white images, but color images and infrared images can also be processed. Using infrared images makes it possible to operate at night. The analysis image E may also be a differential image obtained with a differential operator such as the Sobel operator or the Roberts operator, or another image obtained by applying a Gabor filter or the like.
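A differential image of the kind mentioned above can be sketched with the standard 3x3 Sobel kernels. The list-of-lists grayscale representation, the choice to leave border pixels at zero, and the function name `sobel_magnitude` are assumptions made for this illustration.

```python
import math
from typing import List

Image = List[List[float]]

# Standard 3x3 Sobel kernels for horizontal and vertical gradients.
SOBEL_X = [[-1, 0, 1], [-2, 0, 2], [-1, 0, 1]]
SOBEL_Y = [[-1, -2, -1], [0, 0, 0], [1, 2, 1]]


def sobel_magnitude(img: Image) -> Image:
    """Return the gradient magnitude image; border pixels are left at 0."""
    height, width = len(img), len(img[0])
    out = [[0.0] * width for _ in range(height)]
    for y in range(1, height - 1):
        for x in range(1, width - 1):
            gx = sum(
                SOBEL_X[j][i] * img[y + j - 1][x + i - 1]
                for j in range(3)
                for i in range(3)
            )
            gy = sum(
                SOBEL_Y[j][i] * img[y + j - 1][x + i - 1]
                for j in range(3)
                for i in range(3)
            )
            out[y][x] = math.hypot(gx, gy)
    return out


# Hypothetical 5x5 grayscale image with a vertical edge between columns 1 and 2.
gray = [[0.0, 0.0, 255.0, 255.0, 255.0] for _ in range(5)]
edges = sobel_magnitude(gray)
```

In this example the response is concentrated in the columns next to the edge and is zero in the uniform region, which is the property that makes such an image usable as an analysis image.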

The image processing device, image processing system, and image processing method of the present invention are suitable for efficiently and accurately converting images captured in various different situations into front images.

A Reference image
B Learning image
C Input image
D Recognition image
E Analysis image
F Front image
G Network model parameters
H Character recognition dictionary
1 Imaging device
2 Network
3, 4 Image processing device
31 Input unit
32 Display unit
33 Communication interface unit
34 Storage unit
35 Learning processing control unit
36 Multilayer neural network
37 Network model update processing unit
38 Reference image generation unit
39 Learning image generation unit
40 Learning processing unit
41 Corresponding point learning reception unit
44 Recognition processing control unit
45 Input image reception unit
46 Recognition processing unit
47 Corresponding point specifying unit
48 Front image generation unit
49 Character recognition unit
50 Storage unit
51 Recognition processing control unit
52 Type identification unit

Claims

An image processing device comprising:
a learning image generation unit that generates, from a reference image of a display board having four reference points, a learning image in which a region including some of the reference points is missing;
a multilayer neural network on which supervised learning has been performed using the learning image and the four reference points in the learning image;
an input image reception unit that acquires an input image including a display board;
a corresponding point specifying unit that inputs an image based on the input image to the multilayer neural network and specifies information on four corresponding points in the input image that correspond to the reference points; and
a front image generation unit that generates a front image corresponding to the input image by performing a projective transformation based on information on at least three of the four corresponding points specified by the corresponding point specifying unit.

The image processing device according to claim 1, further comprising a character recognition unit that recognizes characters included in the front image generated by the front image generation unit.

The image processing device according to claim 1 or 2, wherein the multilayer neural network performs supervised learning using, as input information, a learning image obtained by applying a projective transformation to, or removing part of the information from, a real image of the display board captured by a predetermined imaging device, together with the four reference points in the learning image.

The image processing device according to any one of claims 1 to 3, further comprising a type identification unit that identifies the type of the display board.

The image processing device according to any one of claims 1 to 4, wherein the corresponding point specifying unit inputs an analysis image, which is a grayscale image generated from the input image acquired by the input image reception unit, to the multilayer neural network.

The image processing device according to any one of claims 1 to 5, wherein the display board is a license plate indicating a vehicle registration number, or a road sign.

An image processing system that processes images including a display board, the system comprising:
a learning image generation unit that generates, from a reference image of a display board having four reference points, a learning image in which a region including some of the reference points is missing;
a multilayer neural network on which supervised learning has been performed using the learning image and the four reference points in the learning image;
an input image reception unit that acquires an input image including a display board;
a corresponding point specifying unit that inputs an image based on the input image to the multilayer neural network and specifies information on four corresponding points in the input image that correspond to the reference points; and
a front image generation unit that generates a front image corresponding to the input image by performing a projective transformation based on information on at least three of the four corresponding points specified by the corresponding point specifying unit.

An image processing method comprising:
a learning image generation step of generating, from a reference image of a display board having four reference points, a learning image in which a region including some of the reference points is missing;
a step of causing a multilayer neural network to perform supervised learning using the learning image and the four reference points in the learning image;
an input image acquisition step of acquiring an input image including a display board;
a corresponding point specifying step of inputting an image based on the input image to the multilayer neural network and specifying information on four corresponding points in the input image that correspond to the reference points; and
a front image generation step of generating a front image corresponding to the input image by performing a projective transformation based on information on at least three of the four corresponding points specified in the corresponding point specifying step.