JP2019125116A

JP2019125116A - Information processing device, information processing method, and program

Info

Publication number: JP2019125116A
Application number: JP2018004555A
Authority: JP
Inventors: 久義降籏; Hisayoshi Furihata
Original assignee: Canon Inc
Current assignee: Canon Inc
Priority date: 2018-01-15
Filing date: 2018-01-15
Publication date: 2019-07-25

Abstract

To provide an information processing device capable of performing image recognition with high accuracy, an information processing method, and a program. SOLUTION: An information processing device that outputs, on the basis of a learning model, output results corresponding to an input photographed image comprises: first acquisition means for acquiring the photographing conditions under which the images used to generate the learning model were photographed; input means for inputting a photographed image of a processing object from an imaging device; second acquisition means for acquiring the photographing conditions under which the photographed image of the processing object was photographed; conversion means for converting the photographed image of the processing object on the basis of the photographing conditions acquired by the first and second acquisition means; and output means for outputting, on the basis of the learning model, output results corresponding to the converted image. SELECTED DRAWING: Figure 3

Description

The present invention relates to a technology for performing image recognition.

Machine learning techniques are known that learn patterns and feature points from images of an object photographed in advance and use the learning result to recognize the type, position, shape, and the like of the object in an image. For machine-learning-based image recognition to operate accurately, it is desirable that the shooting conditions of the images used for learning match, as closely as possible, the shooting conditions of the images input to the recognizer trained to recognize those patterns and feature points. Here, shooting conditions refer to conditions such as the position and orientation of the imaging device when photographing images for learning or images to be input to the recognizer, and the brightness and tint of the light sources that affect the shooting location.

Patent Document 1 discloses a method of generating learning images of an object for image recognition.

Non-Patent Document 1 discloses a method in which a recognition process built by machine learning on images given in advance recognizes the three-dimensional shape of objects in a photographed image and, based on that shape, estimates the position and orientation of the imaging device that took the image.

Patent Document 1: JP 2014-199584 A

Non-Patent Document 1: K. Tateno et al., "CNN-SLAM: Real-time dense monocular SLAM with learned depth prediction," IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR), 2017.

Non-Patent Document 1 presupposes that the shooting conditions of the images used for learning match those of the images input to the recognizer trained to recognize patterns or feature points. In practice, however, it is difficult to match the shooting conditions of the environment in which learning is performed with those of the environment in which recognition is actually performed, so recognition could not be performed with high accuracy.

Patent Document 1 discloses a method of converting learning images to correspond to the shooting conditions under which image recognition will be performed, and then learning from them. With this method, however, unless the imaging conditions of the environment in which image recognition is actually performed are known in advance, images sufficient for learning cannot be generated, and learning cannot be performed efficiently. In particular, when the environment in which image recognition will actually be performed is unknown at the time of prior learning, recognition accuracy is degraded.

The present invention has been made in view of the above problems, and its object is to provide a technology that improves the accuracy of image recognition even when the shooting conditions differ between the environment in which learning is performed in advance and the environment in which recognition is actually performed.

To achieve the above object, an information processing apparatus according to the present invention outputs, on the basis of a learning model, an output result corresponding to an input photographed image, and comprises: first acquisition means for acquiring the shooting conditions under which the images used to generate the learning model were photographed; input means for inputting a photographed image to be processed from an imaging device; second acquisition means for acquiring the shooting conditions under which the photographed image to be processed was photographed; conversion means for converting the photographed image to be processed on the basis of the shooting conditions acquired by the first and second acquisition means; and output means for outputting, on the basis of the learning model, an output result corresponding to the converted image.

According to the present invention, it is possible to provide a technology that improves the accuracy of image recognition even when the shooting conditions differ between the environment in which learning is performed in advance and the environment in which recognition is actually performed.

FIG. 1: Block diagram showing a configuration example of an information processing apparatus
FIG. 2: Diagram showing the difference in the arrangement of imaging devices at recognition (a) and at learning (b)
FIG. 3: Block diagram showing an example of the functional configuration of an information processing apparatus
FIG. 4: Flowchart showing the processing of an information processing apparatus
FIG. 5: Block diagram showing an example of the functional configuration of an information processing apparatus
FIG. 6: Block diagram showing the configuration of an information processing system
FIG. 7: Diagram showing an example in which an information processing apparatus is mounted on a vehicle
FIG. 8: Flowchart showing the processing of the entire information processing system

First Embodiment
In this embodiment, when image recognition is performed using a learning model, attention is paid to the change (difference) between images captured under two conditions: at the time of learning and at the time of image recognition processing. A method is described that improves image recognition accuracy by geometrically deforming the image used for recognition processing so that it approaches the shooting conditions of the images used for learning. In the image recognition of this embodiment, the distance from an object appearing in the input image to the imaging device is estimated. Using a learning model trained on input images paired with ground-truth distances makes the distance estimate more accurate. The recognition process may be any method that recognizes some information from an image using machine learning. In this embodiment, the learning model consists of a network structure based on a neural network that outputs distance information corresponding to an input image, together with its parameters.

First, the geometric “misalignment” in the images arises because the position and orientation of the imaging device differ between learning and recognition processing. Here, “misalignment” refers to the change (difference) between images captured under the two conditions. For example, consider an imaging device mounted on a vehicle, as shown in FIG. 2. Even if vehicles of the same height or the same type can be used for learning and for recognition processing, it may not be possible to match the positions and orientations of the imaging devices exactly. In addition, the imaging device used for learning and the one used for recognition processing may differ in shape or type. Consequently, the shooting conditions of the images used for learning may not match those of the images input to the recognizer trained to recognize patterns and feature points. For images captured under different imaging conditions, the effect of learning is less likely to carry over, and image recognition accuracy decreases.

FIG. 1 shows an example hardware configuration of this embodiment.

The central processing unit (CPU) 101 uses the RAM 103 as a work memory, reads and executes the OS and other programs stored in the ROM 102 and the storage device 104, controls the components connected to the system bus 109, and performs calculations, logical decisions, and other processing. The processing executed by the CPU 101 includes the image recognition processing of the embodiment.

The storage device 104 is a hard disk drive, an external storage device, or the like, and stores the programs and various data for the image recognition processing of the embodiment.

The input unit 105 includes an imaging device such as a camera and input devices for entering user instructions, such as buttons, a keyboard, and a touch panel. The storage device 104 is connected to the system bus 109 via an interface such as SATA, and the input unit 105 via a serial bus such as USB; details are omitted. The communication unit 106 communicates with external devices by wireless communication. The display unit 107 is a display. The sensor 108 is an image sensor or a distance sensor.

The CPU can function as various means by executing programs. A control circuit such as an ASIC operating in cooperation with the CPU may function as these means, or these means may be realized through cooperation between the CPU and a control circuit that controls the operation of the image processing apparatus. The CPU need not be single; there may be a plurality of CPUs, which can execute processing in a distributed manner. The plurality of CPUs may be located in a single computer or in physically separate computers. Means realized by the CPU executing a program may also be realized by a dedicated circuit.

FIG. 2 illustrates the difference between the vehicle 100 used for image recognition processing (a) and the vehicle 200 used for learning (b); (c) shows an example of converting the image 111 into the image 112 during recognition processing. In the example of FIG. 2, the imaging devices mounted on the two vehicles are at different positions (heights). Reference numerals 100 and 200 denote different vehicles, and 110 and 210 denote the imaging devices mounted on them. Although it is desirable for the two vehicles to be of the same model, in this embodiment they are of different models, and with different models it is difficult to install the imaging devices in exactly the same position and orientation. When the positions of the imaging devices differ as in FIG. 2, a geometric misalignment arises between the captured images, as shown by the images 111 and 211. Specifically, the apparent positions and angles of roads and buildings in the image are shifted. Therefore, if, for example, learning is performed using the image 211 from the imaging device 210 of FIG. 2, sufficient recognition accuracy may not be obtained when the image 111 from the imaging device 110, captured under different conditions, is input to the learning model trained to estimate distance from images.
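To make the geometric misalignment concrete, the following Python sketch assumes an idealized level pinhole camera over flat ground, with hypothetical values (focal length 1000 px, principal row 360, mounting heights 1.2 m and 1.5 m) that are not taken from this document. It computes the image row of a ground point at several distances for the two mounting heights.

```python
def ground_point_row(f_px: float, cy: float, cam_height_m: float, dist_m: float) -> float:
    """Image row (pixels, y downward) of a flat-ground point seen by a level pinhole camera."""
    # The ground point lies cam_height_m below the optical axis at depth dist_m,
    # so it projects f * h / Z pixels below the principal point.
    return cy + f_px * cam_height_m / dist_m


f_px, cy = 1000.0, 360.0  # hypothetical intrinsics shared by both cameras
for dist in (10.0, 20.0, 40.0):
    v_recog = ground_point_row(f_px, cy, 1.2, dist)  # e.g. a camera like 110 on vehicle 100
    v_learn = ground_point_row(f_px, cy, 1.5, dist)  # e.g. a camera like 210 on vehicle 200
    print(f"Z={dist:4.0f} m: row {v_recog:.1f} vs {v_learn:.1f} (shift {v_learn - v_recog:.1f} px)")
```

Under these assumptions the shift is 30, 15, and 7.5 px at 10, 20, and 40 m. Because it depends on depth, a single plane-based correction can only approximate the learning-time view.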

Therefore, in this embodiment, the image used for recognition processing is geometrically deformed, on the basis of the difference between the position and orientation of the imaging device mounted on the vehicle used for learning and those of the imaging device mounted on the vehicle used for recognition processing with the learning result, so that it approaches the shooting conditions of the images used for learning. As a result, image recognition accuracy is improved as far as possible even when the shooting conditions differ between the environment in which learning is performed in advance and the environment in which recognition is actually performed.

Next, the functional configuration of the information processing apparatus of this embodiment is described with reference to FIG. 3. In FIG. 3, reference numeral 300 denotes the information processing apparatus of this embodiment, and 110 denotes the imaging device that captures images. The information processing apparatus 300 includes an image acquisition unit 310, a shooting condition acquisition unit 320, an image conversion unit 330, a learning model holding unit 340, a learning model acquisition unit 350, and an image recognition unit 360.
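The unit structure of FIG. 3 can be sketched as follows in Python. All class names, field names, and callable signatures here are hypothetical illustrations of how the units 310 to 360 could be wired together, not an interface defined by this document; the conversion and prediction functions are passed in as stand-ins.

```python
from dataclasses import dataclass
from typing import Callable

import numpy as np


@dataclass
class ShootingCondition:
    """Hypothetical container: intrinsics K plus pose (R, t) of one imaging device."""
    K: np.ndarray  # 3x3 internal parameter matrix
    R: np.ndarray  # 3x3 orientation
    t: np.ndarray  # 3-vector position


@dataclass
class StoredModel:
    """What the learning model holding unit 340 keeps: a predictor and its training-time condition."""
    predict: Callable[[np.ndarray], np.ndarray]
    condition: ShootingCondition


# Assumed signature for the image conversion unit 330:
# (image, recognition-time condition, training-time condition) -> converted image
Converter = Callable[[np.ndarray, ShootingCondition, ShootingCondition], np.ndarray]


class InformationProcessingApparatus:
    """Sketch of apparatus 300; comments name the FIG. 3 unit each step stands in for."""

    def __init__(self, capture: Callable[[], np.ndarray], camera_condition: ShootingCondition,
                 holding_unit: StoredModel, convert: Converter):
        self.capture = capture                    # imaging device 110
        self.camera_condition = camera_condition  # condition of device 110 (e.g. from design data)
        self.holding_unit = holding_unit          # learning model holding unit 340
        self.convert = convert                    # image conversion unit 330

    def process_frame(self) -> np.ndarray:
        stored = self.holding_unit                                       # learning model acquisition unit 350
        image = self.capture()                                           # image acquisition unit 310
        recog_cond, train_cond = self.camera_condition, stored.condition  # shooting condition acquisition unit 320
        converted = self.convert(image, recog_cond, train_cond)          # image conversion unit 330
        return stored.predict(converted)                                 # image recognition unit 360


# Usage with dummy stand-ins: identity conversion and a predictor that adds 1.
cond = ShootingCondition(np.eye(3), np.eye(3), np.zeros(3))
app = InformationProcessingApparatus(
    capture=lambda: np.zeros((4, 4)),
    camera_condition=cond,
    holding_unit=StoredModel(predict=lambda img: img + 1.0, condition=cond),
    convert=lambda img, recog, train: img,
)
depth = app.process_frame()  # a 4x4 array of ones from the dummy predictor
```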

The imaging device 110 is a camera that captures a two-dimensional image to be processed. The camera may be a color camera or a monochrome camera; in this embodiment it is a monocular color camera, and the captured image is a color image. As described with reference to FIG. 2, the imaging device 110 captures the images on which recognition processing using the learning result is performed.

The image acquisition unit 310 receives the color image to be processed from the imaging device 110.

The shooting condition acquisition unit 320 acquires the shooting conditions of the images used for learning (first shooting conditions) and the shooting conditions of the image used for image recognition (second shooting conditions). That is, it acquires the shooting conditions of the imaging device 110 and those of the learning-time imaging device 210 of FIG. 2, which is mounted on the different vehicle used for learning. In this embodiment, the shooting conditions are the position and orientation of the imaging device mounted on the vehicle (in this embodiment, "position and orientation" refers to both together). The position and orientation are expressed in a coordinate system with a preset origin. In this embodiment, with the imaging device mounted on the vehicle, the height above the ground is regarded as the position, and the angle relative to the horizontal plane is regarded as the orientation. The position and orientation may be represented by any information that expresses them; for example, six-degree-of-freedom parameters consisting of three positional degrees of freedom (X, Y, Z) and three rotational degrees of freedom (Roll, Pitch, Yaw) in the world coordinate system of the environment may be used.
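As an illustration of the six-degree-of-freedom representation, the sketch below builds a rotation matrix from Roll, Pitch, and Yaw and packs it with (X, Y, Z) into a 4×4 pose. The Z-Y-X rotation order (R = Rz·Ry·Rx) and the use of radians are assumptions; the document does not fix an Euler-angle convention, and a different choice changes the resulting matrix.

```python
import numpy as np


def rotation_from_rpy(roll: float, pitch: float, yaw: float) -> np.ndarray:
    """Rotation from roll (about X), pitch (about Y), yaw (about Z) in radians; assumes R = Rz @ Ry @ Rx."""
    cr, sr = np.cos(roll), np.sin(roll)
    cp, sp = np.cos(pitch), np.sin(pitch)
    cy, sy = np.cos(yaw), np.sin(yaw)
    Rx = np.array([[1.0, 0.0, 0.0], [0.0, cr, -sr], [0.0, sr, cr]])
    Ry = np.array([[cp, 0.0, sp], [0.0, 1.0, 0.0], [-sp, 0.0, cp]])
    Rz = np.array([[cy, -sy, 0.0], [sy, cy, 0.0], [0.0, 0.0, 1.0]])
    return Rz @ Ry @ Rx


def pose_matrix(x: float, y: float, z: float, roll: float, pitch: float, yaw: float) -> np.ndarray:
    """4x4 homogeneous transform for a 6-DoF position and orientation in the world frame."""
    T = np.eye(4)
    T[:3, :3] = rotation_from_rpy(roll, pitch, yaw)
    T[:3, 3] = [x, y, z]
    return T


# Hypothetical example: camera 1.2 m above the ground origin (world Z assumed up), pitched by 5 degrees.
T_cam = pose_matrix(0.0, 0.0, 1.2, 0.0, np.radians(5.0), 0.0)
```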

The internal parameter matrices of the imaging device 110 and the learning-time imaging device 210 are assumed to be known and are denoted K1 and K2, respectively. The internal parameter matrix contains parameters relating to the image center and the focal length, and its values can be obtained in advance by calibration. The positions and orientations of the imaging devices can be obtained as known information from design information on the arrangement of the vehicle and the imaging device. This design information may be stored in the storage device 104 or received via the communication unit 106. The shooting conditions are stored in a storage area such as the storage device 104 and can be acquired by reading them from there.
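For reference, a zero-skew internal parameter matrix and the projection it defines can be sketched as follows. The numeric values are hypothetical, and the camera frame is assumed to have X to the right, Y downward, and Z along the optical axis.

```python
import numpy as np


def intrinsic_matrix(fx: float, fy: float, cx: float, cy: float) -> np.ndarray:
    """Internal parameter matrix from focal lengths (pixels) and image centre; zero skew assumed."""
    return np.array([[fx, 0.0, cx],
                     [0.0, fy, cy],
                     [0.0, 0.0, 1.0]])


def project(K: np.ndarray, points_cam) -> np.ndarray:
    """Project (N, 3) points in camera coordinates (Z forward) to (N, 2) pixel coordinates."""
    X = np.atleast_2d(np.asarray(points_cam, dtype=float))
    uvw = X @ K.T                    # homogeneous pixel coordinates, one row per point
    return uvw[:, :2] / uvw[:, 2:3]  # perspective division by depth


K1 = intrinsic_matrix(1000.0, 1000.0, 640.0, 360.0)  # hypothetical values for a camera like 110
print(project(K1, [1.0, 0.5, 10.0]))  # 10 m ahead, 1 m right, 0.5 m below the axis -> [[740. 410.]]
```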

Besides position and orientation, the conversion may focus on other shooting conditions, such as the zoom value indicating the zoom magnification of the imaging device, the angle of view, or the color tone. For example, when the difference in angle of view is the focus, the scale of the captured image is converted. The shooting conditions considered may also vary with the installation location and application of the imaging device. For example, the imaging device may be mounted on a surveillance camera or a work robot rather than a vehicle. In that case, as before, the height above the ground may be treated as the position and the angle relative to the horizontal plane as the orientation, or the position and orientation may be expressed relative to a reference coordinate system specified separately by the user.
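For the angle-of-view case mentioned above, one simple way to derive the scale conversion is sketched below, assuming ideal pinhole cameras with square pixels, the same image resolution, and identical position and orientation. The scale factor is the ratio of the focal lengths implied by the two horizontal angles of view, applied about the image centre; when it exceeds 1, the enlarged image is effectively cropped to the original size.

```python
import math

import numpy as np


def focal_from_fov(width_px: int, hfov_deg: float) -> float:
    """Focal length in pixels of a pinhole camera with the given horizontal angle of view."""
    return (width_px / 2.0) / math.tan(math.radians(hfov_deg) / 2.0)


def fov_scale_matrix(width_px: int, height_px: int, hfov_recog_deg: float, hfov_train_deg: float) -> np.ndarray:
    """3x3 matrix scaling a recognition-time image about its centre to mimic the training-time angle of view."""
    s = focal_from_fov(width_px, hfov_train_deg) / focal_from_fov(width_px, hfov_recog_deg)
    cx, cy = width_px / 2.0, height_px / 2.0
    # Equivalent to translate(+centre) @ scale(s) @ translate(-centre).
    return np.array([[s, 0.0, cx - s * cx],
                     [0.0, s, cy - s * cy],
                     [0.0, 0.0, 1.0]])


M = fov_scale_matrix(1280, 720, hfov_recog_deg=90.0, hfov_train_deg=60.0)  # hypothetical cameras
```

With these example values the scale is tan 45° / tan 30° = √3 ≈ 1.73, and M can be applied like any other 3×3 projective transform.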

The image conversion unit 330 converts the image on the basis of the shooting conditions of the image captured during image recognition processing and those of the images used for learning. That is, it geometrically transforms the image 111 acquired by the image acquisition unit 310 into the image 112 on the basis of the shooting conditions of the imaging device 110 and of the learning-time imaging device 210 acquired by the shooting condition acquisition unit 320.

The learning model holding unit 340 holds a learning model trained on images captured by the learning-time imaging device 210, together with the shooting conditions of that learning model. The shooting conditions of the learning model are those of a representative image used for learning, where the representative image is the average of the images used for learning. In this embodiment, the learning model is a network structure based on a neural network that outputs a machine-learning recognition result from an input image, together with its parameters. Specifically, training the network means setting an image at the input layer of the learning model, setting the ground-truth value for that image at the output layer, and adjusting the network parameters so that the output computed through the network approaches the ground-truth value. In this embodiment, a two-dimensional image is input to the learning model, and the learning model outputs a distance image in which distance information is arranged in correspondence with each position of the two-dimensional image. This distance information is depth from the imaging device and represents the distance to surrounding objects in camera coordinates with the imaging device as the origin. Techniques for estimating a distance image from a two-dimensional image are known, so a detailed description is omitted. In Non-Patent Document 1, learning is performed using two-dimensional images and distance images serving as their ground truth. The ground-truth distance images are obtained in advance from measurements of the surroundings made separately with a distance sensor such as LiDAR or ToF. The learning model is not limited to a neural-network-based model; it may be based on other machine learning techniques such as random forests or support vector machines. The model may also recognize a predetermined object appearing in an image. In that case, learning is performed by giving learning images and their ground-truth values (object shape, class, type) so that, for example, the class and type of objects such as people, moving bodies, vehicles, and obstacles can be recognized from the image.
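The parameter-adjustment idea described above (push the output toward the ground-truth value) and the notion of an average representative image can be illustrated with the deliberately tiny numpy sketch below. It is a toy stand-in, not a neural network and not the training procedure of Non-Patent Document 1: a two-parameter linear model, depth ≈ w·intensity + b, fitted by gradient descent on the mean squared error, with synthetic data.

```python
import numpy as np


def representative_image(images: np.ndarray) -> np.ndarray:
    """Pixel-wise average of a stack of training images shaped (N, H, W) or (N, H, W, C)."""
    return images.mean(axis=0)


def fit_toy_depth_model(images: np.ndarray, depths: np.ndarray, lr: float = 0.1, epochs: int = 2000):
    """Toy stand-in for network training: fit depth ~ w * intensity + b by gradient descent on MSE."""
    x = images.ravel().astype(float)  # "input layer": pixel intensities
    y = depths.ravel().astype(float)  # "output layer" targets: ground-truth depths
    w, b = 0.0, 0.0
    for _ in range(epochs):
        err = (w * x + b) - y             # prediction minus ground truth
        w -= lr * 2.0 * np.mean(err * x)  # gradient of the mean squared error w.r.t. w
        b -= lr * 2.0 * np.mean(err)      # gradient of the mean squared error w.r.t. b
    return w, b


rng = np.random.default_rng(0)
imgs = rng.random((4, 8, 8))  # hypothetical training images with intensities in [0, 1)
gt = 3.0 * imgs + 2.0         # hypothetical ground-truth depths (synthetic linear relation)
w, b = fit_toy_depth_model(imgs, gt)
mean_img = representative_image(imgs)
```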

The learning model acquisition unit 350 acquires the learning model held by the learning model holding unit 340 and its shooting conditions, or alternatively acquires them from outside the information processing apparatus 300. Specifically, it reads out the neural network held by the learning model holding unit 340 and its parameter data.

The image recognition unit 360 takes the image converted by the image conversion unit 330 as input and outputs, as the output result, distance information corresponding to pixels or regions of the image captured by the imaging device 110. In this embodiment, distance is estimated from the input image and output as distance information. Besides the distance image described in this embodiment, the recognition processing may estimate three-dimensional point cloud data. It may also recognize a predetermined object appearing in the image, for example the position or type of people or objects in a scene. For an on-vehicle imaging device, it may recognize obstacles, traffic signs, the drivable region, or the position and orientation of the vehicle. As in the distance-image example, such a learning model can be generated by training with learning images and their ground-truth values (object shape, name). With any of these methods, converting the image with the image conversion unit 330 to match the shooting conditions of the images used for learning improves recognition accuracy.
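When three-dimensional point cloud data is wanted instead of a distance image, the estimated distances can be back-projected with the internal parameter matrix. The sketch below assumes the distance image stores depth along the optical axis (Z), not Euclidean range, and uses hypothetical toy inputs.

```python
import numpy as np


def depth_to_points(depth: np.ndarray, K: np.ndarray) -> np.ndarray:
    """Back-project an (H, W) depth image to an (H*W, 3) point cloud in camera coordinates."""
    h, w = depth.shape
    u, v = np.meshgrid(np.arange(w, dtype=float), np.arange(h, dtype=float))  # pixel columns / rows
    pixels = np.stack([u.ravel(), v.ravel(), np.ones(h * w)])  # 3 x N homogeneous pixels
    rays = np.linalg.inv(K) @ pixels                           # viewing rays with Z component 1
    return (rays * depth.ravel()).T                            # scale each ray by its depth


K = np.array([[2.0, 0.0, 1.0], [0.0, 2.0, 1.0], [0.0, 0.0, 1.0]])  # toy intrinsics
points = depth_to_points(np.full((2, 2), 4.0), K)                  # four points, all at Z = 4
```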

Next, the processing procedure of this embodiment is described. FIG. 4 is a flowchart showing the processing performed by the information processing apparatus. The flowchart is implemented by the CPU executing a control program. In the following description, each step is denoted by a reference numeral prefixed with S.

First, in S410, the learning model acquisition unit 350 acquires the learning model and its shooting conditions from the learning model holding unit 340.

In S420, the image acquisition unit 310 acquires an image captured by the imaging device 110.

In S430, the shooting condition acquisition unit 320 acquires the shooting conditions of the image captured by the imaging device 110 and those of the images used to train the learning model; that is, it acquires the position and orientation of the imaging device 110 and of the learning-time imaging device 210. The relative pose between the two imaging devices is expressed with the imaging device 110 as the origin: the position as a three-dimensional vector t and the orientation as a 3×3 rotation matrix R.
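If each imaging device's pose is available as a 4×4 camera-to-world transform (an assumed representation), the relative pose described in S430, with the imaging device 110 as the origin, can be obtained as sketched below. The example values are hypothetical.

```python
import numpy as np


def relative_pose(T_world_110: np.ndarray, T_world_210: np.ndarray):
    """Orientation R and position t of device 210 in the frame of device 110.

    Inputs are 4x4 camera-to-world transforms. The result satisfies
    X_110 = R @ X_210 + t for a point expressed in each camera's frame.
    """
    T_110_210 = np.linalg.inv(T_world_110) @ T_world_210
    return T_110_210[:3, :3], T_110_210[:3, 3]


# Hypothetical example: same orientation, device 210 mounted 0.3 m higher (Y axes point down).
T_110 = np.eye(4)
T_210 = np.eye(4)
T_210[:3, 3] = [0.0, -0.3, 0.0]
R, t = relative_pose(T_110, T_210)
```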

In S440, the image conversion unit 330 converts the image captured by the imaging device 110 so that its shooting conditions approach those of the images used for learning, as acquired by the shooting condition acquisition unit 320. That is, the image acquired by the image acquisition unit 310 is geometrically transformed on the basis of the shooting conditions of the imaging device 110 and of the learning-time imaging device 210. Specifically, the geometric transformation in this embodiment is a homography (projective) transformation. However, only some components, such as translation or rotation, may be transformed, or an affine transformation may be used. For example, part of a panoramic image may be cropped and then scaled.

First, a homography matrix H is calculated from the relative position and orientation of the imaging device 110 and the learning-time imaging device 210. With K1 and K2 the internal parameter matrices of the imaging device 110 and the learning-time imaging device 210, respectively, the homography matrix can be calculated by the following equation (1).
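A standard plane-induced homography that fits the definitions in S430 and below can be written as follows. This is a sketch under stated conventions, not necessarily the exact form of equation (1): R and t are assumed to give the orientation and position of the imaging device 210 in the frame of the imaging device 110, so that X_110 = R X_210 + t, and the assumed plane satisfies nᵀX_110 = d. H then maps homogeneous pixel coordinates of the imaging device 110 to those of the imaging device 210, up to scale; other sign or direction conventions change the form of the t term.

```latex
\begin{aligned}
X_{210} &= R^{\mathsf{T}} \left( X_{110} - t \right)
         = R^{\mathsf{T}} \left( I - \frac{t\, n^{\mathsf{T}}}{d} \right) X_{110}
  \qquad \left(\text{using } \tfrac{n^{\mathsf{T}} X_{110}}{d} = 1 \text{ on the plane}\right) \\
H &= K_2 \, R^{\mathsf{T}} \left( I - \frac{t\, n^{\mathsf{T}}}{d} \right) K_1^{-1}
\end{aligned}
```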

Here, d is the distance to a plane assumed with reference to the imaging device 110, and n is the normal vector of that plane. In this embodiment, d is set in advance to an approximate distance of the scene, and n is set in advance to the direction directly facing the imaging device 110. The superscript T denotes matrix transposition.
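The homography can be computed numerically as sketched below in Python with numpy. The convention (X_110 = R X_210 + t, plane n·X_110 = d), the function name, and the example values are assumptions for illustration. Here the learning-time device is placed 0.3 m higher than the imaging device 110 with the same orientation, and the plane is fronto-parallel at 20 m, so n = (0, 0, 1).

```python
import numpy as np


def plane_homography(K1, K2, R, t, n, d):
    """Plane-induced homography from device-110 pixels to device-210 pixels.

    Assumed convention: X_110 = R @ X_210 + t (pose of device 210 with device 110 as origin),
    and the scene is approximated by the plane n . X_110 = d.
    """
    t = np.asarray(t, dtype=float).reshape(3, 1)
    n = np.asarray(n, dtype=float).reshape(3, 1)
    H = K2 @ R.T @ (np.eye(3) - (t @ n.T) / d) @ np.linalg.inv(K1)
    return H / H[2, 2]  # remove the arbitrary projective scale (assumes H[2, 2] != 0)


K = np.array([[1000.0, 0.0, 640.0], [0.0, 1000.0, 360.0], [0.0, 0.0, 1.0]])  # hypothetical shared intrinsics
# Device 210 sits 0.3 m higher (camera Y axis points down); plane 20 m ahead, facing the camera.
H = plane_homography(K, K, np.eye(3), [0.0, -0.3, 0.0], [0.0, 0.0, 1.0], 20.0)
centre = H @ np.array([640.0, 360.0, 1.0])
print(centre[:2] / centre[2])  # the image centre maps to approximately (640, 375)
```

The 15 px downward shift is what a pinhole model predicts for a point 20 m ahead when the camera is raised by 0.3 m with a 1000 px focal length.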

Finally, the homography matrix H is applied to the image acquired by the image acquisition unit 310 to deform it. This brings the geometric shooting conditions of the image close to those at the time of learning.
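Applying H to an image can be sketched with inverse mapping: for each output pixel, H⁻¹ gives the source location, which is sampled by nearest-neighbour lookup. This pure-numpy version is an illustrative assumption that ignores interpolation and anti-aliasing. In practice, a library routine such as OpenCV's `cv2.warpPerspective(src, M, dsize)` performs the same mapping with interpolation.

```python
import numpy as np


def warp_homography(image: np.ndarray, H: np.ndarray, out_shape=None, fill=0) -> np.ndarray:
    """Warp `image` so that a source pixel x lands at H @ x (inverse mapping, nearest neighbour)."""
    h_out, w_out = out_shape if out_shape is not None else image.shape[:2]
    u, v = np.meshgrid(np.arange(w_out, dtype=float), np.arange(h_out, dtype=float))
    dst = np.stack([u.ravel(), v.ravel(), np.ones(u.size)])  # 3 x N output pixels (homogeneous)
    src = np.linalg.inv(H) @ dst                              # where each output pixel comes from
    ok = np.abs(src[2]) > 1e-12                               # skip points mapped to infinity
    w = np.where(ok, src[2], 1.0)
    src_u = np.rint(src[0] / w).astype(int)
    src_v = np.rint(src[1] / w).astype(int)
    valid = ok & (src_u >= 0) & (src_u < image.shape[1]) & (src_v >= 0) & (src_v < image.shape[0])
    out = np.full((h_out * w_out,) + image.shape[2:], fill, dtype=image.dtype)
    out[valid] = image[src_v[valid], src_u[valid]]            # sampled pixels; the rest keep `fill`
    return out.reshape((h_out, w_out) + image.shape[2:])


# Usage: a pure translation one pixel to the right.
img = np.arange(20, dtype=np.uint8).reshape(4, 5)
shift_right = np.array([[1.0, 0.0, 1.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]])
out = warp_homography(img, shift_right)
```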

In S450, the image recognition unit 360 takes the image converted by the image conversion unit 330 as input and outputs distance information corresponding to the pixels of the image captured by the imaging device 110. The learning model acquired by the learning model acquisition unit 350 is used for this purpose. The recognition process in this embodiment may be any method that recognizes some information from an image using machine learning. For example, it may recognize the positions or types of people or objects present in the scene, or the position and orientation of the vehicle or a control value.

In S460, the processing from S420 onward is repeated until system termination is requested. If the termination condition is satisfied (YES), the system terminates; if not (NO), the process returns to S420. In the case of automated driving of a car, for example, user input may serve as the termination trigger. Here, the system terminates after a predetermined time.
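The repeat-until-termination structure of S420 to S460 can be sketched as a simple loop. The callables `capture`, `convert`, and `recognize` are hypothetical stand-ins for image acquisition, imaging-condition acquisition plus conversion, and recognition. The time-limit check models the "predetermined time" termination, and `stop_requested` models a user-input trigger.

```python
import time
from typing import Any, Callable, List


def run_recognition_loop(
    capture: Callable[[], Any],
    convert: Callable[[Any], Any],
    recognize: Callable[[Any], Any],
    time_limit_s: float,
    stop_requested: Callable[[], bool] = lambda: False,
    clock: Callable[[], float] = time.monotonic,
) -> List[Any]:
    """Repeat acquisition, conversion and recognition until termination (S460).

    All three step callables are hypothetical placeholders; `clock` is
    injectable so the loop can be tested without real waiting.
    """
    results = []
    start = clock()
    while True:
        image = capture()                      # acquire the next image
        converted = convert(image)             # approach the learning-time condition
        results.append(recognize(converted))   # apply the learned model
        # S460: YES -> terminate the system; NO -> return to S420.
        if stop_requested() or clock() - start >= time_limit_s:
            return results
```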

As described above, the present embodiment focuses on the geometric misalignment that arises between the two sets of captured images because the imaging conditions of the images used for learning differ from those of the images used for recognition processing. The image used for recognition processing is geometrically warped so that its imaging condition approaches that of the images used for learning. Because this aligns the appearance of the images, recognition accuracy can be improved.

(Modification 1 of the first embodiment)
In the first embodiment, the position and orientation of the imaging device are treated as the imaging condition of an image, and the imaging condition acquisition unit 320 acquires the position and orientation of the imaging device from design information or calibration results prepared in advance. However, when the imaging device is mounted on a vehicle, its position and orientation may fluctuate up and down continuously over time while the vehicle moves, because of unevenness in the road surface. Specifically, the parameters for the position (height above the ground) and the orientation (angle relative to the horizontal plane) of the imaging device change dynamically with time. To handle this dynamic change, the imaging condition acquisition unit 320 updates the imaging condition as needed from the position and orientation of the imaging device 110 at time s, as estimated by the position and orientation estimation unit 510. Alternatively, the imaging condition may be updated as needed from measurement information obtained by a sensor, separately mounted on the vehicle, that measures the vehicle's surroundings. In this way, the imaging condition is updated at fixed time intervals and its changes over time are reflected in the image conversion, which improves recognition accuracy. The processing of the position and orientation estimation unit 510 and the imaging condition acquisition unit 320, which differs from the first embodiment, is described below.

The position and orientation estimation unit 510 estimates the imaging condition of an image (for example, the position of the imaging device) at a given time. To estimate the position and orientation of the imaging device 110, the position and orientation estimation unit 510 uses a known SLAM (Simultaneous Localization and Mapping) technique. SLAM estimates a device's own position and orientation accurately while simultaneously recognizing the surrounding environment captured by a camera. A specific method is disclosed in the aforementioned Non-Patent Document 1. The SLAM-based estimation of the vehicle's position and orientation may be based on the image acquired from the imaging device 110, or on measurement information from the sensor 108 serving as a distance sensor. The distance sensor may be, for example, an active distance sensor such as LiDAR or a ToF sensor, an infrared sensor, or a stereo camera. The sensor 108 may also be a position sensor such as GPS or an orientation sensor such as a gyro sensor. The first estimation method inputs the information measured by the imaging device 110 or the sensor 108, as described above, into SLAM. As a second estimation method, the position and orientation estimation unit 510 may estimate the position and orientation of the imaging device using the distance image output by the image recognition unit 360 as input.

The imaging condition acquisition unit 320 updates the imaging condition from the position and orientation of the imaging device 110 estimated by the position and orientation estimation unit 510. Alternatively, the imaging condition acquisition unit 320 may acquire position and orientation measurement information directly from the sensor 108 and update the imaging condition with it. Other imaging conditions that the sensor can measure (for example, ambient brightness) may also be updated.
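An updated pose estimate can be turned into the quantities a geometric conversion needs by computing the relative rotation R and translation t between the current camera and the learning-time camera. The sketch below assumes that both poses are camera-to-vehicle transforms (X_vehicle = R_cam X_cam + t_cam) expressed in one common vehicle-frame convention. That convention and the function names are assumptions.

```python
import math
from typing import List, Sequence, Tuple

Mat3 = List[List[float]]


def matmul(A: Mat3, B: Mat3) -> Mat3:
    return [[sum(A[i][k] * B[k][j] for k in range(3)) for j in range(3)]
            for i in range(3)]


def transpose(A: Mat3) -> Mat3:
    return [list(row) for row in zip(*A)]


def rot_x(theta: float) -> Mat3:
    """Rotation about the x-axis, used here to model a pitch change."""
    c, s = math.cos(theta), math.sin(theta)
    return [[1.0, 0.0, 0.0], [0.0, c, -s], [0.0, s, c]]


def relative_pose(R_cur: Mat3, t_cur: Sequence[float],
                  R_train: Mat3, t_train: Sequence[float]) -> Tuple[Mat3, List[float]]:
    """Return (R, t) such that X_train = R X_cur + t.

    From R_cur X_cur + t_cur = R_train X_train + t_train it follows that
    R = R_train^T R_cur and t = R_train^T (t_cur - t_train).
    """
    R_train_T = transpose(R_train)
    R = matmul(R_train_T, R_cur)
    dt = [a - b for a, b in zip(t_cur, t_train)]
    t = [sum(R_train_T[i][k] * dt[k] for k in range(3)) for i in range(3)]
    return R, t
```

For instance, a camera that currently sits 0.3 m higher than the learning-time camera, with the same orientation, yields R = I and t = (0, 0, 0.3) in this vehicle frame.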

In an information processing device with this functional configuration, if the learning images were captured under conditions free of dynamic changes such as vibration, the image conversion unit 330 performs an image conversion that removes the vibration during driving from the image. By converting the image at recognition time in response to dynamic changes of the imaging device in this way, images can be recognized with stable accuracy despite vehicle shaking and similar disturbances.
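If one further assumes that the vibration is dominated by a pitch rotation, with the camera's translation neglected, the stabilizing conversion reduces to a pure-rotation homography K R K^-1. This homography does not depend on scene depth. The per-pixel sketch below assumes a pinhole K with equal focal length f in x and y and principal point (cx, cy). The sign convention for the pitch angle is also an assumption.

```python
import math
from typing import Tuple


def rotate_pixel(u: float, v: float, f: float, cx: float, cy: float,
                 angle: float) -> Tuple[float, float]:
    """Map pixel (u, v) through the pure-rotation homography K R_x(angle) K^-1.

    R_x rotates viewing rays about the camera x-axis (pitch). To cancel an
    estimated pitch deviation, pass its negation; which sign is correct
    depends on how the deviation is measured.
    """
    x, y = (u - cx) / f, (v - cy) / f           # K^-1: pixel -> normalized ray
    c, s = math.cos(angle), math.sin(angle)
    y_r, z_r = c * y - s, s * y + c              # R_x applied to (x, y, 1)
    return cx + f * x / z_r, cy + f * y_r / z_r  # K, then dehomogenize
```

A quick sanity check: the principal point moves vertically by -f tan(angle), which is the expected image shift for a small camera tilt.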

The same operation may also be applied when capturing learning images, to remove shaking from them. This allows the learning images to be treated as images captured under a constant imaging condition.

(Modification 2 of the first embodiment)
The first embodiment described the case of a single learning model. This modification holds a plurality of learning models and selects the one whose learning images were captured under imaging conditions that best match those of the image used for recognition processing. This suppresses image degradation and improves recognition performance.

If the imaging conditions of the learning images and of the recognition image differ greatly, the converted image may be severely degraded. For example, a geometric transformation may introduce unnatural distortion, or shifting the image may push a region that should be recognized outside the frame. Likewise, converting image luminance as described in the second embodiment may amplify noise. For these reasons, the difference between the imaging condition of the image used for recognition processing and that of the images used for learning should be as small as possible. Note that the plurality of learning models are created so that they differ in a specific imaging condition. For example, learning models are prepared that are trained on images with constant brightness, captured with the imaging device in different orientations.

Accordingly, in this modification, images are captured under a plurality of imaging conditions at learning time, and a plurality of learning models are prepared, each trained under one of those imaging conditions. At recognition time, the learning model whose imaging condition is closest to that of the captured image is selected and acquired. This keeps the difference in imaging conditions small and thereby improves recognition performance. The processing of the learning model holding unit 340 and the learning model acquisition unit 350, which differs from the first embodiment, is described below.

The learning model holding unit 340 holds a plurality of learning models, each based on images captured under a different imaging condition, together with those imaging conditions. In the first embodiment, one learning model was constructed for one imaging condition. In this modification, by contrast, a plurality of learning models are constructed and held by varying the imaging condition. As in the first embodiment, the imaging condition here is the position and orientation of the imaging device; however, other imaging conditions such as luminance may be used. The learning model holding unit 340 may also be located outside the information processing device 300. In that case, the information processing device 300 communicates with the outside to acquire the plurality of learning models.

The functional configuration and hardware configuration are the same as in the first embodiment. The steps that differ from the first embodiment are described below with reference to the flowchart of FIG. 4.

In S410 of FIG. 4, the learning model acquisition unit 350 selects the learning model built under the imaging condition that most closely matches the imaging condition of the imaging device 110 used for recognition, as acquired by the imaging condition acquisition unit 320. It then acquires that model from the learning model holding unit 340. Alternatively, the learning model acquisition unit 350 may acquire a plurality of learning models, and the image conversion unit 330 may select the one whose imaging condition most closely matches the imaging condition of the imaging device used for recognition acquired by the imaging condition acquisition unit 320. In this modification, the degree of match between imaging conditions is the difference in the position and orientation of the imaging device. For example, the selected learning model is the one that minimizes the difference between the parameters representing the position and orientation of the imaging device at learning time and at recognition time. Specifically, the degree of match may be calculated from the difference in the height of the imaging device and the difference in its orientation. When image luminance is used as the imaging condition, the learning model whose luminance is closest, over the whole image or over a characteristic portion, is selected. Alternatively, the learning model that best fits external factors may be selected, such as the shooting environment (outdoors or indoors) or the time of day (morning, daytime, or night). S420 and subsequent steps follow the same procedure as in the first embodiment.
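The "closest imaging condition" selection can be sketched as a minimum over a weighted mismatch score. In this illustration the stored condition is the camera height and pitch. The `ModelEntry` record, its fields, and the weights are hypothetical. The weights put metres and radians on a common scale and would need tuning.

```python
from dataclasses import dataclass
from typing import Sequence


@dataclass
class ModelEntry:
    """A held learning model and the imaging condition it was trained under.

    The fields are illustrative; a real entry would also reference weights.
    """
    name: str
    height_m: float   # imaging-device height above the ground at learning time
    pitch_rad: float  # imaging-device pitch at learning time


def select_model(models: Sequence[ModelEntry], height_m: float, pitch_rad: float,
                 w_height: float = 1.0, w_pitch: float = 1.0) -> ModelEntry:
    """Return the model whose imaging condition best matches the current one.

    The mismatch is a weighted sum of absolute height and pitch differences.
    """
    if not models:
        raise ValueError("no learning models are held")

    def mismatch(m: ModelEntry) -> float:
        return (w_height * abs(m.height_m - height_m)
                + w_pitch * abs(m.pitch_rad - pitch_rad))

    return min(models, key=mismatch)
```

Using luminance or a categorical factor such as "indoors/outdoors" instead would only change the `mismatch` function.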

As described above, holding a plurality of learning models and selecting the one that most closely matches the imaging condition of the image used for recognition processing suppresses the image degradation caused by image conversion. As a result, image recognition performance improves.

(Second embodiment)
The present embodiment focuses on the difference in brightness between the imaging condition of the images used for learning and that of the images used for recognition processing with the learning result.

Differences in image brightness between imaging conditions arise from differences in the ambient brightness around the imaging device, in the sensitivity of each imaging device's sensor, and so on. Outdoors, image brightness also varies with the time of day and the weather at capture time. For example, when the imaging device is installed in a vehicle as in the first embodiment, exposure may differ between day and night even at the same location, and brightness may change in tunnels or parking lots while driving. Indoors, too, it is difficult to capture images under exactly the same lighting as at learning time. For example, a moving body such as a work robot traveling through a factory or warehouse moves between spaces of different brightness, so the ambient brightness varies from one work location to another. Depending on its size, the moving body may also pass under shelves or desks, so lighting conditions vary even within the same space. To correct for such differences, the present embodiment compares the brightness of images from the imaging device mounted on the vehicle used for learning with the brightness of images from the imaging device mounted on the vehicle used for recognition with the learning result. Based on that difference, it changes the luminance of the image used for recognition processing, bringing the image closer to the imaging condition of the learning images. This improves recognition accuracy.

The hardware configuration in the present embodiment is the same as in the first embodiment. The functional configuration of the information processing device of the present embodiment is described with reference to FIG. 3. Its module configuration is the same as that of the first embodiment. However, the imaging condition acquisition unit 320 and the image conversion unit 330 treat image brightness as the imaging condition, so their processing differs. The differences from the first embodiment are described below.

The imaging condition acquisition unit 320 acquires the imaging condition of the image used for image recognition and the imaging condition of the images used for learning. That is, it acquires as imaging conditions information on the brightness of the imaging device 110 and information on the brightness of the learning-time imaging device 210 of FIG. 1, which is mounted on a different vehicle used for learning. In the present embodiment, the brightness information serving as the imaging condition is the degree of brightness of an image captured by the vehicle-mounted imaging device. The degree of brightness is defined as the average luminance value of the image. The average luminance of the images used for learning is calculated in advance. Besides luminance information obtained from the image, external information may also be incorporated as variables related to the degree of brightness. Specific examples include the time, the weather, and the intensity, number, and color of light sources, treated quantitatively. Measurement information obtained by measuring ambient brightness with an illuminance sensor or the like may also be used as brightness information.

The image conversion unit 330 performs image conversion based on the imaging condition of the image captured at recognition time and the imaging condition of the images used at learning time. That is, it applies a luminance conversion to the image acquired by the image acquisition unit 310, based on the imaging condition of the imaging device 110 acquired by the imaging condition acquisition unit 320 and the imaging condition of the learning-time imaging device 210. In the present embodiment, the imaging condition is the degree of brightness of the image, and the luminance values of the image are converted based on the ratio of the two degrees of brightness. In other words, the imaging condition acquisition unit 320 and the image conversion unit 330 acquire the degree of brightness of the image as a whole as the imaging condition, and convert the luminance values of the image based on that ratio. The degree of brightness acquired by the imaging condition acquisition unit 320 may instead be measured separately for each of several regions into which the image is divided. Likewise, the image conversion unit 330 may convert luminance values according to the degree of brightness of each region of the image. Furthermore, the conversion need not be limited to luminance. It may be performed together with the geometric transformation of the first embodiment or with other conversions.
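When the degree of brightness is measured per region, one possible realization uses a fixed grid of tiles with a separate ratio per tile. The sketch below is illustrative only. The grid layout, the 8-bit clipping, the per-tile learning-time means, and the function names are assumptions. A practical version might also smooth the ratios across tile borders to avoid visible seams.

```python
from typing import List

Image = List[List[int]]  # 8-bit grayscale luminance, row-major


def _bounds(n: int, parts: int, i: int) -> range:
    """Index range of the i-th of `parts` near-equal slices of length n."""
    return range(i * n // parts, (i + 1) * n // parts)


def tile_means(img: Image, rows: int, cols: int) -> List[List[float]]:
    """Mean luminance of each tile in a rows x cols grid.

    Requires the image to be at least rows high and cols wide.
    """
    h, w = len(img), len(img[0])
    means = []
    for r in range(rows):
        row = []
        for c in range(cols):
            vals = [img[y][x] for y in _bounds(h, rows, r) for x in _bounds(w, cols, c)]
            row.append(sum(vals) / len(vals))
        means.append(row)
    return means


def tilewise_convert(img: Image, train_means: List[List[float]],
                     max_val: int = 255) -> Image:
    """Scale each tile by (learning-time tile mean) / (current tile mean)."""
    rows, cols = len(train_means), len(train_means[0])
    cur = tile_means(img, rows, cols)
    h, w = len(img), len(img[0])
    out = [row[:] for row in img]
    for r in range(rows):
        for c in range(cols):
            s = train_means[r][c] / cur[r][c] if cur[r][c] > 0 else 1.0
            for y in _bounds(h, rows, r):
                for x in _bounds(w, cols, c):
                    out[y][x] = min(max_val, round(img[y][x] * s))
    return out
```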

Next, the processing procedure of the present embodiment is described. The order of processing is the same as in the first embodiment, but the processing performed by the imaging condition acquisition unit 320 in S430 and by the image conversion unit 330 in S440 differs. The processing in S430 and S440 is described here.

In S430, the imaging condition acquisition unit 320 acquires the imaging condition of the image captured by the imaging device 110 and the imaging condition of the images used to train the learning model. That is, it acquires two values. The first is the degree of brightness V1 of the images captured by the learning-time imaging device 210 of FIG. 1, which is mounted on a different vehicle that captures the learning images. The second is the degree of brightness V2 of the image captured by the imaging device 110. V2 is calculated as the average luminance value of the image acquired by the image acquisition unit 310. V1 is obtained from the average luminance value calculated in advance. The brightness ratio is defined as S = V1 / V2.
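Computing V2 and S can be sketched as follows. The text only says "average of the luminance values". The RGB input format and the ITU-R BT.601 luma weights used to derive luminance are assumptions, and V1 is taken as a precomputed number.

```python
from typing import Sequence, Tuple

RGB = Tuple[float, float, float]


def mean_luminance(img_rgb: Sequence[Sequence[RGB]]) -> float:
    """Average luminance of an RGB image (rows of (r, g, b) tuples).

    Luminance per pixel uses BT.601 luma weights (an assumption).
    """
    total, count = 0.0, 0
    for row in img_rgb:
        for r, g, b in row:
            total += 0.299 * r + 0.587 * g + 0.114 * b
            count += 1
    return total / count


def brightness_ratio(v1: float, v2: float) -> float:
    """S = V1 / V2: learning-time brightness over current brightness."""
    if v2 <= 0:
        raise ValueError("current image has zero mean luminance")
    return v1 / v2
```

A current image twice as bright as the learning images (V2 = 2 V1) gives S = 0.5, so the later per-pixel multiplication darkens it.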

Brightness may also be estimated in other ways, for example from the season or the time of day. Because ambient brightness correlates with time, the brightness ratio may be set according to the expected ambient brightness, for example S = 1 in the daytime, S = 2 in the evening, and S = 10 at night. Brightness information measured by the sensor 108 may also be used.
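The time-based variant amounts to a lookup table. The S values below are the example values from the text. The hour boundaries separating daytime, evening, and night are assumptions and would in practice depend on season and location.

```python
def ratio_from_hour(hour: int) -> float:
    """Brightness ratio S chosen from the hour of day (0-23).

    S = 1 (day), 2 (evening), 10 (night) follow the example in the text;
    the hour boundaries are hypothetical.
    """
    if not 0 <= hour < 24:
        raise ValueError("hour must be in 0..23")
    if 6 <= hour < 17:
        return 1.0    # daytime
    if 17 <= hour < 19:
        return 2.0    # evening
    return 10.0       # night
```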

In S440, the image conversion unit 330 applies a luminance conversion to the image acquired by the image acquisition unit 310. The conversion is based on the degree of brightness of the image from the imaging device 110 and that of the image from the learning-time imaging device 210, both acquired by the imaging condition acquisition unit 320.

Here, the luminance value of each pixel of the image acquired by the image acquisition unit 310 is converted based on the ratio S of the degrees of brightness under the two imaging conditions. Specifically, the luminance value of each pixel is multiplied by S. This operation brings the brightness-related imaging condition of the image close to that of the images at learning time. The image may also be converted so as to correct several imaging conditions. For example, the position-and-orientation imaging condition may be converted first, followed by the brightness imaging condition.
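The per-pixel multiplication by S is shown below for an 8-bit grayscale image. Rounding and clipping to [0, 255] are assumptions about the pixel format. Note that a gain above 1 scales sensor noise along with the signal, which is the noise amplification mentioned in Modification 2.

```python
from typing import List


def apply_gain(img: List[List[int]], s: float, max_val: int = 255) -> List[List[int]]:
    """Multiply every pixel's luminance by S, rounding and clipping to [0, max_val].

    Values that would exceed max_val saturate at max_val.
    """
    if s < 0:
        raise ValueError("brightness ratio must be non-negative")
    return [[min(max_val, max(0, round(p * s))) for p in row] for row in img]
```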

As described above, the present embodiment focuses on the difference in brightness between the imaging condition of the images used for learning and that of the images recognized using the learning result. It changes the luminance values of the image used for recognition processing to bring it closer to the imaging condition of the learning images. Because this aligns the appearance of the images, recognition accuracy can be improved.

(Third embodiment)
The present embodiment describes an example in which the recognition results of the first and second embodiments are used for automated driving of a vehicle. Here, automated driving refers to a technique in which an information processing device or system controls the motion of a moving apparatus, such as a vehicle or other moving body, basically without requiring human driving operations. With improved image recognition accuracy, recognition can be applied to automated-driving tasks such as real-scale distance measurement and sign recognition. As a specific example, the vehicle's driving is controlled by determining its current position and orientation from the recognition result of the captured image. Besides vehicles, the present embodiment can also be applied to automated operation control of mobile robots such as drones, work robots, and transport robots. It may be applied not only to vehicles and robots that walk or travel on land but also to moving bodies that fly through the air, move on water, or dive and move underwater.

First, the module configuration of the present embodiment is described. In addition to the module configuration of FIG. 3 in the first and second embodiments, the present embodiment includes a position and orientation estimation unit 610, a driving control unit 620, and an actuator unit 630, as shown in FIG. 6. These modules are mounted on the vehicle 100 as shown in FIG. 7 and perform processing based on the image captured by the imaging device 110. As in the first and second embodiments, the image recognition unit 360 outputs a distance image as its output result. As needed, the image recognition unit 360 may also output obstacle detection, sign recognition, or travelable-area recognition results.

The position and orientation estimation unit 610 estimates the position and orientation of the vehicle 100 (or of the imaging device 110) in the coordinate system of the map information, which contains the coordinates of the destination of the vehicle 100. In the present embodiment, the position of the moving apparatus is estimated from the distance information recognized by the image recognition unit 360 after it has been converted into map-information coordinates. SLAM is a known method for estimating the position and orientation of a vehicle or imaging device from images or distance images. In the present embodiment, the position and orientation of the vehicle 100 are estimated based on the distance image output by the image recognition unit 360. If the relative position and orientation between the vehicle 100 and the imaging device 110 are known, the position and orientation of the vehicle 100 can be calculated from those of the imaging device 110. That is, the position and orientation estimation unit 610 can estimate the position and orientation of the vehicle 100. Alternatively, a position sensor such as GPS or an orientation sensor such as a gyro sensor may be mounted separately, and the position and orientation of the vehicle 100 in world coordinates may be obtained from its measurement information. The position and orientation of the vehicle 100 may also be estimated from the image captured by the imaging device 110 or from distance information measured by the sensor 108. The position and orientation estimation unit 610 may be included in the information processing device 300, or it may be incorporated in the body of the moving apparatus or in the vehicle's driving control device. The vehicle 100 is assumed either to store map information in a storage unit in advance or to be able to acquire map information of its surroundings through a communication unit.
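The step "if the relative position and orientation between the vehicle 100 and the imaging device 110 are known, the vehicle's pose follows from the camera's" is a composition of rigid transforms. In the sketch below, T_a_b denotes the 4x4 transform mapping coordinates in frame b into frame a. Both this notation and the function names are assumptions.

```python
from typing import List

Mat4 = List[List[float]]


def compose(A: Mat4, B: Mat4) -> Mat4:
    """Product of two 4x4 homogeneous transforms."""
    return [[sum(A[i][k] * B[k][j] for k in range(4)) for j in range(4)]
            for i in range(4)]


def rigid_inverse(T: Mat4) -> Mat4:
    """Inverse of a rigid transform [R | t], namely [R^T | -R^T t]."""
    Rt = [[T[j][i] for j in range(3)] for i in range(3)]
    t = [T[i][3] for i in range(3)]
    ti = [-sum(Rt[i][k] * t[k] for k in range(3)) for i in range(3)]
    return [Rt[0] + [ti[0]], Rt[1] + [ti[1]], Rt[2] + [ti[2]], [0.0, 0.0, 0.0, 1.0]]


def vehicle_pose(T_map_cam: Mat4, T_vehicle_cam: Mat4) -> Mat4:
    """T_map_vehicle = T_map_cam @ inverse(T_vehicle_cam).

    T_map_cam is the estimated camera pose in the map frame; T_vehicle_cam
    is the known mounting of the imaging device on the vehicle.
    """
    return compose(T_map_cam, rigid_inverse(T_vehicle_cam))


def translation(x: float, y: float, z: float) -> Mat4:
    return [[1.0, 0.0, 0.0, x], [0.0, 1.0, 0.0, y], [0.0, 0.0, 1.0, z],
            [0.0, 0.0, 0.0, 1.0]]
```

For example, a camera mounted 2 m forward and 1.5 m up, observed at map position (12, 5, 1.5), places the vehicle origin at (10, 5, 0).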

The driving control unit 620 calculates control values for moving the moving apparatus to the destination. It uses at least one of two inputs: the position and orientation of the vehicle 100 estimated by the position and orientation estimation unit 610, and the recognition result obtained by the image recognition unit 360 with the learning model. From these it calculates control values such as the direction in which to move the vehicle and its acceleration. Alternatively, it calculates control parameters corresponding to a person pressing the accelerator or turning the steering wheel. As a specific method of calculating control values from the vehicle's position and orientation, an example of automatically driving the vehicle to a destination set on a map is described here. The actuator unit 630 drives the vehicle by moving each of its mechanisms (such as wheel torque and direction) based on the control values output from the driving control unit 620.
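The text leaves the control law open. Purely as an illustration, not as the method of this embodiment, a proportional heading controller shows how a pose estimate and a map destination can be turned into a steering value and a target speed. The gains, limits, stop radius, and function name are hypothetical.

```python
import math
from typing import Tuple


def control_to_goal(x: float, y: float, yaw: float, gx: float, gy: float,
                    k_steer: float = 1.0, max_steer: float = 0.5,
                    k_speed: float = 0.5, max_speed: float = 10.0,
                    stop_radius: float = 0.5) -> Tuple[float, float]:
    """Return (steering, speed) that heads the vehicle toward (gx, gy).

    Steering is proportional to the heading error and clipped to
    +/-max_steer; speed is proportional to the remaining distance and
    clipped to max_speed. Inside stop_radius the vehicle stops.
    """
    dx, dy = gx - x, gy - y
    dist = math.hypot(dx, dy)
    if dist < stop_radius:
        return 0.0, 0.0
    err = math.atan2(dy, dx) - yaw
    err = math.atan2(math.sin(err), math.cos(err))  # wrap to [-pi, pi]
    steer = max(-max_steer, min(max_steer, k_steer * err))
    speed = min(max_speed, k_speed * dist)
    return steer, speed
```

In a real system these outputs would be passed on to the actuator unit 630 and re-evaluated at every new pose estimate.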

FIG. 8 is a flowchart showing the processing procedure of the information processing system. The description below focuses on differences from the first embodiment. As an initial setting, the coordinates of the vehicle's current position and map information indicating the destination are acquired using GPS or the like; the position and orientation estimation unit 610, described later, estimates the position and orientation relative to these coordinates. In S810, the imaging device 110 captures an image of the area around the vehicle 100. In S820, the image acquisition unit 310 of the information processing device 300 receives and acquires the processing-target image captured in S810. In S830, the learning model acquisition unit 350 acquires the learning model and the imaging conditions of the images used for training. The learning model acquisition unit 350 may acquire a plurality of learning models, in which case the image conversion unit 330 may select the training imaging condition that most closely matches the imaging condition of the recognition-side imaging device acquired by the imaging condition acquisition unit 320. A suitable learning model may also be acquired from the plurality of learning models held by the learning model holding unit 340 based on the imaging-condition information acquired in S840. In S840, the imaging condition acquisition unit 320 acquires the imaging conditions of the image captured by the imaging device 110 and those of the images used to train the learning model. In S850, the image conversion unit 330 converts the image captured by the imaging device 110 so that its imaging conditions approach those of the training images. In S860, the image recognition unit 360 takes the converted image as input and outputs, as its result, distance information corresponding to the pixels or regions of the image captured by the imaging device 110. This distance information is depth from the imaging device and represents the distance to surrounding objects in camera coordinates, with the imaging device at the origin.
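
Selecting the learning model whose training conditions best match the current ones (S830 and S840), followed by a brightness conversion (S850), can be sketched as below. This is a minimal illustration, not the document's method: the fields of `ImagingCondition`, the per-field normalizing scales, the weighted Euclidean distance, and the single-gain brightness correction are all hypothetical choices.

```python
import math
from dataclasses import dataclass


@dataclass(frozen=True)
class ImagingCondition:
    camera_height: float    # metres above the ground (hypothetical field)
    pitch: float            # camera tilt [rad] (hypothetical field)
    mean_brightness: float  # mean pixel intensity, 0-255 (hypothetical field)


# Hypothetical per-field scales that make the differences comparable.
SCALES = ImagingCondition(camera_height=0.5, pitch=math.radians(5.0), mean_brightness=30.0)


def condition_distance(a: ImagingCondition, b: ImagingCondition,
                       scales: ImagingCondition = SCALES) -> float:
    """Normalized Euclidean distance between two imaging conditions."""
    return math.sqrt(((a.camera_height - b.camera_height) / scales.camera_height) ** 2
                     + ((a.pitch - b.pitch) / scales.pitch) ** 2
                     + ((a.mean_brightness - b.mean_brightness) / scales.mean_brightness) ** 2)


def select_model(models: dict, current: ImagingCondition) -> str:
    """models maps a model name to the ImagingCondition of its training images."""
    return min(models, key=lambda name: condition_distance(models[name], current))


def match_brightness(image, current: ImagingCondition, trained: ImagingCondition):
    """Scale grayscale intensities so the mean brightness approaches the training value."""
    gain = trained.mean_brightness / max(current.mean_brightness, 1e-6)
    return [[min(255, round(p * gain)) for p in row] for row in image]
```

The position and orientation part of S850 (a geometric warp) is left out here to keep the sketch short.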

In S870, the position and orientation estimation unit 610 estimates the self-position of the vehicle 100 based on the output obtained in S860. The distance information output in S860 is converted from camera coordinates into map coordinates or world coordinates, and the position of the vehicle 100 in the map information is estimated. SLAM or another self-localization method may be used.
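
Converting the distance output from camera coordinates into map coordinates amounts to back-projecting each depth pixel through the camera intrinsics and then applying the camera pose. The sketch assumes a pinhole model with hypothetical intrinsics `fx, fy, cx, cy`, depth measured along the optical axis, and an x-right, y-down, z-forward camera frame; the document specifies none of these.

```python
import math


def backproject(depth, fx, fy, cx, cy):
    """Turn a depth image (rows of z values in metres) into camera-frame 3D points."""
    points = []
    for v, row in enumerate(depth):
        for u, z in enumerate(row):
            if z <= 0.0 or math.isinf(z) or math.isnan(z):
                continue  # skip pixels without a valid depth
            points.append(((u - cx) * z / fx, (v - cy) * z / fy, z))
    return points


def camera_to_map(points, R, t):
    """Apply p_map = R p_cam + t, with R a 3x3 rotation (nested lists) and t a 3-vector."""
    return [tuple(sum(R[i][k] * p[k] for k in range(3)) + t[i] for i in range(3))
            for p in points]
```

The resulting map-frame points are what a SLAM or scan-matching step would align against the map to estimate the vehicle position.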

In S880, the operation control unit 620 of the moving device calculates a route from the start point to the destination based on the position of the vehicle 100 estimated by the position and orientation estimation unit 610 and the map information. It is assumed that the vehicle 100 either stores map information indicating the coordinates of the destination in its storage unit in advance or can acquire maps of its surroundings through its communication unit. Alternatively, the route may be searched while a map of the surroundings is built on the fly using SLAM (simultaneous localization and mapping). Next, control values for following the route are calculated, and the vehicle is driven automatically via the actuator unit 630. Because the position and orientation estimation unit 610 continuously estimates the changing position and orientation of the vehicle during automatic driving and the control values are updated accordingly, highly accurate automatic driving can be achieved. In addition, the position and orientation estimation unit 610 detects bumps and obstacles on the road from the distance image output by the information processing device 300, and the operation control unit 620 can automatically choose to slow the vehicle down or stop it. Specifically, the acceleration and traveling direction are calculated as control values. The actuator unit 630 drives the vehicle by moving each of its mechanisms (for example, wheel torque and direction) according to the control values output by the operation control unit 620.
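
The document does not name a route-search algorithm. As a stand-in, the sketch below runs A* search on a 4-connected occupancy grid, a common choice for grid maps; representing the map as a grid of free (0) and blocked (1) cells is an assumption for illustration.

```python
import heapq


def astar(grid, start, goal):
    """A* on a 4-connected occupancy grid (0 = free, 1 = blocked).

    Returns the list of (row, col) cells from start to goal, or None if unreachable.
    """
    rows, cols = len(grid), len(grid[0])

    def h(cell):  # Manhattan distance: admissible for unit-cost 4-connected moves
        return abs(cell[0] - goal[0]) + abs(cell[1] - goal[1])

    open_heap = [(h(start), 0, start)]
    came_from = {start: None}
    g = {start: 0}
    while open_heap:
        _, cost, cell = heapq.heappop(open_heap)
        if cell == goal:
            path = []
            while cell is not None:  # walk parents back to the start
                path.append(cell)
                cell = came_from[cell]
            return path[::-1]
        if cost > g[cell]:
            continue  # stale heap entry superseded by a cheaper one
        r, c = cell
        for nxt in ((r + 1, c), (r - 1, c), (r, c + 1), (r, c - 1)):
            nr, nc = nxt
            if 0 <= nr < rows and 0 <= nc < cols and grid[nr][nc] == 0:
                new_cost = cost + 1
                if new_cost < g.get(nxt, float("inf")):
                    g[nxt] = new_cost
                    came_from[nxt] = cell
                    heapq.heappush(open_heap, (new_cost + h(nxt), new_cost, nxt))
    return None
```

Obstacles detected from the distance image could be written into the grid as blocked cells before re-planning, which matches the idea of updating the control values as the vehicle moves.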

In S890, the operation control unit 620 terminates the system when the vehicle 100 arrives at the destination. If the distance between the destination and the position of the vehicle 100 estimated by the position and orientation estimation unit 610 in S870 is equal to or less than a predetermined value, the operation control unit 620 determines that the vehicle has arrived. Otherwise, the process returns to S810. The system may also be terminated by a human operation.
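
The overall S810 to S890 cycle with its arrival test can be illustrated by a toy loop. Perception and pose estimation (S810 to S870) are replaced by a stub that simply returns the true position, route following (S880) by a straight step toward the destination, and the step size, arrival threshold, iteration cap, and `stop_requested` callback are hypothetical.

```python
import math


def run_to_destination(start, destination, step=1.0, arrival_threshold=0.5,
                       max_iterations=1000, stop_requested=lambda: False):
    """Toy version of the S810-S890 loop; returns the list of visited positions."""
    position = start
    trajectory = [position]
    for _ in range(max_iterations):
        # S810-S870 (stub): capture, convert, recognize, and estimate the pose.
        estimated = position
        # S890: terminate when the estimated distance is within the threshold,
        # or when a human operator requests a stop.
        dx, dy = destination[0] - estimated[0], destination[1] - estimated[1]
        distance = math.hypot(dx, dy)
        if distance <= arrival_threshold or stop_requested():
            break
        # S880: move one step toward the destination (stand-in for route following).
        move = min(step, distance)
        position = (estimated[0] + move * dx / distance, estimated[1] + move * dy / distance)
        trajectory.append(position)
    return trajectory
```

Checking arrival on the freshly estimated pose before issuing the next motion avoids commanding one extra step past the destination.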

As described above, in the present embodiment the current position and orientation of the vehicle can be determined accurately from the recognition result for the captured image. This in turn improves the accuracy of vehicle control based on that recognition result.

(Other embodiments)
In the first and second embodiments, images were converted with attention to the position and orientation of the imaging device and to brightness as the imaging conditions. Other imaging conditions, such as the zoom value, angle of view, or color tone of the image, may also be considered in the conversion.

For example, when considering the zoom value, the imaging conditions can be matched by enlarging or reducing the image used for recognition to match the zoom value of the images used for training. When considering color tone, the imaging conditions can be matched by obtaining the ratio of RGB values of the training images as their color tone and changing the RGB ratio of the image used for recognition. If the sensors that captured the training images and the recognition images differ in channels (for example, a color image versus a monochrome grayscale image or an infrared image), a conversion that compensates for the difference may be added. The angle of view of the imaging device 110 used for recognition may be set wider than that of the imaging device 210 used for training; when the difference in imaging conditions is the angle of view, for example, the scale of the captured image is converted. A wider angle of view reduces cut-off portions, where a region that should be recognized falls outside the frame because the image conversion unit 330 shifts the image when converting its position or angle.
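
The zoom and color-tone adjustments can be sketched as two small functions. This is a minimal illustration under assumed representations: images as nested lists, a zoom factor equal to the ratio of the training-side to the recognition-side zoom value, per-channel gains as the way to change the RGB ratio, and nearest-neighbour sampling. The document does not prescribe any of these.

```python
def match_rgb_ratio(image, target_ratio):
    """Rescale each channel so the image's mean R:G:B ratio matches target_ratio.

    image: rows of (r, g, b) tuples with values 0-255; target_ratio: (r, g, b) weights.
    The sum of the channel means is preserved (before clipping to 255).
    """
    pixels = [p for row in image for p in row]
    means = [sum(p[c] for p in pixels) / len(pixels) for c in range(3)]
    total_mean, total_ratio = sum(means), sum(target_ratio)
    gains = [(total_mean * target_ratio[c] / total_ratio) / means[c] if means[c] > 0 else 1.0
             for c in range(3)]
    return [[tuple(min(255, round(p[c] * gains[c])) for c in range(3)) for p in row]
            for row in image]


def zoom_about_center(image, factor):
    """Nearest-neighbour zoom about the image centre; factor > 1 enlarges, < 1 shrinks.

    The output keeps the input size; samples falling outside the input are filled with 0.
    """
    h, w = len(image), len(image[0])
    cy, cx = (h - 1) / 2.0, (w - 1) / 2.0
    out = []
    for y in range(h):
        row = []
        for x in range(w):
            sy = round(cy + (y - cy) / factor)  # source row for this output pixel
            sx = round(cx + (x - cx) / factor)  # source column for this output pixel
            row.append(image[sy][sx] if 0 <= sy < h and 0 <= sx < w else 0)
        out.append(row)
    return out
```

If the recognition camera has a wider angle of view than the training camera, the required adjustment is an enlargement (factor > 1), which crops the edges instead of leaving unfilled borders.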

The present invention can also be realized by the following processing: software (a program) that implements the functions of the above embodiments is supplied to a system or apparatus via a data communication network or various storage media, and a computer (or a CPU, MPU, or the like) of the system or apparatus reads and executes the program. The program may also be provided recorded on a computer-readable recording medium.

100 Vehicle at the time of recognition
110 Imaging device
200 Vehicle at the time of learning
210 Imaging device
300 Information processing device
310 Image acquisition unit
320 Imaging condition acquisition unit
330 Image conversion unit
340 Learning model holding unit
350 Learning model acquisition unit
360 Image recognition unit

Claims

An information processing apparatus that outputs an output result corresponding to an input captured image based on a learning model, the apparatus comprising:
first acquisition means for acquiring the imaging conditions under which an image used to generate the learning model was captured;
input means for inputting a captured image to be processed from an imaging device;
second acquisition means for acquiring the imaging conditions under which the captured image to be processed was captured;
conversion means for converting the captured image to be processed based on the imaging conditions acquired by the first and second acquisition means; and
output means for outputting an output result corresponding to the converted image based on the learning model.

The information processing apparatus according to claim 1, wherein the second acquisition means updates the imaging conditions based on measurement information from a sensor that measures the surroundings of the imaging device.

The information processing apparatus according to claim 2, further comprising first estimation means for estimating the position of the imaging device based on distance information measured by the sensor, wherein
the second acquisition means updates the imaging conditions it acquires based on the estimation result of the first estimation means, and
the conversion means converts the captured image to be processed based on the updated imaging conditions and the imaging conditions acquired by the first acquisition means.

The information processing apparatus according to claim 1, further comprising second estimation means for estimating the position of the imaging device based on distance information in the output result, wherein
the second acquisition means updates the imaging conditions it acquires with the estimation result of the second estimation means.

The information processing apparatus according to claim 3, wherein the first estimation means estimates the position of the imaging device based on the image captured by the imaging device, and
the second acquisition means updates the imaging conditions it acquires with the estimation result of the first estimation means.

The information processing apparatus according to claim 3, wherein the first estimation means estimates the position of the imaging device based on the measurement information, and
the second acquisition means updates the imaging conditions it acquires with the estimation result of the first estimation means.

The information processing apparatus according to any one of claims 1 to 6, wherein, when there are a plurality of learning models trained using images captured under a plurality of different imaging conditions, the conversion means converts the captured image to be processed based on an imaging condition selected from the plurality of imaging conditions and the imaging condition acquired by the second acquisition means.

The information processing apparatus, wherein the learning model is trained, when it is generated, to output an object included in an input captured image, and
the output means recognizes and outputs the object included in the converted image based on the converted image and the learning model.

The information processing apparatus, wherein the learning model is trained, when it is generated, to output distance information corresponding to pixels or regions of an input captured image, and
the output means outputs distance information corresponding to pixels or regions of the converted image based on the converted image and the learning model.

An information processing apparatus comprising:
image acquisition means for acquiring an image captured by an imaging device;
imaging condition acquisition means for acquiring a first imaging condition of the images used to train a learning model that has been trained to output, from an input image, at least distance information corresponding to pixels or regions of the input image, and a second imaging condition of the captured image, each imaging condition including at least one of the position or orientation of the imaging device and information on brightness at the time of capture;
conversion means for converting the image captured by the imaging device based on the first imaging condition and the second imaging condition; and
recognition means for outputting at least distance information corresponding to pixels or regions of the converted image based on the image converted by the conversion means and the learning model.

An information processing apparatus comprising:
image acquisition means for acquiring an image captured by an imaging device;
imaging condition acquisition means for acquiring a first imaging condition of the images used to train a learning model that has been trained to output an output result corresponding to an input image, and a second imaging condition of the image captured by the imaging device;
conversion means for converting the image captured by the imaging device based on the first imaging condition and the second imaging condition; and
recognition means for outputting an output result corresponding to the image converted by the conversion means based on the learning model.

An information processing system comprising a moving device that moves based on map information indicating a destination, an imaging device mounted on the moving device, and an information processing device, wherein
the information processing device, which outputs an output result corresponding to an input captured image based on a learning model, comprises:
first acquisition means for acquiring the imaging conditions under which an image used to generate the learning model was captured;
input means for inputting a captured image to be processed from the imaging device;
second acquisition means for acquiring the imaging conditions under which the captured image to be processed was captured;
conversion means for converting the captured image to be processed based on the imaging conditions acquired by the first and second acquisition means; and
output means for outputting an output result corresponding to the converted image based on the learning model, and
the moving device comprises:
position estimation means for estimating at least the position of the moving device based on distance information obtained by converting distance information in the output result into coordinates in the map information; and
control means for controlling the moving device so that it moves to the destination based on the position of the moving device and the map information.

An information processing system comprising a moving device that moves based on map information indicating a destination, an imaging device mounted on the moving device, and an information processing device, wherein
the information processing device comprises:
image acquisition means for acquiring an image captured by the imaging device;
imaging condition acquisition means for acquiring a first imaging condition of the images used to train a learning model that has been trained to output, from an input image, at least distance information corresponding to pixels or regions of the input image, and a second imaging condition of the captured image, each imaging condition including at least one of the position or orientation of the imaging device and information on brightness at the time of capture;
conversion means for converting the image captured by the imaging device based on the first imaging condition and the second imaging condition; and
recognition means for outputting, based on the image converted by the conversion means and the learning model, distance information that is depth information from the imaging device corresponding to pixels or regions of the converted image, and
the moving device comprises:
position estimation means for estimating at least the position of the moving device based on distance information obtained by converting the distance information into coordinates in the map information; and
control means for controlling the moving device so that it moves to the destination based on the position of the moving device and the map information.

An information processing method for outputting an output result corresponding to an input captured image based on a learning model, the method comprising:
a first acquisition step of acquiring the imaging conditions under which an image used to generate the learning model was captured;
an input step of inputting a captured image to be processed from an imaging device;
a second acquisition step of acquiring the imaging conditions under which the captured image to be processed was captured;
a conversion step of converting the captured image to be processed based on the imaging conditions acquired in the first and second acquisition steps; and
an output step of outputting an output result corresponding to the converted image based on the learning model.

An information processing method comprising:
an image acquisition step of acquiring an image captured by an imaging device;
an imaging condition acquisition step of acquiring a first imaging condition of the images used to train a learning model that has been trained to output an output result corresponding to an input image, and a second imaging condition of the image captured by the imaging device;
a conversion step of converting the image captured by the imaging device based on the first imaging condition and the second imaging condition; and
a recognition step of outputting an output result corresponding to the image converted in the conversion step based on the learning model.

A program for causing a computer to function as each means of the information processing apparatus according to any one of claims 1 to 11.

A computer-readable storage medium storing the program according to claim 16.