JP2019200769A

JP2019200769A - Learning device, method for learning, and program

Info

Publication number: JP2019200769A
Application number: JP2018176328A
Authority: JP
Inventors: Satoshi Sato (佐藤 智); Takeo Azuma (吾妻 健夫); Kazuo Nobori (登 一生); Nobuhiko Wakai (若井 信彦)
Original assignee: Panasonic Intellectual Property Management Co Ltd
Current assignee: Panasonic Intellectual Property Management Co Ltd
Priority date: 2018-05-14
Filing date: 2018-09-20
Publication date: 2019-11-21
Anticipated expiration: 2038-09-20
Also published as: JP7126123B2

Abstract

To provide a learning device which can identify an object by an image more accurately and also can perform identification processing more rapidly. SOLUTION: A learning device 12 has a memory and a processing circuit. The processing circuit: (a) acquires a first calculation taken image including an imaging target and surrounding environments of the imaging target from the memory, the first calculation taken image having a plurality of first pixels; (b) acquires a taken image including the imaging target and surrounding environments of the imaging target from the memory, the taken image having a plurality of second pixels; (c) acquires the result of identification of the imaging target and the surrounding environments of the imaging target in the taken image; (d) generates an identification model for identifying the first calculation taken image on the basis of the result of the identification of the taken image with reference to the correspondence relation between the plurality of first pixels and the plurality of second pixels; and (e) outputs the identification model to an image recognition device 10 for identifying a second calculation taken image. SELECTED DRAWING: Figure 2

Description

The present disclosure relates to a learning device, a learning method, and a program.

Technology for identifying surrounding objects and recognizing the environment is important for autonomous vehicles and robots. In recent years, a technique called deep learning has attracted attention for object identification in, for example, autonomous vehicles and robots. Deep learning is machine learning that uses a multi-layered neural network, and it uses a large amount of training data for learning. Deep learning can achieve higher identification accuracy than conventional methods. Image information is particularly effective for such object identification. Non-Patent Document 1 discloses a technique that greatly improves on conventional object identification capability through deep learning with image information as input. High-accuracy identification also requires a high-resolution input image. This is because a low-resolution image cannot capture, for example, a distant subject at sufficient resolution, so identification performance deteriorates when the input image has a low resolution.

On the other hand, Non-Patent Document 2 discloses a technique that further improves the identification capability of deep learning by using depth information from a three-dimensional range finder as input in addition to image information. Depth information makes it possible to separate near subjects from distant ones. Using depth information can therefore improve identification performance even for distant subjects. In addition, a technique called compressed sensing, as disclosed in Non-Patent Document 3 for example, is known for restoring a high-resolution image from a low-resolution capture.

Non-Patent Document 1: A. Krizhevsky, I. Sutskever and G. E. Hinton, "ImageNet Classification with Deep Convolutional Neural Networks", NIPS'12 Proceedings of the 25th International Conference on Neural Information Processing Systems, 2012, pp. 1097-1105
Non-Patent Document 2: Andreas Eitel et al., "Multimodal Deep Learning for Robust RGB-D Object Recognition", 2015 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS), 2015
Non-Patent Document 3: Y. Oike and A. E. Gamal, "A 256×256 CMOS Image Sensor with ΔΣ-Based Single-Shot Compressed Sensing", 2012 IEEE International Solid-State Circuits Conference (ISSCC) Dig. of Tech. Papers, 2012, pp. 386-387
Non-Patent Document 4: M. Salman Asif, Ali Ayremlou, Ashok Veeraraghavan, Richard Baraniuk and Aswin Sankaranarayanan, "FlatCam: Replacing Lenses with Masks and Computation", International Conference on Computer Vision Workshop (ICCVW), 2015, pp. 663-666
Non-Patent Document 5: Yusuke Nakamura, Takeshi Shimano, Kazuyuki Tajima, Mayu Sao and Taku Hoshizawa, "Lensless Light-field Imaging with Fresnel Zone Aperture", 3rd International Workshop on Image Sensors and Imaging Systems (IWISS2016) ITE-IST2016-51, 2016, no. 40, pp. 7-8

However, with the techniques disclosed in Non-Patent Documents 1 to 3, it is difficult to improve both the accuracy and the processing speed of image-based object identification.

The present disclosure therefore provides a learning device and the like that improve both the accuracy and the processing speed of image-based object identification.

In order to solve the above problem, a learning device according to one aspect of the present disclosure is a learning device including a memory and a processing circuit, wherein the processing circuit: (a) acquires, from the memory, a first calculated captured image including an imaging object and a surrounding environment of the imaging object, the first calculated captured image having a plurality of first pixels; (b) acquires, from the memory, a captured image including the imaging object and the surrounding environment of the imaging object, the captured image having a plurality of second pixels; (c) acquires an identification result of the imaging object and the surrounding environment of the imaging object included in the captured image; (d) generates an identification model for identifying the first calculated captured image on the basis of the identification result of the captured image, with reference to a correspondence relationship between the plurality of first pixels and the plurality of second pixels; and (e) outputs the identification model to an image identification device that identifies a second calculated captured image.

Note that the above comprehensive or specific aspects may be realized as a system, a device, a method, an integrated circuit, a computer program, or a recording medium such as a computer-readable recording disk, and may be realized as any combination of a system, a device, a method, an integrated circuit, a computer program, and a recording medium. The computer-readable recording medium includes, for example, a non-volatile recording medium such as a CD-ROM (Compact Disc-Read Only Memory).

According to the learning device and the like of the present disclosure, it is possible to improve both the accuracy and the processing speed of image-based object identification.

Additional benefits and advantages of one aspect of the present disclosure will become apparent from the specification and drawings. These benefits and/or advantages may be provided individually by the various aspects and features disclosed in the specification and drawings, and not all of them are required in order to obtain one or more of them.

FIG. 1 is a schematic diagram illustrating an example of a functional configuration of an identification system including an image identification device according to an embodiment.
FIG. 2 is a schematic diagram illustrating an example of a functional configuration of an identification system according to a modification of the embodiment.
FIG. 3 is a schematic diagram illustrating an example of a hardware configuration of the identification system according to the modification of the embodiment.
FIG. 4 is a flowchart illustrating an example of a main processing flow of a learning device according to the modification of the embodiment.
FIG. 5 is a diagram illustrating an example of a light field camera using a multi-pinhole.
FIG. 6 is a schematic diagram illustrating an example of a normally captured image of a subject (captured image).
FIG. 7 is a schematic diagram illustrating an example of an image of a subject captured using a light field camera including a multi-pinhole mask (calculated captured image).
FIG. 8A is a schematic diagram illustrating a captured image on which an identification area frame is superimposed.
FIG. 8B is a schematic diagram illustrating only the identification area frame.
FIG. 9 is a schematic diagram illustrating an example of an identification correct answer given as a mask on an image.
FIG. 10 is a flowchart illustrating an example of an operation flow of the image identification device according to the embodiment.
FIG. 11 is a schematic diagram illustrating an example of a functional configuration of an identification unit.
FIG. 12 is a schematic diagram of an example of a coded aperture mask that uses a random mask as a coded diaphragm.
FIG. 13 is a schematic diagram illustrating another example of the functional configuration of the identification unit.
FIG. 14A is a schematic diagram showing that the optical axis of a second image acquisition unit and the optical axis of a first image acquisition unit approximately coincide.
FIG. 14B is a schematic diagram showing that each optical axis of a stereo camera constituting the second image acquisition unit and the optical axis of the first image acquisition unit approximately coincide.
FIG. 15 is a schematic diagram showing that a beam splitter is used to make the optical axis of the first image acquisition unit coincide with the optical axis of the second image acquisition unit.

As described in the "Background Art" section, machine learning such as deep learning has made highly accurate identification by machines possible. Attempts have been made to apply such identification technology to the automatic driving of vehicles and the operation of robots. Because vehicles and robots are moving bodies, they must recognize surrounding objects from camera images while moving. A high identification processing speed is therefore required.

The technique disclosed in Non-Patent Document 1 requires a high-resolution image in order to obtain high identification accuracy. Acquiring high-resolution image information requires an expensive camera, which makes the object identification system itself expensive. In addition, a high-resolution image increases the amount of processing, which may delay the processing.

Non-Patent Document 2 discloses a technique for a highly accurate identification system that uses depth information. Such a system requires an expensive three-dimensional range finder to acquire the depth information, which increases cost. Furthermore, this technique must process the captured image and the depth information in association with each other, which increases the amount of processing. This is because the depth information from a three-dimensional range finder includes point cloud information consisting of a large number of points obtained, for example, by radar scanning, and therefore has a large data size. In other words, using depth information from such a three-dimensional range finder as input in addition to image information enlarges the neural network and lowers the identification processing speed.

In the technique disclosed in Non-Patent Document 3, the amount of processing required to restore a high-resolution image from a low-resolution image is enormous. The present inventors identified the above problems in the techniques of Non-Patent Documents 1 to 3, studied techniques that improve identification processing speed while also improving identification accuracy, and devised the technique described below.

A learning device according to one aspect of the present disclosure is a learning device including a memory and a processing circuit, wherein the processing circuit: (a) acquires, from the memory, a first calculated captured image including an imaging object and a surrounding environment of the imaging object, the first calculated captured image having a plurality of first pixels; (b) acquires, from the memory, a captured image including the imaging object and the surrounding environment of the imaging object, the captured image having a plurality of second pixels; (c) acquires an identification result of the imaging object and the surrounding environment of the imaging object included in the captured image; (d) generates an identification model for identifying the first calculated captured image on the basis of the identification result of the captured image, with reference to a correspondence relationship between the plurality of first pixels and the plurality of second pixels; and (e) outputs the identification model to an image identification device that identifies a second calculated captured image.

Because other information such as depth information can be embedded in a calculated captured image itself, object identification needs only the image as input and does not need large point cloud data from a three-dimensional range finder or the like. This keeps the neural network from growing and improves the identification processing speed. Processing to restore a high-resolution image from a low-resolution image is also unnecessary, which further improves the identification processing speed. Moreover, because the calculated captured image makes other information such as depth information available, the identification accuracy can be improved. In this way, both the accuracy and the processing speed of image-based object identification can be improved.

However, a calculated captured image is an image that a person cannot visually recognize in the way he or she perceives the real space. When machine learning is performed with the first calculated captured image as input, it is therefore difficult for a person to provide an identification result for the first calculated captured image as the identification correct answer. For this reason, even when machine learning is performed with the first calculated captured image as input, an identification result for a normal captured image, which a person can visually recognize in the same way as the real space, is input as the identification correct answer. Because the captured image is visually recognizable, identification results such as the positions of the imaging object and the surrounding environment of the imaging object included in the captured image can be obtained easily. To generate an identification model for identifying the first calculated captured image by machine learning that takes the first calculated captured image as input and relies on an identification result for a different image, namely the captured image, it must be known which position (pixel) in the captured image corresponds to each position (pixel) of the imaging object and its surrounding environment in the first calculated captured image. In this aspect, therefore, reference is made to the positional correspondence between the first calculated captured image and the captured image (specifically, the correspondence relationship between the plurality of first pixels of the first calculated captured image and the plurality of second pixels of the captured image).
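The role of the pixel correspondence can be illustrated with a minimal Python sketch. It assumes, purely for illustration, that the correspondence is available as a dictionary from first-pixel coordinates to second-pixel coordinates and that the identification result is a per-pixel label mask; the function name `transfer_labels` and both data formats are hypothetical and are not taken from the disclosure.

```python
import numpy as np

def transfer_labels(label_mask, correspondence, out_shape, background=0):
    """Carry labels given on the captured image over to the calculated captured image.

    label_mask     : (H2, W2) integer array, identification result on the captured image
    correspondence : dict {(r1, c1): (r2, c2)}, first pixel -> second pixel (assumed format)
    out_shape      : (H1, W1), shape of the calculated captured image
    """
    out = np.full(out_shape, background, dtype=label_mask.dtype)
    for (r1, c1), (r2, c2) in correspondence.items():
        # Each first pixel inherits the label of its corresponding second pixel.
        out[r1, c1] = label_mask[r2, c2]
    return out

# Toy data: a 2x2 label mask and a correspondence that mirrors the columns.
label_mask = np.array([[0, 1],
                       [0, 2]])
correspondence = {(r, c): (r, 1 - c) for r in range(2) for c in range(2)}
transferred = transfer_labels(label_mask, correspondence, (2, 2))
print(transferred)
```

In this toy case the labels end up mirrored left to right, which is what the assumed correspondence prescribes. A real correspondence would depend on the geometry of the two cameras.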

For example, the identification result may include the positions, in a plane, of the imaging object and the surrounding environment of the imaging object.

With this, the identification model is generated on the basis of the in-plane positions of the imaging object and the surrounding environment of the imaging object, so the identification model can be used to identify the in-plane positions of the imaging object and its surrounding environment included in the second calculated captured image.

For example, the identification result may include the positions, in the depth direction, of the imaging object and the surrounding environment of the imaging object.

With this, the identification model is generated on the basis of the depth-direction positions of the imaging object and the surrounding environment of the imaging object, so the identification model can be used to identify the depth-direction positions of the imaging object and its surrounding environment included in the second calculated captured image.

For example, the identification result may include category information indicating the categories to which the imaging object and the surrounding environment of the imaging object belong.

With this, the identification model is generated on the basis of the category information of the imaging object and the surrounding environment of the imaging object, so the identification model can be used to identify the category information of the imaging object and its surrounding environment included in the second calculated captured image. For example, it is possible to identify whether the imaging object or the like is a person, an automobile, a bicycle, a traffic signal, or the like.

For example, the first calculated captured image and the second calculated captured image may each be an image containing parallax information in which the imaging object and the surrounding environment of the imaging object are each superimposed multiple times. Specifically, the first calculated captured image and the second calculated captured image may be images obtained by imaging the imaging object and the surrounding environment of the imaging object with a multi-pinhole camera, a coded aperture camera, a light field camera, or a lensless camera.

With this, superimposing the imaging object and the surrounding environment of the imaging object multiple times makes it possible to embed depth information in the image itself.
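How superimposition can carry depth information may be sketched with a toy multi-pinhole model in Python. The sketch assumes a purely horizontal parallax in which each pinhole shifts a scene point by `gain * offset / depth` pixels; this shift rule, the function name `multi_pinhole_image`, and all parameter values are illustrative assumptions, not the optical model of the disclosure.

```python
import numpy as np

def multi_pinhole_image(scene, depth, pinhole_offsets, gain=4.0):
    """Superimpose one shifted copy of the scene per pinhole (toy parallax model).

    scene, depth    : (H, W) arrays; depth must be positive
    pinhole_offsets : horizontal pinhole positions relative to the reference pinhole
    """
    h, w = scene.shape
    out = np.zeros((h, w), dtype=float)
    rows = np.broadcast_to(np.arange(h)[:, None], (h, w))
    for off in pinhole_offsets:
        # Nearer points (small depth) shift more than distant points.
        shift = np.rint(gain * off / depth).astype(int)
        cols = np.arange(w)[None, :] + shift
        valid = (cols >= 0) & (cols < w)
        # Accumulate intensities; np.add.at handles repeated target pixels.
        np.add.at(out, (rows[valid], cols[valid]), scene[valid])
    return out

scene = np.zeros((1, 8))
depth = np.full((1, 8), 4.0)
scene[0, 1] = 1.0   # near point
depth[0, 1] = 1.0
scene[0, 6] = 1.0   # far point (depth 4.0)
img = multi_pinhole_image(scene, depth, pinhole_offsets=(0, 1))
print(img)
```

In the resulting image the near point appears twice, four pixels apart, while the far point appears twice, one pixel apart. The spacing of the superimposed copies therefore encodes depth, which is the property the paragraph above relies on.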

For example, the captured image may be an image obtained by imaging the imaging object and the surrounding environment of the imaging object with a multi-view stereo camera.

Using a captured image obtained by a multi-view stereo camera makes it possible to estimate the depth-direction positions of the imaging object and the surrounding environment of the imaging object included in the captured image. The depth-direction position can therefore be input as the identification correct answer, as part of the identification result for the captured image.
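As a hedged illustration of how a stereo camera yields depth, the standard relation for a rectified two-view pair, Z = f·B / d, can be written as a short Python function. The disclosure does not state which estimation method is used; the two-view simplification, the function name, and the numeric values below are assumptions.

```python
def depth_from_disparity(disparity_px, focal_length_px, baseline_m):
    """Depth Z = f * B / d for a rectified stereo pair (textbook relation).

    disparity_px    : horizontal pixel offset of the same point between the two views
    focal_length_px : focal length expressed in pixels
    baseline_m      : distance between the two camera centers in meters
    """
    if disparity_px <= 0:
        raise ValueError("disparity must be positive for a point in front of the cameras")
    return focal_length_px * baseline_m / disparity_px

# Hypothetical values: f = 800 px, B = 0.5 m, d = 40 px.
z = depth_from_disparity(disparity_px=40, focal_length_px=800, baseline_m=0.5)
print(z)  # depth in meters
```

A depth value obtained this way for each labeled region could serve as the depth-direction component of the identification correct answer.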

For example, the optical axis of the camera used to capture the first calculated captured image and the optical axis of the camera used to capture the captured image may substantially coincide. Specifically, the optical axis of the camera used to capture the first calculated captured image and the optical axis of the camera used to capture the captured image may be made to coincide by way of a beam splitter, a prism, or a half mirror.

With this, when the identification correct answer for the captured image is converted into an identification correct answer for the first calculated captured image, making the optical axes substantially coincide (or coincide) reduces the error introduced by the conversion and enables more accurate identification. This is because, when the optical axis of the camera used to capture the first calculated captured image and the optical axis of the camera used to capture the captured image substantially coincide, the first calculated captured image and the captured image are images of substantially the same position (environment).

A learning method according to one aspect of the present disclosure includes: (a) acquiring a first calculated captured image that includes an imaging object and a surrounding environment of the imaging object and has a plurality of first pixels; (b) acquiring a captured image that includes the imaging object and the surrounding environment of the imaging object and has a plurality of second pixels; (c) acquiring an identification result of the imaging object and the surrounding environment of the imaging object included in the captured image; (d) generating an identification model for identifying the first calculated captured image on the basis of the identification result of the captured image, with reference to a correspondence relationship between the plurality of first pixels and the plurality of second pixels; and (e) outputting the identification model to an image identification device that identifies a second calculated captured image.

With this, it is possible to provide a learning method that improves both the accuracy and the processing speed of image-based object identification.
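The flow of steps (a) to (e) can be sketched end to end in Python. To keep the sketch short, the identification model here is a nearest-centroid classifier on pixel intensity, which is only a stand-in for the neural network contemplated in the disclosure; the function names, the dictionary-based pixel correspondence, and the toy data are all assumptions.

```python
import numpy as np

def learn_identification_model(first_image, label_mask, correspondence):
    """Steps (a)-(d), toy version: returns {category: mean intensity}.

    first_image    : calculated captured image (first pixels)
    label_mask     : identification result given on the ordinary captured image
    correspondence : dict {(r1, c1): (r2, c2)}, first pixel -> second pixel (assumed format)
    """
    sums, counts = {}, {}
    for (r1, c1), (r2, c2) in correspondence.items():
        cat = int(label_mask[r2, c2])          # label looked up via the correspondence
        sums[cat] = sums.get(cat, 0.0) + float(first_image[r1, c1])
        counts[cat] = counts.get(cat, 0) + 1
    return {cat: sums[cat] / counts[cat] for cat in sums}

def identify(model, second_image):
    """What a device receiving the model in step (e) might do: nearest-centroid labeling."""
    cats = np.array(sorted(model))
    centroids = np.array([model[c] for c in cats])
    dist = np.abs(second_image[..., None] - centroids)
    return cats[np.argmin(dist, axis=-1)]

first_image = np.array([[0.1, 0.9],
                        [0.2, 0.8]])
label_mask = np.array([[0, 1],
                       [0, 1]])
identity = {(r, c): (r, c) for r in range(2) for c in range(2)}

model = learn_identification_model(first_image, label_mask, identity)
result = identify(model, np.array([[0.0, 1.0]]))
print(model, result)
```

The point of the sketch is the data flow: the model is fitted to calculated captured images, while its training labels originate from the ordinary captured image and reach it only through the pixel correspondence.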

A program according to one aspect of the present disclosure is a program for causing a computer to execute the above learning method.

With this, it is possible to provide a program that improves both the accuracy and the processing speed of image-based object identification.

Note that the above comprehensive or specific aspects may be realized as a system, a device, a method, an integrated circuit, a computer program, or a recording medium such as a computer-readable recording disk, and may be realized as any combination of a system, a device, a method, an integrated circuit, a computer program, and a recording medium. The computer-readable recording medium includes, for example, a non-volatile recording medium such as a CD-ROM.

[Embodiment]
Hereinafter, embodiments will be described with reference to the drawings. Each of the embodiments described below shows a comprehensive or specific example. The numerical values, shapes, components, arrangement positions and connection forms of the components, steps (processes), order of the steps, and the like shown in the following embodiments are merely examples and are not intended to limit the present disclosure. Among the components in the following embodiments, components that are not recited in the independent claims representing the broadest concept are described as optional components. In the following description of the embodiments, expressions qualified by "substantially", such as "substantially coincide", may be used. For example, "substantially coincide" means not only coinciding completely but also coinciding in substance, that is, including a difference of, for example, about several percent. The same applies to other expressions qualified by "substantially". Each figure is a schematic diagram and is not necessarily drawn precisely. Furthermore, in the figures, substantially identical components are given the same reference signs, and duplicate descriptions may be omitted or simplified.

An image identification device according to the embodiment will be described.

FIG. 1 is a schematic diagram illustrating an example of a functional configuration of an identification system 1 including an image identification device 10 according to the embodiment.

識別システム１は、撮像対象物及び撮像対象物の周辺環境を含む計算撮像画像を撮像するカメラと、識別モデルを用いて、計算撮像画像中の撮像対象物を識別する処理回路とを備える。当該識別モデル及び計算撮像画像については後述する。識別システム１は、当該処理回路を有する画像識別装置１０と当該カメラとして撮像部１１とを備える。画像識別装置１０は、取得部１０１と、識別部１０２と、出力部１０３とを備える。識別システム１は、撮像部１１が取得する画像を用いて、当該画像に含まれる被写体を検出し、検出結果を出力する。画像における被写体の検出を、「識別」とも呼ぶ。 The identification system 1 includes a camera that captures a calculated captured image including an imaging object and the surrounding environment of the imaging object, and a processing circuit that identifies the imaging object in the calculated captured image using an identification model. The identification model and the calculated captured image will be described later. The identification system 1 includes an image identification device 10 having the processing circuit, and an imaging unit 11 serving as the camera. The image identification device 10 includes an acquisition unit 101, an identification unit 102, and an output unit 103. The identification system 1 uses the image acquired by the imaging unit 11 to detect a subject included in the image, and outputs a detection result. The detection of a subject in an image is also called “identification”.

識別システム１は、車両及びロボット等の移動体に搭載されてもよく、監視カメラシステム等の固定物に搭載されてもよい。本実施の形態では、識別システム１は、移動体の一例である自動車に搭載されるとして説明する。この場合、撮像部１１及び画像識別装置１０の両方が移動体に搭載されてもよい。又は、撮像部１１が移動体に搭載され、画像識別装置１０が移動体の外部に配置されてもよい。画像識別装置１０が配置される対象の例は、コンピュータ装置又は移動体の操作者の端末装置等である。端末装置の例は、移動体専用の操作用端末装置、又は、スマートフォン、スマートウォッチ及びタブレット等の汎用的な携帯端末装置等である。コンピュータ装置の例は、カーナビゲーションシステム、ＥＣＵ（Engine Control Unit）又はサーバ装置等である。 The identification system 1 may be mounted on a moving body such as a vehicle and a robot, or may be mounted on a fixed object such as a surveillance camera system. In the present embodiment, the identification system 1 will be described as being mounted on an automobile that is an example of a mobile object. In this case, both the imaging unit 11 and the image identification device 10 may be mounted on the moving body. Alternatively, the imaging unit 11 may be mounted on the moving body, and the image identification device 10 may be disposed outside the moving body. An example of a target on which the image identification device 10 is arranged is a computer device or a terminal device of an operator of a moving object. Examples of the terminal device are an operation terminal device dedicated to a mobile body, or a general-purpose portable terminal device such as a smartphone, a smart watch, and a tablet. Examples of the computer device are a car navigation system, an ECU (Engine Control Unit), a server device, or the like.

画像識別装置１０と撮像部１１とが離れて配置される場合、画像識別装置１０及び撮像部１１は、有線通信又は無線通信を介して通信してもよい。有線通信には、例えば、イーサネット（登録商標）規格に準拠したネットワーク等の有線ＬＡＮ（Local Area Network）及びその他のいかなる有線通信が適用されてもよい。無線通信には、第３世代移動通信システム（３Ｇ）、第４世代移動通信システム（４Ｇ）、又はＬＴＥ（登録商標）等のような移動通信システムで利用されるモバイル通信規格、Ｗｉ−Ｆｉ（登録商標）（Wireless Fidelity）などの無線ＬＡＮ、及び、Ｂｌｕｅｔｏｏｔｈ（登録商標）、ＺｉｇＢｅｅ（登録商標）等の近距離無線通信が適用されてもよい。 When the image identification device 10 and the imaging unit 11 are arranged apart from each other, the image identification device 10 and the imaging unit 11 may communicate via wired communication or wireless communication. For the wired communication, for example, a wired LAN (Local Area Network) such as a network conforming to the Ethernet (registered trademark) standard, or any other wired communication, may be applied. For the wireless communication, a mobile communication standard used in mobile communication systems such as the third-generation mobile communication system (3G), the fourth-generation mobile communication system (4G), or LTE (registered trademark); a wireless LAN such as Wi-Fi (registered trademark) (Wireless Fidelity); or short-range wireless communication such as Bluetooth (registered trademark) or ZigBee (registered trademark) may be applied.

撮像部１１は、撮像対象物及び撮像対象物の周辺環境を含む計算撮像画像（computational imaging photography）を撮像する、つまり取得する。具体的には、撮像部１１は、計算撮像画像として、撮像対象物及び撮像対象物の周辺環境がそれぞれ複数重畳された視差情報を含んだ画像を撮像（取得）する。撮像部１１が取得する計算撮像画像を第２の計算撮像画像とも呼ぶ。第２の計算撮像画像は、物体の識別時に用いられる画像である。なお、計算撮像画像は、計算画像とも呼ばれる。例えば、撮像部１１は、所定の周期である第１の周期毎に第２の計算撮像画像を取得してもよいし、連続的に動画として第２の計算撮像画像を取得してもよい。撮像部１１は、時刻と対応付けられた第２の計算撮像画像を取得してもよい。撮像部１１のハードウェアの例はカメラであり、具体的にはマルチピンホールカメラ、ＣｏｄｅｄＡｐｅｒｔｕｒｅカメラ、ライトフィールドカメラ、又は、レンズレスカメラ等である。このようなカメラである場合、撮像部１１は、後述するように、１回の撮像動作で被写体についての複数の画像を同時に取得することができる。なお、撮像部１１は、例えば、撮像部１１が備える撮像素子の撮像領域、つまり受光領域を変化させることによって、上記の複数の画像を複数回の撮像動作で取得してもよい。撮像部１１は、取得した第２の計算撮像画像を、画像識別装置１０の取得部１０１に出力する。 The imaging unit 11 captures, that is, acquires a computational imaging image including an imaging object and the surrounding environment of the imaging object. Specifically, the imaging unit 11 captures (acquires) an image including parallax information in which a plurality of imaging objects and surrounding environments of the imaging object are superimposed as a calculated captured image. The calculated captured image acquired by the imaging unit 11 is also referred to as a second calculated captured image. The second calculated captured image is an image used when identifying an object. Note that the calculated captured image is also called a calculated image. For example, the imaging unit 11 may acquire the second calculated captured image every first cycle that is a predetermined cycle, or may acquire the second calculated captured image continuously as a moving image. The imaging unit 11 may acquire a second calculated captured image associated with the time. An example of hardware of the imaging unit 11 is a camera, specifically, a multi-pinhole camera, a coded aperture camera, a light field camera, a lensless camera, or the like. In the case of such a camera, the imaging unit 11 can simultaneously acquire a plurality of images of the subject in one imaging operation, as will be described later. 
Note that the imaging unit 11 may acquire the plurality of images by a plurality of imaging operations, for example, by changing an imaging region of the imaging element included in the imaging unit 11, that is, a light receiving region. The imaging unit 11 outputs the acquired second calculated captured image to the acquisition unit 101 of the image identification device 10.

なお、撮像部１１は、物体の識別時に用いられる第２の計算撮像画像だけでなく、後述する図２等で説明する学習時に用いられる第１の計算撮像画像を取得し、取得した第１の計算撮像画像を、学習装置１２の第一画像取得部１２１（図２参照）に出力してもよい。 Note that the imaging unit 11 may acquire not only the second calculated captured image used when identifying an object but also the first calculated captured image used during learning, which is described later with reference to FIG. 2 and other figures, and may output the acquired first calculated captured image to the first image acquisition unit 121 (see FIG. 2) of the learning device 12.

ここで、計算撮像画像と通常撮像画像とを説明する。通常撮像画像は、光学系を通して撮像される画像である。通常撮像画像は、通常、光学系により集光された物体からの光を結像（imaging）することによって、取得される。光学系の一例は、レンズである。物体と像内の像点（image point）とを入れ替えて、像点に物体を配置することにより、物体と像内の像点とを入れ替える前と同じ光学系で元の物体の位置に像点ができるような物体の点と像点との位置関係を共役（conjugate）と呼ぶ。本明細書において、このように共役関係にある状態で撮像された画像は、通常撮像画像（又は撮像画像）と表記する。物体が存在する環境下で、人が物体を直接見たとき、人は通常撮像画像とほぼ同様の状態で当該物体を知覚する。言い換えると、人は、通常のデジタルカメラで撮像された通常撮像画像を、実空間の状態と同様に視覚的に認識する。 Here, the calculated captured image and the normal captured image will be described. A normal captured image is an image captured through an optical system. A normal captured image is usually acquired by imaging, that is, by forming an image of light from an object collected by the optical system. An example of the optical system is a lens. The positional relationship between an object point and an image point is called conjugate when, if the object and the image point are swapped so that the object is placed at the image point, the same optical system forms an image point at the original position of the object. In this specification, an image captured in such a conjugate relationship is referred to as a normal captured image (or simply a captured image). When a person directly views an object in the environment where the object exists, the person perceives the object in almost the same state as in a normal captured image. In other words, a person visually recognizes a normal captured image captured by an ordinary digital camera in the same way as the state of the real space.

一方、計算撮像画像は、例えばマルチピンホールを用いることで複数の画像がずれて重畳されたものであり、人によって実空間の状態と同様に視覚的に認識できない画像である。ただし、計算撮像画像は、人が視覚的に認識できない画像であり得るが、コンピュータ処理を用いれば、撮像対象物及び周辺環境等の画像に含まれる情報の取得が可能である画像である。計算撮像画像は、画像を復元することによって人が認識できるように視覚化されることができる。計算撮像画像の例は、マルチピンホール又はマイクロレンズを用いて撮像されたライトフィールド画像、時空間で画素情報を重み付け加算して撮像された圧縮センシング画像、又は、符号化絞りとコード化されたマスクとを使用して撮像されたＣｏｄｅｄＡｐｅｒｔｕｒｅ画像（符号化開口画像）などの符号化画像である。例えば、非特許文献３には、圧縮センシング画像の例が示されている。また、計算撮像画像の他の例は、非特許文献４及び非特許文献５に示されるような、屈折による結像光学系を有しないレンズレスカメラを使用して撮像された画像である。上記のいずれの計算撮像画像も、既知な技術であるため、その詳細な説明を省略する。 On the other hand, the calculated captured image is an image in which a plurality of images are shifted and superimposed, for example by using multi-pinholes, and is an image that a person cannot visually recognize in the same way as the state of the real space. Although the calculated captured image may be an image that a person cannot visually recognize, it is an image from which information contained in the image, such as the imaging object and the surrounding environment, can be obtained by computer processing. The calculated captured image can be visualized so that a person can recognize it by restoring the image. Examples of the calculated captured image are coded images such as a light field image captured using multi-pinholes or microlenses, a compressed sensing image captured by weighted addition of pixel information in space-time, and a coded aperture image captured using a coded aperture and a coded mask. For example, Non-Patent Document 3 shows an example of a compressed sensing image. Another example of the calculated captured image is an image captured using a lensless camera that has no refractive imaging optical system, as shown in Non-Patent Document 4 and Non-Patent Document 5. Since all of the above calculated captured images are obtained by known techniques, a detailed description thereof is omitted.
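
The superimposition described above can be illustrated with a minimal sketch. It assumes a flat scene at a single depth, so that every pinhole contributes the same image displaced by a fixed offset; the function name `superimpose_shifted` and the shift values are hypothetical and do not come from the disclosure. In a real multi-pinhole camera the displacement depends on the depth of each subject, which is what gives rise to the parallax information.

```python
def superimpose_shifted(image, shifts):
    """Sum copies of `image`, each displaced by one (dy, dx) offset.

    Each offset stands in for one pinhole (an assumption made for
    illustration). Pixels shifted outside the frame are discarded.
    """
    height, width = len(image), len(image[0])
    out = [[0.0] * width for _ in range(height)]
    for dy, dx in shifts:
        for y in range(height):
            for x in range(width):
                sy, sx = y - dy, x - dx  # source pixel for this offset
                if 0 <= sy < height and 0 <= sx < width:
                    out[y][x] += image[sy][sx]
    return out


# A single bright point seen through two hypothetical pinholes.
scene = [
    [0, 0, 0, 0],
    [0, 1, 0, 0],
    [0, 0, 0, 0],
]
observed = superimpose_shifted(scene, shifts=[(0, 0), (0, 2)])
```

The single point in `scene` appears twice in `observed`, two pixels apart, which is why such an image is hard for a person to read directly even though the scene information is still present.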

例えば、ライトフィールド画像には、各画素に、画像値に加えて、奥行情報も含まれる。ライトフィールド画像は、撮像素子の前に配置された複数のピンホール又はマイクロレンズを介して、撮像素子によって取得された画像である。複数のピンホール及びマイクロレンズは、撮像素子の受光面に沿って平面的に配置され、例えば、格子状に配置される。撮像素子は、その全体での１回の撮像動作において、複数のピンホール又はマイクロレンズのそれぞれを通じて複数の像を同時に取得する。複数の像は、異なる視点から撮像された像である。このような複数の像と視点との位置関係から、被写体の奥行方向の距離の取得が可能である。撮像素子の例は、ＣＭＯＳ（Complementary Metal Oxide Semiconductor）イメージセンサ又はＣＣＤ（Charge-Coupled Device）イメージセンサ等のイメージセンサである。 For example, a light field image includes depth information in addition to an image value for each pixel. The light field image is an image acquired by the image sensor through a plurality of pinholes or microlenses arranged in front of the image sensor. The plurality of pinholes and microlenses are arranged in a plane along the light receiving surface of the image sensor, for example, arranged in a lattice shape. The imaging device simultaneously acquires a plurality of images through each of a plurality of pinholes or microlenses in one imaging operation as a whole. The plurality of images are images taken from different viewpoints. The distance in the depth direction of the subject can be acquired from the positional relationship between the plurality of images and the viewpoint. An example of the image sensor is an image sensor such as a complementary metal oxide semiconductor (CMOS) image sensor or a charge-coupled device (CCD) image sensor.
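
As a rough illustration of how depth can follow from the positional relationship between images and viewpoints, the sketch below uses the standard two-view pinhole relation Z = f·B/d. This relation is an assumption brought in for illustration; the disclosure does not state which formula is used. The names `focal_length_px`, `baseline_m`, and `disparity_px` are hypothetical.

```python
def depth_from_disparity(focal_length_px, baseline_m, disparity_px):
    """Estimate depth (metres) from the disparity between two viewpoints.

    Uses the textbook pinhole-stereo relation Z = f * B / d, where
    f is the focal length in pixels, B is the distance between the two
    viewpoints in metres, and d is the disparity in pixels.
    """
    if disparity_px <= 0:
        raise ValueError("disparity must be positive for a finite depth")
    return focal_length_px * baseline_m / disparity_px


# Hypothetical values: two pinholes 1 cm apart, 800 px focal length.
near = depth_from_disparity(800.0, 0.01, 4.0)
far = depth_from_disparity(800.0, 0.01, 1.0)
```

A smaller disparity corresponds to a more distant subject, so `far` is larger than `near`.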

圧縮センシング画像は、圧縮センシングの対象画像である。圧縮センシングの対象画像の例は、レンズレスカメラで撮像された画像である。レンズレスカメラは、屈折による結像光学系を有さず、撮像素子の前に配置されたマスクを介して、画像を取得する。マスクは、透過率が異なる複数の領域を、例えば格子状に含む。このようなマスクを通して撮影することで、様々な方向からの光線（ライトフィールド画像）をマスクによってコード化して撮像することができる。圧縮センシングでは、このマスク情報を利用することで、コード化されたライトフィールド画像から、所望の方向の光線のみの画像、又は、すべての距離に焦点が合った全焦点画像を取得することができ、さらには奥行情報を取得することができる。 The compressed sensing image is a target image for compressed sensing. An example of a compression sensing target image is an image captured by a lensless camera. The lensless camera does not have an image forming optical system by refraction, and acquires an image via a mask arranged in front of the image sensor. The mask includes a plurality of regions having different transmittances, for example, in a lattice shape. By photographing through such a mask, light rays (light field images) from various directions can be coded and imaged by the mask. In compressed sensing, by using this mask information, it is possible to acquire an image of only the light beam in the desired direction or an omnifocal image focused at all distances from the coded light field image. Furthermore, depth information can be acquired.

また、このようなマスクをカメラの開口部に絞りとして設置して撮影した画像はＣｏｄｅｄＡｐｅｒｔｕｒｅ画像（符号化開口画像）と呼ばれる。 An image captured by setting such a mask as a diaphragm at the opening of the camera is called a coded aperture image (coded aperture image).

このように、計算撮像画像（第１の計算撮像画像及び第２の計算撮像画像）は、撮像対象物及び撮像対象物の周辺環境がそれぞれ複数重畳された視差情報を含んだ画像であり、具体的には、マルチピンホールカメラ、ＣｏｄｅｄＡｐｅｒｔｕｒｅカメラ、ライトフィールドカメラ、又は、レンズレスカメラによる撮像対象物及び撮像対象物の周辺環境の撮像により得られる画像である。 As described above, the calculated captured image (the first calculated captured image and the second calculated captured image) is an image including parallax information in which a plurality of imaging objects and a plurality of surrounding environments of the imaging objects are superimposed. Specifically, it is an image obtained by imaging the imaging object and the surrounding environment of the imaging object by a multi-pinhole camera, a coded aperture camera, a light field camera, or a lensless camera.

画像識別装置１０の取得部１０１は、撮像部１１から第２の計算撮像画像を取得し、識別部１０２に出力する。また、取得部１０１は、識別部１０２が識別のために用いる識別器を取得してもよく、取得した識別器を識別部１０２に出力してもよい。画像識別装置１０が移動体に搭載される場合、取得部１０１は、移動体から、移動体の速度を取得してもよい。取得部１０１は、移動体の速度をリアルタイムに取得してもよく、定期的に取得してもよい。例えば、取得部１０１は、移動体が速度計を備える場合、速度計から速度を取得してもよく、また、移動体が備えるコンピュータであって、速度計から速度情報を受信するコンピュータから速度を取得してもよい。また、例えば、取得部１０１は、移動体が速度計を備えない場合、移動体が備えるＧＰＳ（Global Positioning System）装置、加速度計及び角速度計などの慣性計測装置等から速度に関連する情報を取得してもよい。 The acquisition unit 101 of the image identification device 10 acquires the second calculated captured image from the imaging unit 11 and outputs it to the identification unit 102. The acquisition unit 101 may also acquire a classifier that the identification unit 102 uses for identification, and may output the acquired classifier to the identification unit 102. When the image identification device 10 is mounted on a moving body, the acquisition unit 101 may acquire the speed of the moving body from the moving body. The acquisition unit 101 may acquire the speed of the moving body in real time or periodically. For example, when the moving body includes a speedometer, the acquisition unit 101 may acquire the speed from the speedometer, or from a computer that is provided in the moving body and receives speed information from the speedometer. When the moving body does not include a speedometer, the acquisition unit 101 may acquire speed-related information from, for example, a GPS (Global Positioning System) device provided in the moving body or an inertial measurement device such as an accelerometer or an angular velocity meter.

識別部１０２は、取得部１０１から第２の計算撮像画像を取得する。識別部１０２は、例えば取得部１０１から取得した識別器を含む。識別器は、画像から対象物の情報を取得するための識別モデルであって、識別部１０２が識別のために用いるデータである。識別器は、機械学習を用いて構築される。計算撮像画像を学習用データとして用いて機械学習することによって、識別性能を向上した識別器の構築が可能である。なお、学習用データとして機械学習のために用いられる計算撮像画像を第１の計算撮像画像とも呼ぶ。本実施の形態では、識別器に適用される機械学習モデルは、ＤｅｅｐＬｅａｒｎｉｎｇ（深層学習）等のニューラルネットワークを用いた機械学習モデルであるが、他の学習モデルであってもよい。例えば、機械学習モデルは、ＲａｎｄｏｍＦｏｒｅｓｔ、又はＧｅｎｅｔｉｃＰｒｏｇｒａｍｍｉｎｇ等を用いた機械学習モデルであってもよい。 The identification unit 102 acquires the second calculated captured image from the acquisition unit 101. The identification unit 102 includes, for example, a classifier acquired from the acquisition unit 101. The discriminator is an identification model for acquiring object information from an image, and is data used by the discriminator 102 for discrimination. The classifier is constructed using machine learning. It is possible to construct a discriminator with improved discrimination performance by machine learning using the calculated captured image as learning data. The calculated captured image used for machine learning as the learning data is also referred to as a first calculated captured image. In the present embodiment, the machine learning model applied to the discriminator is a machine learning model using a neural network such as deep learning, but may be another learning model. For example, the machine learning model may be a machine learning model using Random Forest, Genetic Programming, or the like.

識別部１０２は、識別器を用いて、第２の計算撮像画像中の物体（撮像対象物及び撮像対象物の周辺環境）の情報を取得する。具体的には、識別部１０２は、第２の計算撮像画像に含まれる物体を識別し、且つ、第２の計算撮像画像中の物体の位置を取得する。つまり、物体の情報は、物体の存在の有無と、物体の位置とを含む。物体の位置は、画像上における平面的な位置と、画像の奥行方向の位置とを含んでもよい。例えば、識別部１０２は、識別器を用いて、第２の計算撮像画像の少なくとも１つの画素毎に、物体が存在するか否かを識別する。識別部１０２は、第２の計算撮像画像中の物体の位置として、物体が存在することが識別された少なくとも１つの画素の位置を取得する。ここで、本明細書における物体の識別とは、第２の計算撮像画像において、物体が存在する画素を検出することを含む。 The identification unit 102 acquires information on the object (the imaging target object and the surrounding environment of the imaging target object) in the second calculated captured image using the classifier. Specifically, the identification unit 102 identifies an object included in the second calculated captured image, and acquires the position of the object in the second calculated captured image. That is, the object information includes the presence / absence of the object and the position of the object. The position of the object may include a planar position on the image and a position in the depth direction of the image. For example, the identification unit 102 identifies whether or not an object exists for each at least one pixel of the second calculated captured image using a classifier. The identification unit 102 acquires the position of at least one pixel that is identified as the presence of the object as the position of the object in the second calculated captured image. Here, the identification of the object in this specification includes detecting a pixel in which the object exists in the second calculated captured image.
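
The per-pixel identification described above can be sketched as follows. The sketch assumes that the classifier produces a score between 0 and 1 for each pixel and that a pixel counts as containing an object when its score reaches a threshold; the score map, the threshold value, and the function name `object_pixel_positions` are hypothetical and are not specified in the disclosure.

```python
def object_pixel_positions(score_map, threshold=0.5):
    """Return (row, column) positions of pixels identified as an object.

    `score_map` is a 2-D list of per-pixel scores, assumed here to be
    the output of the classifier for one second calculated captured image.
    """
    return [
        (row_index, col_index)
        for row_index, row in enumerate(score_map)
        for col_index, score in enumerate(row)
        if score >= threshold
    ]


# Hypothetical classifier output for a 3 x 4 image.
scores = [
    [0.1, 0.2, 0.1, 0.0],
    [0.1, 0.9, 0.8, 0.0],
    [0.0, 0.7, 0.3, 0.0],
]
positions = object_pixel_positions(scores)
object_present = len(positions) > 0
```

Here `object_present` corresponds to the presence or absence of the object, and `positions` corresponds to the planar position of the object on the image.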

例えば、識別システム１が自動車に搭載される場合、物体の例は、人物、自動車、自転車又は信号である。なお、識別部１０２は、第２の計算撮像画像を用いて、あらかじめ定められた１種類の物体を識別してもよく、複数の種類の物体を識別してもよい。また、識別部１０２は、人物、自動車又は自転車を含む移動体などのカテゴリ単位で、物体を識別してもよい。このとき、識別する物体の種類（カテゴリ）に応じた識別器が用いられてもよい。識別器は、例えば画像識別装置１０が有するメモリ（例えば後述する第一メモリ２０３）に記録される。 For example, when the identification system 1 is mounted on an automobile, examples of the object are a person, an automobile, a bicycle, and a traffic signal. Note that the identification unit 102 may use the second calculated captured image to identify one predetermined type of object or a plurality of types of objects. The identification unit 102 may also identify objects in units of categories, such as persons, or moving bodies including automobiles and bicycles. In this case, a classifier corresponding to the type (category) of the object to be identified may be used. The classifier is recorded in, for example, a memory included in the image identification device 10 (for example, the first memory 203 described later).

例えば、ライトフィールド画像には、画像値に加えて、各画素の被写体の奥行情報も含まれる。また、非特許文献２にも記載されるように、被写体の奥行情報を学習データに用いることは、識別器の識別能力向上に有効である。例えば、画像において小さく写っている物体が、遠方に存在する被写体であることを認識でき、ゴミとして認識されない（つまり無視されてしまう）ことを抑制できる。このため、ライトフィールド画像を使用した機械学習により構築された識別器は、その識別性能を向上することができる。同様に、圧縮センシング画像及び符号化開口画像を用いた機械学習も、識別器の識別性能の向上に有効である。 For example, the light field image includes depth information of the subject of each pixel in addition to the image value. Further, as described in Non-Patent Document 2, the use of subject depth information as learning data is effective in improving the discrimination capability of the discriminator. For example, it is possible to recognize that an object that is small in the image is a subject that exists in the distance, and it is possible to prevent the object from being recognized as dust (that is, ignored). For this reason, a discriminator constructed by machine learning using a light field image can improve its discrimination performance. Similarly, machine learning using a compressed sensing image and a coded aperture image is also effective in improving the identification performance of the classifier.

また、識別システム１は、後述する図２に示すように、識別器を生成するための学習装置１２を備えてもよい。この場合、画像識別装置１０の識別部１０２は、学習装置１２で生成された、言い換えると学習が完了した識別器を使用する。 The identification system 1 may also include a learning device 12 for generating a classifier, as shown in FIG. 2 described later. In this case, the identification unit 102 of the image identification device 10 uses the classifier generated by the learning device 12, in other words, a classifier for which learning has been completed.

出力部１０３は、識別部１０２の識別結果を出力する。出力部１０３は、識別システム１がさらにディスプレイを備える場合には、当該ディスプレイに、識別結果を出力する指示を出力する。又は、出力部１０３は、通信部を有し、通信部を介して、有線又は無線で、識別結果を出力してもよい。上述の通り、物体の情報は、物体の存在の有無と、物体の位置とを含み、物体の情報についての識別結果に応じて自動運転等が行われ、また、例えばディスプレイ等に物体の情報が出力されることで、ユーザは識別システム１が搭載された移動体の周辺の状況を認識できる。 The output unit 103 outputs the identification result of the identification unit 102. When the identification system 1 further includes a display, the output unit 103 outputs, to the display, an instruction to output the identification result. Alternatively, the output unit 103 may include a communication unit and output the identification result via the communication unit in a wired or wireless manner. As described above, the object information includes the presence or absence of the object and the position of the object. Automated driving or the like is performed according to the identification result for the object information, and, for example, by outputting the object information to a display or the like, the user can recognize the situation around the moving body on which the identification system 1 is mounted.

上述のような取得部１０１、識別部１０２及び出力部１０３からなる画像識別装置１０の構成要素は、ＣＰＵ（Central Processing Unit）又はＤＳＰ（Digital Signal Processor）等のプロセッサ、並びに、ＲＡＭ（Random Access Memory）及びＲＯＭ（Read−Only Memory）等のメモリなどからなる処理回路により構成されてもよい。上記構成要素の一部又は全部の機能は、ＣＰＵ又はＤＳＰがＲＡＭを作業用のメモリとして用いてＲＯＭに記録されたプログラムを実行することによって達成されてもよい。また、上記構成要素の一部又は全部の機能は、電子回路又は集積回路等の専用のハードウェア回路によって達成されてもよい。上記構成要素の一部又は全部の機能は、上記のソフトウェア機能とハードウェア回路との組み合わせによって構成されてもよい。 The components of the image identification device 10, namely the acquisition unit 101, the identification unit 102, and the output unit 103 described above, may be configured by a processing circuit including a processor such as a CPU (Central Processing Unit) or a DSP (Digital Signal Processor) and memories such as a RAM (Random Access Memory) and a ROM (Read-Only Memory). Some or all of the functions of the above components may be achieved by the CPU or DSP executing a program recorded in the ROM while using the RAM as a working memory. Some or all of the functions of the above components may also be achieved by a dedicated hardware circuit such as an electronic circuit or an integrated circuit. Some or all of the functions of the above components may be configured by a combination of the above software functions and hardware circuits.

次に、識別システムが学習装置を含むケースとして、実施の形態に係る識別システム１の変形例を、図２を用いて説明する。 Next, as a case where the identification system includes a learning device, a modified example of the identification system 1 according to the embodiment will be described with reference to FIG.

図２は、実施の形態の変形例に係る識別システム１Ａの機能的な構成の一例を示す模式図である。 FIG. 2 is a schematic diagram illustrating an example of a functional configuration of an identification system 1A according to a modification of the embodiment.

図２に示すように、変形例に係る識別システム１Ａは、画像識別装置１０と、撮像部１１と、学習装置１２とを備える。学習装置１２は、第一画像取得部１２１と、第二画像取得部１２２と、識別正解取得部１２３と、学習部１２４とを備える。画像識別装置１０、撮像部１１及び学習装置１２は、１つの装置に搭載されてもよく、複数の装置に分かれて搭載されてもよい。画像識別装置１０、撮像部１１及び学習装置１２が複数の装置に分かれて搭載される場合、有線通信又は無線通信を介して、装置間で情報が授受されてもよい。適用される有線通信及び無線通信は、上記で例示したもののいずれかであってもよい。 As illustrated in FIG. 2, the identification system 1 A according to the modification includes an image identification device 10, an imaging unit 11, and a learning device 12. The learning device 12 includes a first image acquisition unit 121, a second image acquisition unit 122, an identification correct answer acquisition unit 123, and a learning unit 124. The image identification device 10, the imaging unit 11, and the learning device 12 may be mounted on one device, or may be mounted on a plurality of devices. When the image identification device 10, the imaging unit 11, and the learning device 12 are separately installed in a plurality of devices, information may be exchanged between the devices via wired communication or wireless communication. The applied wired communication and wireless communication may be any of those exemplified above.

図３は、実施の形態の変形例に係る識別システム１Ａのハードウェア構成の一例を示す模式図である。 FIG. 3 is a schematic diagram illustrating an example of a hardware configuration of an identification system 1A according to a modification of the embodiment.

図３に示すように、学習装置１２は、第二入力回路２２１と、第三入力回路２２２と、第二演算回路２２３と、第二メモリ２２４とを備える。また、画像識別装置１０は、第一入力回路２０１と、第一演算回路２０２と、第一メモリ２０３と、出力回路２０４とを備える。 As illustrated in FIG. 3, the learning device 12 includes a second input circuit 221, a third input circuit 222, a second arithmetic circuit 223, and a second memory 224. Further, the image identification device 10 includes a first input circuit 201, a first arithmetic circuit 202, a first memory 203, and an output circuit 204.

第一入力回路２０１、第一演算回路２０２及び出力回路２０４は、画像識別装置１０が備える処理回路の一例であり、第一メモリ２０３は、画像識別装置１０が備えるメモリの一例である。図１及び図２を参照すると、第一入力回路２０１は、取得部１０１に対応する。第一演算回路２０２は、識別部１０２に対応する。出力回路２０４は、出力部１０３に対応する。このように、取得部１０１、識別部１０２及び出力部１０３は、第一入力回路２０１、第一演算回路２０２及び出力回路２０４に対応していることから、取得部１０１、識別部１０２及び出力部１０３についても、画像識別装置１０が備える処理回路の一例といえる。第一メモリ２０３は、第一入力回路２０１、第一演算回路２０２及び出力回路２０４が処理を実行するためのコンピュータプログラム、取得部１０１が取得する第２の計算撮像画像、及び、識別部１０２が用いる識別器等を記憶する。第一メモリ２０３は、１つのメモリで構成されてもよく、同じ種類又は異なる種類の複数のメモリで構成されてもよい。第一入力回路２０１及び出力回路２０４は、通信回路を含んでもよい。 The first input circuit 201, the first arithmetic circuit 202, and the output circuit 204 are an example of a processing circuit included in the image identification device 10, and the first memory 203 is an example of a memory included in the image identification device 10. Referring to FIGS. 1 and 2, the first input circuit 201 corresponds to the acquisition unit 101. The first arithmetic circuit 202 corresponds to the identification unit 102. The output circuit 204 corresponds to the output unit 103. Since the acquisition unit 101, the identification unit 102, and the output unit 103 thus correspond to the first input circuit 201, the first arithmetic circuit 202, and the output circuit 204, the acquisition unit 101, the identification unit 102, and the output unit 103 can also be regarded as an example of a processing circuit included in the image identification device 10. The first memory 203 stores a computer program for the first input circuit 201, the first arithmetic circuit 202, and the output circuit 204 to execute processing, the second calculated captured image acquired by the acquisition unit 101, the classifier used by the identification unit 102, and the like. The first memory 203 may be composed of a single memory, or of a plurality of memories of the same type or of different types. The first input circuit 201 and the output circuit 204 may include a communication circuit.

第二入力回路２２１、第三入力回路２２２及び第二演算回路２２３は、学習装置１２が備える処理回路の一例であり、第二メモリ２２４は、学習装置１２が備えるメモリの一例である。図２及び図３を参照すると、第二入力回路２２１は、第一画像取得部１２１に対応する。第二入力回路２２１は、通信回路を含んでもよい。第三入力回路２２２は、第二画像取得部１２２に対応する。第三入力回路２２２は、通信回路を含んでもよい。第二演算回路２２３は、識別正解取得部１２３及び学習部１２４に対応する。第二演算回路２２３は、通信回路を含んでもよい。このように、第一画像取得部１２１、第二画像取得部１２２、識別正解取得部１２３及び学習部１２４は、第二入力回路２２１、第三入力回路２２２及び第二演算回路２２３に対応していることから、第一画像取得部１２１、第二画像取得部１２２、識別正解取得部１２３及び学習部１２４についても、学習装置１２が備える処理回路の一例といえる。第二メモリ２２４は、第二入力回路２２１、第三入力回路２２２及び第二演算回路２２３が処理を実行するためのコンピュータプログラム、第一画像取得部１２１が取得する第１の計算撮像画像、第二画像取得部１２２が取得する撮像画像、識別正解取得部１２３が取得する識別正解、学習部１２４が生成した識別器等を記憶する。第二メモリ２２４は、１つのメモリで構成されてもよく、同じ種類又は異なる種類の複数のメモリで構成されてもよい。 The second input circuit 221, the third input circuit 222, and the second arithmetic circuit 223 are examples of processing circuits included in the learning device 12, and the second memory 224 is an example of memory included in the learning device 12. Referring to FIGS. 2 and 3, the second input circuit 221 corresponds to the first image acquisition unit 121. The second input circuit 221 may include a communication circuit. The third input circuit 222 corresponds to the second image acquisition unit 122. The third input circuit 222 may include a communication circuit. The second arithmetic circuit 223 corresponds to the identification correct answer acquisition unit 123 and the learning unit 124. The second arithmetic circuit 223 may include a communication circuit. Thus, the first image acquisition unit 121, the second image acquisition unit 122, the identification correct answer acquisition unit 123, and the learning unit 124 correspond to the second input circuit 221, the third input circuit 222, and the second arithmetic circuit 223. Therefore, the first image acquisition unit 121, the second image acquisition unit 122, the identification correct answer acquisition unit 123, and the learning unit 124 are also examples of processing circuits included in the learning device 12. 
The second memory 224 stores a computer program for the second input circuit 221, the third input circuit 222, and the second arithmetic circuit 223 to execute processing, the first calculated captured image acquired by the first image acquisition unit 121, the captured image acquired by the second image acquisition unit 122, the identification correct answer acquired by the identification correct answer acquisition unit 123, the classifier generated by the learning unit 124, and the like. The second memory 224 may be composed of a single memory, or of a plurality of memories of the same type or of different types.

第一入力回路２０１、第一演算回路２０２、出力回路２０４、第二入力回路２２１、第三入力回路２２２及び第二演算回路２２３は、ＣＰＵ又はＤＳＰ等のプロセッサを含む処理回路で構成され得る。第一メモリ２０３及び第二メモリ２２４は、例えば、ＲＯＭ、ＲＡＭ、フラッシュメモリなどの半導体メモリ、ハードディスクドライブ、又は、ＳＳＤ（Solid State Drive）等の記憶装置によって実現される。第一メモリ２０３及び第二メモリ２２４は、１つのメモリにまとめられてもよい。プロセッサは、メモリに展開されたコンピュータプログラムに記述された命令群を実行する。これにより、プロセッサは種々の機能を実現することができる。 The first input circuit 201, the first arithmetic circuit 202, the output circuit 204, the second input circuit 221, the third input circuit 222, and the second arithmetic circuit 223 may be configured by a processing circuit including a processor such as a CPU or a DSP. The first memory 203 and the second memory 224 are realized by a storage device such as a semiconductor memory such as a ROM, a RAM, and a flash memory, a hard disk drive, or an SSD (Solid State Drive), for example. The first memory 203 and the second memory 224 may be combined into one memory. The processor executes a group of instructions described in the computer program expanded in the memory. Thereby, the processor can realize various functions.

学習装置１２の第一画像取得部１２１及び第二画像取得部１２２は、機械学習のための第１の計算撮像画像及び撮像画像を取得する。第一画像取得部１２１のハードウェアの例は計算撮像画像を撮像するためのカメラであり、具体的にはマルチピンホールカメラ、ＣｏｄｅｄＡｐｅｒｔｕｒｅカメラ、ライトフィールドカメラ、又は、レンズレスカメラ等である。つまり、第一画像取得部１２１は、例えば、第二入力回路２２１と計算撮像画像を撮像するためのカメラとによって実現される。第二画像取得部１２２のハードウェアの例は撮像画像を撮像するためのカメラであり、具体的にはデジタルカメラ等である。つまり、第二画像取得部１２２は、例えば、第三入力回路２２２と撮像画像を撮像するためのカメラとによって実現される。 The first image acquisition unit 121 and the second image acquisition unit 122 of the learning device 12 acquire a first calculated captured image and a captured image for machine learning. An example of hardware of the first image acquisition unit 121 is a camera for capturing a calculated captured image, and specifically, a multi-pinhole camera, a coded aperture camera, a light field camera, a lensless camera, or the like. That is, the first image acquisition unit 121 is realized by, for example, the second input circuit 221 and a camera for capturing a calculated captured image. An example of hardware of the second image acquisition unit 122 is a camera for capturing a captured image, specifically, a digital camera or the like. That is, the second image acquisition unit 122 is realized by, for example, the third input circuit 222 and a camera for capturing a captured image.

例えば、計算撮像画像を撮像するためのカメラによって撮像された第１の計算撮像画像は第二メモリ２２４に記憶され、第二入力回路２２１が第二メモリ２２４から第１の計算撮像画像を取得することで、第一画像取得部１２１は、第１の計算撮像画像を取得する。なお、第一画像取得部１２１は、ハードウェアとして計算撮像画像を撮像するためのカメラを含んでいなくてもよい。この場合、第一画像取得部１２１（第二入力回路２２１）は、撮像部１１から第１の計算撮像画像を取得してもよく（具体的には、撮像部１１によって撮像された第１の計算撮像画像は第二メモリ２２４に記憶され、第二メモリ２２４から第１の計算撮像画像を取得してもよく）、識別システム１Ａの外部から有線通信又は無線通信を介して、第１の計算撮像画像を取得してもよい。適用される有線通信及び無線通信は、上記で例示したもののいずれかであってもよい。 For example, a first calculated captured image captured by a camera for capturing calculated captured images is stored in the second memory 224, and the first image acquisition unit 121 acquires the first calculated captured image by the second input circuit 221 reading it from the second memory 224. Note that the first image acquisition unit 121 need not include, as hardware, a camera for capturing calculated captured images. In this case, the first image acquisition unit 121 (second input circuit 221) may acquire the first calculated captured image from the imaging unit 11 (specifically, the first calculated captured image captured by the imaging unit 11 may be stored in the second memory 224 and then acquired from the second memory 224), or may acquire the first calculated captured image from outside the identification system 1A via wired communication or wireless communication. The wired communication and wireless communication applied may be any of those exemplified above.

また、例えば、撮像画像を撮像するためのカメラによって撮像された撮像画像は第二メモリ２２４に記憶され、第三入力回路２２２が第二メモリ２２４から撮像画像を取得することで、第二画像取得部１２２は、撮像画像を取得する。なお、第二画像取得部１２２は、ハードウェアとして撮像画像を撮像するためのカメラを含んでいなくてもよい。この場合、第二画像取得部１２２（第三入力回路２２２）は、識別システム１Ａの外部から有線通信又は無線通信を介して、撮像画像を取得してもよい。適用される有線通信及び無線通信は、上記で例示したもののいずれかであってもよい。 Further, for example, a captured image captured by a camera for capturing a captured image is stored in the second memory 224, and the third input circuit 222 acquires the captured image from the second memory 224, thereby acquiring the second image. The unit 122 acquires a captured image. Note that the second image acquisition unit 122 may not include a camera for capturing a captured image as hardware. In this case, the second image acquisition unit 122 (third input circuit 222) may acquire a captured image from outside the identification system 1A via wired communication or wireless communication. The applied wired communication and wireless communication may be any of those exemplified above.

識別正解取得部１２３は、第一画像取得部１２１が取得した第１の計算撮像画像を用いた機械学習のために、識別正解を取得する。識別正解は、第１の計算撮像画像と共に、識別システム１Ａの外部から与えられてもよく、ユーザが識別正解を手動等により入力することによって与えられてもよい。識別正解は、第１の計算撮像画像に含まれる被写体が属するカテゴリ情報と、被写体の位置情報とを含む。被写体のカテゴリの例は、人物、自動車、自転車又は信号等である。位置情報は、画像上の位置（具体的には、平面における位置又は奥行方向における位置）を含む。識別正解取得部１２３は、取得した識別正解を、第１の計算撮像画像と対応付けて、第二メモリ２２４に格納する。 The identification correct answer acquisition unit 123 acquires an identification correct answer for machine learning using the first calculated captured image acquired by the first image acquisition unit 121. The identification correct answer may be given together with the first calculated captured image from the outside of the identification system 1A, or may be given by the user manually inputting the identification correct answer. The identification correct answer includes category information to which the subject included in the first calculated captured image belongs, and position information of the subject. Examples of subject categories are people, cars, bicycles or signals. The position information includes a position on the image (specifically, a position on a plane or a position in the depth direction). The identification correct answer acquisition unit 123 stores the acquired identification correct answer in the second memory 224 in association with the first calculated captured image.

However, as described above, the calculated captured images acquired by the acquisition unit 101 and the first image acquisition unit 121 are images that a person cannot visually recognize in the same way as the real-space scene. It is therefore difficult to enter an identification correct answer directly on the first calculated captured image acquired by the first image acquisition unit 121. For this reason, the identification system 1A of the present embodiment includes the second image acquisition unit 122, and the identification correct answer is entered not on the first calculated captured image acquired by the first image acquisition unit 121 but on the captured image acquired by the second image acquisition unit 122, which a person can visually recognize in the same way as the real-space scene. Details will be described later.

The learning unit 124 trains the classifier of the identification unit 102 using the first calculated captured image acquired by the first image acquisition unit 121 and the identification correct answer, acquired by the identification correct answer acquisition unit 123, for the captured image acquired by the second image acquisition unit 122. The learning unit 124 causes the classifier stored in the second memory 224 to perform machine learning and stores the latest trained classifier in the second memory 224. The identification unit 102 acquires the latest classifier stored in the second memory 224, stores it in the first memory 203, and uses it for identification processing. The machine learning is realized by, for example, error backpropagation (BP) as used in deep learning. Specifically, the learning unit 124 inputs the first calculated captured image to the classifier and acquires the identification result that the classifier outputs. The learning unit 124 then adjusts the classifier so that the identification result matches the identification correct answer. The learning unit 124 improves the identification accuracy of the classifier by repeating this adjustment over many different pairs (for example, several thousand pairs) of a first calculated captured image and its corresponding identification correct answer.
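The adjust-and-repeat loop described above can be sketched as follows. This is a minimal illustration only: a one-layer logistic classifier stands in for the multi-layer neural network, short feature vectors stand in for calculated captured images, and the names `train_classifier` and `classify` are hypothetical rather than taken from the disclosure.

```python
import math

def train_classifier(samples, answers, epochs=300, lr=0.5):
    """Fit a one-layer logistic classifier by gradient descent.

    samples: list of feature vectors standing in for calculated captured images.
    answers: list of 0/1 correct answers annotated on the ordinary captured images.
    """
    dim = len(samples[0])
    weights = [0.0] * dim
    bias = 0.0
    n = len(samples)
    for _ in range(epochs):
        grad_w = [0.0] * dim
        grad_b = 0.0
        for x, t in zip(samples, answers):
            # Forward pass: classifier output for this sample.
            z = sum(w * xi for w, xi in zip(weights, x)) + bias
            y = 1.0 / (1.0 + math.exp(-z))
            # Backward pass: gradient of the cross-entropy loss.
            err = y - t
            for i, xi in enumerate(x):
                grad_w[i] += err * xi
            grad_b += err
        # Adjust the classifier so its output moves toward the correct answer.
        weights = [w - lr * g / n for w, g in zip(weights, grad_w)]
        bias -= lr * grad_b / n
    return weights, bias

def classify(x, weights, bias):
    """Return the 0/1 decision of the trained classifier for one sample."""
    return 1 if sum(w * xi for w, xi in zip(weights, x)) + bias > 0 else 0
```

In a real system the inner update would be the backpropagation step of a deep network, and the trained parameters would be written back to storage (the second memory 224 in the text) after training.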

Next, the operation of the learning device 12 will be described with reference to FIGS. 2 to 4.

図４は、学習装置１２の主要な処理の流れの一例を示すフローチャートである。 FIG. 4 is a flowchart illustrating an example of the main processing flow of the learning device 12.

First, in step S1, the learning unit 124 acquires the correspondence between positions (pixels) on the first calculated captured image acquired by the first image acquisition unit 121 and positions (pixels) on the captured image acquired by the second image acquisition unit 122. Specifically, the learning unit 124 acquires the correspondence between the plurality of first pixels of the first calculated captured image and the plurality of second pixels of the captured image. This is realized by performing geometric calibration on the first calculated captured image and the captured image. Geometric calibration determines in advance where points with known three-dimensional positions appear in the first calculated captured image and in the captured image, and uses that information to obtain the relationship between the three-dimensional position of a subject and its position in each image. This can be done, for example, with the technique known as Tsai's calibration. Normally the three-dimensional position of a subject cannot be obtained from a captured image, but, as described above, a light field image, which is a calculated captured image, allows the three-dimensional position to be obtained from a single image. Calibration can also be realized by acquiring corresponding points (pixels) between the first calculated captured image acquired by the first image acquisition unit 121 and the captured image acquired by the second image acquisition unit 122. For example, once the correspondence between the first calculated captured image and the captured image is acquired, the origins of the two images can be aligned. As long as the positional relationship between the camera that captures the first calculated captured image and the camera that captures the captured image does not change, this calibration needs to be performed only once. In the following description, the calculated captured image is assumed to be a light field image.
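As a rough illustration of obtaining a pixel correspondence from corresponding points, the sketch below fits an independent scale and offset for each image axis by least squares. This is a deliberately simplified assumption (two cameras that differ only by scale and origin); it is not Tsai's calibration, which also models rotation, perspective, and lens distortion. The function names are hypothetical.

```python
def fit_axis(src, dst):
    """Least-squares fit of dst = scale * src + offset for one image axis."""
    n = len(src)
    mean_s = sum(src) / n
    mean_d = sum(dst) / n
    var = sum((s - mean_s) ** 2 for s in src)
    cov = sum((s - mean_s) * (d - mean_d) for s, d in zip(src, dst))
    scale = cov / var
    return scale, mean_d - scale * mean_s

def fit_correspondence(points_captured, points_calculated):
    """Return a function mapping captured-image pixels to calculated-image pixels.

    Both arguments are lists of (x, y) pixel positions of the same scene points.
    """
    sx, ox = fit_axis([p[0] for p in points_captured],
                      [q[0] for q in points_calculated])
    sy, oy = fit_axis([p[1] for p in points_captured],
                      [q[1] for q in points_calculated])

    def to_calculated(pixel):
        return (sx * pixel[0] + ox, sy * pixel[1] + oy)

    return to_calculated
```

Because the fit depends only on the relative placement of the two cameras, the returned mapping can be computed once and reused, which mirrors the remark that calibration need only be performed once while the camera arrangement is unchanged.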

ライトフィールド画像は、画素値と奥行情報との両方の情報を有する。ライトフィールド画像は、ライトフィールドカメラによって取得される。ライトフィールドカメラの具体例は、マルチピンホール又はマイクロレンズを使用したカメラである。撮像部１１がライトフィールドカメラであり、第一画像取得部１２１は、撮像部１１が撮像したライトフィールド画像を取得してもよい。又は、第一画像取得部１２１は、識別システム１Ａの外部から有線通信又は無線通信を介してライトフィールド画像を取得してもよい。 The light field image has both information of pixel values and depth information. The light field image is acquired by a light field camera. A specific example of the light field camera is a camera using a multi-pinhole or a microlens. The imaging unit 11 may be a light field camera, and the first image acquisition unit 121 may acquire a light field image captured by the imaging unit 11. Alternatively, the first image acquisition unit 121 may acquire a light field image from outside the identification system 1A via wired communication or wireless communication.

図５は、マルチピンホールを使用したライトフィールドカメラの例を示す図である。 FIG. 5 is a diagram illustrating an example of a light field camera using a multi-pinhole.

The light field camera 211 shown in FIG. 5 includes a multi-pinhole mask 211a and an image sensor 211b. The multi-pinhole mask 211a is placed at a fixed distance from the image sensor 211b and has a plurality of pinholes 211aa arranged randomly or at equal intervals. The plurality of pinholes 211aa is also called a multi-pinhole. The image sensor 211b acquires an image of the subject through each pinhole 211aa. An image acquired through a pinhole is called a pinhole image. Because the pinhole image of the subject differs according to the position and size of each pinhole 211aa, the image sensor 211b acquires a superimposed image of a plurality of pinhole images. The position of a pinhole 211aa affects where the subject is projected on the image sensor 211b, and the size of a pinhole 211aa affects the blur of the pinhole image. Using the multi-pinhole mask 211a therefore makes it possible to acquire a superposition of pinhole images that differ in position and degree of blur. When the subject is far from the pinholes 211aa, the pinhole images are projected at almost the same position. When the subject is close to the pinholes 211aa, the pinhole images are projected at positions farther apart. Because the amount of shift among the superimposed pinhole images corresponds to the distance between the subject and the multi-pinhole mask 211a, the superimposed image contains depth information about the subject in the form of that shift.
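The relationship between shift and distance can be made concrete with an idealized pinhole model, which is an assumption of this sketch and not a formula stated in the disclosure. For a mask at distance d from the sensor and a subject point at distance Z from the mask, a pinhole at lateral position p projects the point to p + (p - X) d / Z on the sensor, so two pinholes spaced b apart produce images whose depth-dependent separation is b d / Z. The function and parameter names are hypothetical.

```python
def pinhole_image_shift(pinhole_spacing, mask_to_sensor, subject_distance):
    """Depth-dependent separation of two pinhole images of one subject point.

    All lengths use the same unit. The constant offset equal to the pinhole
    spacing itself is excluded, since it does not vary with depth.
    """
    return pinhole_spacing * mask_to_sensor / subject_distance

def depth_from_shift(pinhole_spacing, mask_to_sensor, shift):
    """Invert the relation above to recover the subject distance from the shift."""
    return pinhole_spacing * mask_to_sensor / shift
```

Under this model a distant subject yields a shift close to zero and a nearby subject yields a larger one, which matches the qualitative behaviour described in the text.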

例えば、図６及び図７にはそれぞれ、通常撮像画像の例と、マルチピンホールを使用したライトフィールドカメラによるライトフィールド画像（計算撮像画像）の例とが、示されている。 For example, FIG. 6 and FIG. 7 each show an example of a normal captured image and an example of a light field image (calculated captured image) by a light field camera using a multi-pinhole.

FIG. 6 is a schematic diagram showing an example of an image of subjects captured in the ordinary way (captured image), and FIG. 7 is a schematic diagram showing an example of an image of the subjects captured using a light field camera that includes a multi-pinhole mask (calculated captured image).

As shown in FIG. 6, the normal captured image shows, as subjects, a person A and automobiles B and C on a road. When these subjects are imaged with a light field camera having, for example, four pinholes, the images of the person A and the automobiles B and C are each acquired as a plurality of superimposed images, as shown in FIG. 7. Specifically, the image of the person A is acquired as persons A1, A2, and A3, the image of the automobile B is acquired as automobiles B1, B2, B3, and B4, and the image of the automobile C is acquired as automobiles C1, C2, C3, and C4. Although no reference signs are given in FIGS. 6 and 7, the image of the road on which the automobiles B and C travel in FIG. 6 is likewise acquired as a plurality of superimposed images, as shown in FIG. 7. In this way, the calculated captured image is an image containing parallax information, in which the imaging objects (for example, the person A and the automobiles B and C) and their surrounding environment (for example, the road) each appear as multiple superimposed copies.

As shown in FIG. 4, in step S2, the first image acquisition unit 121 acquires from the second memory 224 a first calculated captured image including an imaging object and the surrounding environment of the imaging object, and in step S3, the second image acquisition unit 122 acquires from the second memory 224 a captured image including the same imaging object and surrounding environment. Here, the first image acquisition unit 121 acquires a calculated captured image, which cannot be visually recognized in the same way as the real-space scene, whereas the second image acquisition unit 122 acquires a normal captured image, which can be visually recognized in the same way as the real-space scene.

As shown in FIG. 4, in step S4, the identification correct answer acquisition unit 123 acquires the identification result (identification correct answer) of the imaging object and its surrounding environment included in the captured image acquired by the second image acquisition unit 122. The identification correct answer includes, for example, information on the category to which the imaging object and its surrounding environment (subjects such as persons, automobiles, bicycles, or traffic signals) belong, and the planar position and region of the imaging object and its surrounding environment on the image. The identification correct answer may also include the position, in the depth direction, of the imaging object and its surrounding environment. The identification correct answer is either given from outside the identification system 1A together with the first calculated captured image, or given by a user for the captured image acquired by the second image acquisition unit 122. The identification correct answer acquisition unit 123 identifies each subject in the captured image on the basis of the subject's position and associates the identified subject with a category. As a result, the identification correct answer acquisition unit 123 acquires, in association with one another, the region of the subject, the category of the subject, and the position information of the subject in the captured image acquired by the second image acquisition unit 122, and uses this information as the identification correct answer.

識別正解取得部１２３は、被写体の撮像画像上での平面位置及び領域を決定する際、指標を用いる。例えば、識別正解取得部１２３は、当該指標として、被写体を囲む枠を用いる。以下、被写体を囲む枠を識別領域枠とも呼ぶ。識別領域枠は、被写体の位置及び領域を示すことができる。識別領域枠の一例が、図８Ａ及び図８Ｂに示されている。 The identification correct answer acquisition unit 123 uses an index when determining the plane position and area on the captured image of the subject. For example, the identification correct answer acquiring unit 123 uses a frame surrounding the subject as the index. Hereinafter, the frame surrounding the subject is also referred to as an identification area frame. The identification area frame can indicate the position and area of the subject. An example of the identification area frame is shown in FIGS. 8A and 8B.

図８Ａは、識別領域枠が重畳表示された撮像画像を示す模式的な図である。図８Ｂは、識別領域枠のみを示す模式的な図である。 FIG. 8A is a schematic diagram illustrating a captured image in which an identification area frame is superimposed and displayed. FIG. 8B is a schematic diagram showing only the identification area frame.

図８Ａ及び図８Ｂに示す例では、識別正解取得部１２３は、各被写体を外から囲み且つ各被写体に外接する矩形の識別領域枠を設定する。なお、識別領域枠の形状は、図８Ａ及び図８Ｂの例に限定されない。 In the example shown in FIGS. 8A and 8B, the identification correct answer acquiring unit 123 sets a rectangular identification area frame that surrounds each subject from the outside and circumscribes each subject. Note that the shape of the identification area frame is not limited to the example of FIGS. 8A and 8B.

In FIGS. 8A and 8B, the identification correct answer acquisition unit 123 sets, for example, an identification area frame FA for the person A, an identification area frame FB for the automobile B, and an identification area frame FC for the automobile C. As information indicating the shape and position of an identification area frame, the identification correct answer acquisition unit 123 may calculate the line shape and coordinates of the entire frame, the coordinates of each vertex of the frame, or the coordinates of one vertex, such as the upper-left vertex, together with the length of each side. The coordinates are, for example, coordinates relative to the origin obtained when the origins of the first calculated captured image and the captured image are aligned as described above. In this way, the identification correct answer acquisition unit 123 outputs, as the identification correct answer, information including the planar position (coordinates) and shape of the region of the identification area frame. The identification correct answer may include the captured image in addition to the planar position and shape of the region of the identification area frame. Although no identification area frame is set for the road here, an identification area frame may also be set for the surrounding environment, such as the road.
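Two of the frame representations mentioned above (one vertex plus side lengths, and the coordinates of every vertex) carry the same information for a rectangular frame. The following sketch, with a hypothetical `RegionFrame` class that is not part of the disclosure, shows one way to convert between them.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class RegionFrame:
    """Axis-aligned identification area frame: upper-left vertex plus side lengths."""
    x: float
    y: float
    width: float
    height: float

    def vertices(self):
        """Return the four vertices, starting at the upper-left and going clockwise."""
        return [
            (self.x, self.y),
            (self.x + self.width, self.y),
            (self.x + self.width, self.y + self.height),
            (self.x, self.y + self.height),
        ]

    @classmethod
    def from_vertices(cls, vertices):
        """Rebuild the frame from its vertex coordinates, given in any order."""
        xs = [v[0] for v in vertices]
        ys = [v[1] for v in vertices]
        return cls(min(xs), min(ys), max(xs) - min(xs), max(ys) - min(ys))
```

The coordinates are assumed to be image coordinates with the y axis pointing down, measured from whatever shared origin the calibration step establishes.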

また、識別正解取得部１２３は、識別正解として、識別領域枠の情報を取得するのではなく、画素毎に識別正解を取得してもよい。画素毎の識別正解は、図９においてドットハッチングで示すように例えば画像上にマスクとして与えられてもよい。 Further, the identification correct answer acquisition unit 123 may acquire the identification correct answer for each pixel instead of acquiring the identification area frame information as the identification correct answer. The identification correct answer for each pixel may be given as a mask on the image, for example, as shown by dot hatching in FIG.

図９は、画像上でマスクとして与えられた識別正解の例を示す模式図である。 FIG. 9 is a schematic diagram illustrating an example of an identification correct answer given as a mask on an image.

図９の例では、識別正解として、人物ＡにはマスクＡａが与えられ、自動車Ｂ及びＣにはそれぞれマスクＢａ及びＣａが与えられている。このようにすることで、識別正解取得部１２３は、画素毎に識別正解を出力する。なお、ここでは、識別正解として、道路にはマスクが与えられていないが、道路等の周辺環境に対してもマスクが与えられてもよい。 In the example of FIG. 9, the mask Aa is given to the person A and the masks Ba and Ca are given to the cars B and C, respectively, as correct identification answers. By doing in this way, the identification correct answer acquisition part 123 outputs an identification correct answer for every pixel. Here, as an identification correct answer, no mask is given to the road, but a mask may be given to the surrounding environment such as the road.
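Per-pixel correct answers and frame-style correct answers are related: the circumscribed rectangle of a mask gives a frame, and several masks can be merged into one label image. The sketch below assumes masks are nested lists of 0/1 values indexed as `mask[row][column]`; the function names and the integer category codes are hypothetical.

```python
def circumscribed_frame(mask):
    """Return (x, y, width, height) of the tightest rectangle around nonzero pixels."""
    rows = [r for r, row in enumerate(mask) if any(row)]
    cols = [c for row in mask for c, value in enumerate(row) if value]
    if not rows:
        return None  # The mask is empty, so there is no subject to frame.
    x, y = min(cols), min(rows)
    return (x, y, max(cols) - x + 1, max(rows) - y + 1)

def label_map(masks, height, width, background=0):
    """Merge per-subject masks {category: mask} into one per-pixel label image.

    Where masks overlap, the category that appears later in the dict wins.
    """
    labels = [[background] * width for _ in range(height)]
    for category, mask in masks.items():
        for r in range(height):
            for c in range(width):
                if mask[r][c]:
                    labels[r][c] = category
    return labels
```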

As shown in FIG. 4, in step S5, the learning unit 124 refers to the correspondence between the plurality of first pixels and the plurality of second pixels acquired in step S1 and, on the basis of the identification result of the captured image, generates an identification model (classifier) for identifying the first calculated captured image. For example, by referring to the correspondence between the second pixels of the captured image shown in FIG. 6 and the first pixels of the first calculated captured image shown in FIG. 7, it is possible to determine which position (pixel) in the first calculated captured image corresponds to each position (pixel) in the captured image. Machine learning is then performed, and the classifier is generated, so that, for example, the identification correct answer for the persons A1, A2, and A3 included in the first calculated captured image shown in FIG. 7 is the position of the identification area frame FA shown in FIG. 8B, or the position of the mask Aa shown in FIG. 9, which are identification results for the captured image shown in FIG. 6, with the category being person. Likewise, machine learning is performed so that the identification correct answer for the automobiles B1, B2, B3, and B4 is the position of the identification area frame FB or of the mask Ba with the category being automobile, and so that the identification correct answer for the automobiles C1, C2, C3, and C4 is the position of the identification area frame FC or of the mask Ca with the category being automobile. At this time, machine learning may also be performed on the positions of the imaging object and its surrounding environment in the depth direction. As described in detail later, using a multi-view stereo camera or the like as the camera that captures the normal captured image makes it easy to acquire the position in the depth direction, and machine learning can be performed on the basis of the acquired depth position.
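One way to picture step S5 is as the construction of training examples: the input is a calculated captured image, and the target is the annotation made on the ordinary captured image, expressed in coordinates shared by both images. The sketch below is an assumption-laden illustration: `build_training_example` and the dictionary layout are hypothetical, and the pixel mapping is assumed to be axis-aligned with positive scale so that a rectangle stays a rectangle.

```python
def build_training_example(calculated_image, annotations, to_calculated):
    """Pair one calculated captured image with answers annotated on the captured image.

    annotations: list of (category, (x, y, width, height)) in captured-image pixels.
    to_calculated: function mapping a captured-image pixel (x, y) to the
        corresponding position in the calculated captured image.
    """
    targets = []
    for category, (x, y, w, h) in annotations:
        # Map the upper-left and lower-right vertices, then rebuild the frame.
        x0, y0 = to_calculated((x, y))
        x1, y1 = to_calculated((x + w, y + h))
        targets.append({"category": category, "frame": (x0, y0, x1 - x0, y1 - y0)})
    return {"input": calculated_image, "targets": targets}
```

A collection of such examples, one per image pair, is what a training loop would iterate over when adjusting the classifier.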

A large number of pairs (for example, several thousand pairs) of a captured image such as that shown in FIG. 6 and a first calculated captured image such as that shown in FIG. 7 are prepared. The learning unit 124 acquires the classifier stored in the second memory 224, inputs these first calculated captured images to the classifier, acquires the output results, and adjusts the classifier so that each output result matches the identification correct answer that was entered using the captured image corresponding to that first calculated captured image. The learning unit 124 then updates the classifier in the second memory 224 by storing the adjusted classifier there.

In step S6, the learning unit 124 outputs the identification model (classifier) to the image identification device 10, which identifies a second calculated captured image. Using the classifier generated by the learning device 12, the image identification device 10 can thereby identify the imaging object and its surrounding environment included in the second calculated captured image, which a person cannot visually recognize in the same way as the real-space scene. This will be described with reference to FIGS. 1 and 10.

図１０は、実施の形態に係る画像識別装置１０の動作の流れの一例を示すフローチャートである。なお、以下の説明において、撮像部１１がライトフィールドカメラであるとして説明する。 FIG. 10 is a flowchart illustrating an example of an operation flow of the image identification device 10 according to the embodiment. In the following description, it is assumed that the imaging unit 11 is a light field camera.

ステップＳ１０１において、取得部１０１は、第一メモリ２０３（図３参照）から、撮像部１１によって撮像された撮像対象物及び撮像対象物の周辺環境を含む第２の計算撮像画像を取得する。具体的には、第一入力回路２０１が第一メモリ２０３から第２の計算撮像画像を取得することで、取得部１０１は、第２の計算撮像画像を取得する。例えば、撮像部１１は、所定の周期である第１の周期毎に、第２の計算撮像画像として、ライトフィールド画像を撮像（取得）し、当該画像が第一メモリ２０３に記憶される。取得部１０１は、撮像部１１が撮像したライトフィールド画像を取得し、識別部１０２に出力する。なお、取得部１０１は、識別システム１の外部からライトフィールド画像を取得してもよい（具体的には、外部からのライトフィールド画像は第一メモリ２０３に記憶され、取得部１０１は、第一メモリ２０３からライトフィールド画像を取得してもよい）。 In step S 101, the acquisition unit 101 acquires a second calculated captured image including the imaging object captured by the imaging unit 11 and the surrounding environment of the imaging object from the first memory 203 (see FIG. 3). Specifically, when the first input circuit 201 acquires the second calculated captured image from the first memory 203, the acquisition unit 101 acquires the second calculated captured image. For example, the imaging unit 11 captures (acquires) a light field image as the second calculated captured image for each first period that is a predetermined period, and the image is stored in the first memory 203. The acquisition unit 101 acquires the light field image captured by the imaging unit 11 and outputs it to the identification unit 102. The acquisition unit 101 may acquire a light field image from the outside of the identification system 1 (specifically, the light field image from the outside is stored in the first memory 203, and the acquisition unit 101 A light field image may be acquired from the memory 203).

次いで、ステップＳ１０２において、識別部１０２は、第一メモリ２０３に記憶された識別器を用いて、第２の計算撮像画像中の撮像対象物を識別する。つまり、識別部１０２は、ライトフィールド画像において識別対象とされる物体を検出する。識別対象の物体は、予め、識別器に設定されてよい。例えば、識別システム１が自動車に搭載される場合、識別対象の物体の例は、人物、自動車、自転車及び信号等である。識別部１０２は、識別器にライトフィールド画像に入力することによって、識別器から、出力結果として、識別対象の物体の識別結果を取得する。識別部１０２による識別処理の詳細については後述する。なお、識別部１０２は、識別処理済みのライトフィールド画像を、第一メモリ２０３に格納してもよい。 Next, in step S 102, the identification unit 102 identifies the imaging target in the second calculated captured image using the classifier stored in the first memory 203. That is, the identification unit 102 detects an object to be identified in the light field image. The object to be identified may be set in the classifier in advance. For example, when the identification system 1 is mounted on a car, examples of objects to be identified are a person, a car, a bicycle, a signal, and the like. The identification unit 102 inputs the light field image to the classifier, and acquires the identification result of the object to be identified as an output result from the classifier. Details of the identification processing by the identification unit 102 will be described later. Note that the identification unit 102 may store the light field image subjected to the identification process in the first memory 203.

次いで、ステップＳ１０３において、出力部１０３は、識別部１０２によって識別処理された結果（識別結果）を出力する。例えば、出力部１０３は、ライトフィールド画像を含む画像情報を出力してもよいし、ライトフィールド画像を含まない画像情報を出力してもよい。少なくともこの画像情報は、識別部１０２が検出した物体の情報を含んでもよい。物体の情報は、物体の位置（平面における位置又は奥行方向における位置）、領域等を含む。出力部１０３は、識別システム１が備えるディスプレイ及び外部機器の少なくとも一方に、画像情報を出力してもよい。 Next, in step S 103, the output unit 103 outputs a result (identification result) subjected to identification processing by the identification unit 102. For example, the output unit 103 may output image information including a light field image, or may output image information not including a light field image. At least the image information may include information on the object detected by the identification unit 102. The object information includes the position of the object (position on the plane or position in the depth direction), region, and the like. The output unit 103 may output image information to at least one of a display and an external device provided in the identification system 1.

さらに、図１０におけるステップＳ１０２の識別処理を説明する。ライトフィールドカメラである撮像部１１が撮像したライトフィールド画像から、画像情報と奥行情報とを同時に取得することが可能である。識別部１０２は、ライトフィールド画像に対して、学習装置１２で学習した識別器を使用して識別処理を行う。この学習は、上述したように、ディープラーニングなどのニューラルネットワークを用いた機械学習によって実現する。 Further, the identification process in step S102 in FIG. 10 will be described. Image information and depth information can be simultaneously acquired from a light field image captured by the imaging unit 11 that is a light field camera. The identification unit 102 performs identification processing on the light field image using the classifier learned by the learning device 12. As described above, this learning is realized by machine learning using a neural network such as deep learning.

識別部１０２は、テクスチャ情報の識別と奥行情報の識別とを行い、識別されたテクスチャ情報及び奥行情報を用いて、画像に含まれる物体を統合的に識別する構成であってもよい。このような構成を図１１に示す。 The identification unit 102 may be configured to identify texture information and depth information, and collectively identify objects included in the image using the identified texture information and depth information. Such a configuration is shown in FIG.

図１１は、識別部１０２の機能的な構成の一例を示す模式図である。 FIG. 11 is a schematic diagram illustrating an example of a functional configuration of the identification unit 102.

このような識別部１０２は、図１１に示すように、テクスチャ情報識別部１０２１と、奥行情報識別部１０２２と、統合識別部１０２３とを含む。テクスチャ情報識別部１０２１及び奥行情報識別部１０２２は、例えば、統合識別部１０２３に対して、並列に接続されている。 As shown in FIG. 11, such an identification unit 102 includes a texture information identification unit 1021, a depth information identification unit 1022, and an integrated identification unit 1023. The texture information identification unit 1021 and the depth information identification unit 1022 are connected in parallel to the integrated identification unit 1023, for example.

The texture information identification unit 1021 detects subjects in the light field image using texture information. Specifically, by using as a classifier a neural network such as that described in Non-Patent Document 1, the texture information identification unit 1021 identifies the region of each subject (its position in the plane) and the category of each subject in the light field image. The input to the texture information identification unit 1021 is the light field image, and its identification result is, as in the learning device 12, the region and category of each subject on the light field image. In a normal captured image, the values for the directions of the incident light rays, that is, the depth values, are integrated into the pixel values, so depth information is lost. Compared with such a normal captured image, a light field image contains much more information about the subject in the image itself. Therefore, using a light field image obtained with a multi-pinhole or the like as the input to the classifier enables identification at least as accurate as when a normal captured image is used as the input.

奥行情報識別部１０２２は、ライトフィールド画像から被写体の奥行情報を検出する。具体的には、奥行情報識別部１０２２は、学習装置１２において、ライトフィールド画像と対応する被写体の奥行情報を事前に学習する。被写体の奥行情報は、後述するように、第二画像取得部１２２からマルチビューステレオ画像を取得することで計算してもかまわないし、識別正解取得部１２３から取得してもかまわない。 The depth information identification unit 1022 detects the depth information of the subject from the light field image. Specifically, the depth information identification unit 1022 learns in advance, in the learning device 12, the depth information of the subject corresponding to the light field image. As will be described later, the depth information of the subject may be calculated from multi-view stereo images acquired from the second image acquisition unit 122, or may be acquired from the identification correct answer acquisition unit 123.

統合識別部１０２３は、テクスチャ情報識別部１０２１の識別結果と、奥行情報識別部１０２２の識別結果とを統合し、最終的な識別結果を出力する。統合識別部１０２３が用いる識別器は、テクスチャ情報識別部１０２１のテクスチャ情報又はその識別結果と、奥行情報識別部１０２２の識別結果である奥行情報とを入力とし、最終的な識別結果を出力するものである。最終的な識別結果は、ライトフィールド画像に含まれる物体の領域、当該領域の画像上での平面位置、及び当該領域の奥行位置等を含む。 The integrated identification unit 1023 integrates the identification result of the texture information identification unit 1021 and the identification result of the depth information identification unit 1022, and outputs a final identification result. The classifier used by the integrated identification unit 1023 takes as input the texture information of the texture information identification unit 1021 or its identification result, together with the depth information that is the identification result of the depth information identification unit 1022, and outputs the final identification result. The final identification result includes the region of each object included in the light field image, the planar position of that region on the image, the depth position of that region, and the like.
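
The parallel arrangement of FIG. 11 can be illustrated with a minimal Python sketch. It is only an illustration under stated assumptions, not the implementation of the disclosure: the names `Detection`, `FinalResult`, `integrate`, and `identify_parallel` are hypothetical, the two units are passed in as plain callables, and averaging the depth map over each detected region is one assumed way of integrating the two results.

```python
from dataclasses import dataclass
from typing import Callable, List, Sequence, Tuple

# Hypothetical representation: one depth value per pixel, indexed [row][column].
DepthMap = Sequence[Sequence[float]]


@dataclass
class Detection:
    """Output assumed for the texture information identification unit 1021."""
    category: str
    box: Tuple[int, int, int, int]  # (x, y, width, height) on the image plane


@dataclass
class FinalResult:
    """Output assumed for the integrated identification unit 1023."""
    category: str
    box: Tuple[int, int, int, int]
    depth: float


def integrate(detections: List[Detection], depth_map: DepthMap) -> List[FinalResult]:
    """Attach a depth position to every detected region (assumed integration rule)."""
    results = []
    for det in detections:
        x, y, w, h = det.box
        # Average the depth values that fall inside the detected region.
        values = [depth_map[r][c] for r in range(y, y + h) for c in range(x, x + w)]
        results.append(FinalResult(det.category, det.box, sum(values) / len(values)))
    return results


def identify_parallel(
    image: object,
    texture_unit: Callable[[object], List[Detection]],
    depth_unit: Callable[[object], DepthMap],
) -> List[FinalResult]:
    """Both units read the same light field image; neither depends on the other."""
    detections = texture_unit(image)
    depth_map = depth_unit(image)
    return integrate(detections, depth_map)
```

In practice the two callables would wrap trained neural networks; here they can be any functions with the assumed signatures, which keeps the data flow between units 1021, 1022, and 1023 visible.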

なお、テクスチャ情報識別部１０２１用のニューラルネットワークと、奥行情報識別部１０２２用のニューラルネットワークとがそれぞれ生成されてもよい。つまり、平面における位置及びカテゴリについては、平面における位置及びカテゴリを識別するためのニューラルネットワークが用いられ、奥行方向における位置については平面における位置及びカテゴリを識別するためのニューラルネットワークとは別途生成された、奥行方向における位置を識別するためのニューラルネットワークが用いられてもよい。また、テクスチャ情報識別部１０２１用のニューラルネットワークと、奥行情報識別部１０２２用のニューラルネットワークとがまとめて生成されてもよい。つまり、平面における位置、奥行方向における位置及びカテゴリについて、平面における位置、奥行方向における位置及びカテゴリをまとめて識別するための１つのニューラルネットワークが用いられてもよい。 Note that a neural network for the texture information identification unit 1021 and a neural network for the depth information identification unit 1022 may be generated. That is, for the position and category in the plane, a neural network for identifying the position and category in the plane is used, and for the position in the depth direction, it is generated separately from the neural network for identifying the position and category in the plane. A neural network for identifying the position in the depth direction may be used. Further, the neural network for the texture information identification unit 1021 and the neural network for the depth information identification unit 1022 may be generated together. That is, for the position in the plane, the position in the depth direction, and the category, one neural network for collectively identifying the position in the plane, the position in the depth direction, and the category may be used.

また、上記説明では、撮像部１１は、マルチピンホール又はマイクロレンズを用いるライトフィールドカメラであったが、これに限らない。例えば、撮像部１１は、符号化開口画像を撮像する構成であってもよい。これは、一種のマルチピンホールカメラでもある。 In the above description, the imaging unit 11 is a light field camera using a multi-pinhole or a microlens, but is not limited thereto. For example, the imaging unit 11 may be configured to capture a coded aperture image. This is also a kind of multi-pinhole camera.

図１２は、ランダムマスクを符号化絞りとして使用する符号化開口マスクの例の模式図である。 FIG. 12 is a schematic diagram of an example of a coded aperture mask that uses a random mask as a coded stop.

図１２に示すように、符号化開口マスク３１１は、色無し領域で示される光の透過領域と、黒塗り領域で示される光の遮光領域とを有し、光の透過領域及び遮光領域はランダムに配置されていることがわかる。このような符号化開口マスク３１１は、ガラスにクロムを蒸着することで作製される。このような符号化開口マスク３１１が、主レンズとイメージセンサとの間の光路上に配置されると、光線の一部が遮断される。これにより、符号化開口画像を撮像するカメラの実現が可能である。 As shown in FIG. 12, the coded aperture mask 311 has light-transmitting regions, shown as uncolored regions, and light-shielding regions, shown as black regions, and it can be seen that the light-transmitting and light-shielding regions are arranged randomly. Such a coded aperture mask 311 is produced by vapor-depositing chromium on glass. When such a coded aperture mask 311 is placed on the optical path between the main lens and the image sensor, part of the light is blocked. This makes it possible to realize a camera that captures coded aperture images.
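
A random mask pattern of the kind shown schematically in FIG. 12 can be sketched in a few lines of Python. This is an illustrative assumption only: the disclosure does not specify the grid size, the fraction of transmitting cells, or how the pattern is generated, so `make_random_mask`, `open_ratio`, and `seed` are hypothetical names and parameters.

```python
import random
from typing import List


def make_random_mask(rows: int, cols: int, open_ratio: float = 0.5, seed: int = 0) -> List[List[int]]:
    """Return a rows x cols grid where 1 is a transmitting cell and 0 is a shielding cell.

    Each cell is drawn independently; open_ratio is the assumed probability
    that a cell transmits light.
    """
    rng = random.Random(seed)  # fixed seed so the same pattern can be regenerated
    return [[1 if rng.random() < open_ratio else 0 for _ in range(cols)] for _ in range(rows)]


def transmitted_fraction(mask: List[List[int]]) -> float:
    """Fraction of cells that let light through."""
    cells = [value for row in mask for value in row]
    return sum(cells) / len(cells)
```

Fixing the seed matters in this sketch because learning and identification would need to refer to the same mask pattern.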

また、第二画像取得部１２２は通常画像ではなく、画像情報に加え、奥行情報も取得できる画像を取得するようにしてもかまわない。例えば、第二画像取得部１２２はマルチビューステレオカメラで構成されてもよい。第二画像取得部１２２は、マルチビューステレオ画像を取得することにより、被写体の３次元情報も取得することができる。そのため、第一画像取得部１２１と第二画像取得部１２２の取得する画像を事前にキャリブレーションすることで、第一画像取得部１２１が取得した画像と第二画像取得部１２２が取得した画像の対応関係を取得することができる。このキャリブレーションでは、第二画像取得部１２２で取得する３次元座標と第一画像取得部１２１が取得する画像座標との対応が求められる。これにより、第二画像取得部１２２が取得した撮像画像に対する識別正解を、第一画像取得部１２１が取得した第１の計算撮像画像に対する識別正解に変換させることができる。このように、撮像画像は、マルチビューステレオカメラによる撮像対象物及び撮像対象物の周辺環境の撮像により得られる画像であってもよい。 Further, the second image acquisition unit 122 may acquire, instead of a normal image, an image from which depth information can be obtained in addition to image information. For example, the second image acquisition unit 122 may be a multi-view stereo camera. By acquiring multi-view stereo images, the second image acquisition unit 122 can also acquire three-dimensional information about the subject. Therefore, by calibrating the images acquired by the first image acquisition unit 121 and the second image acquisition unit 122 in advance, the correspondence between the image acquired by the first image acquisition unit 121 and the image acquired by the second image acquisition unit 122 can be obtained. This calibration determines the correspondence between the three-dimensional coordinates acquired by the second image acquisition unit 122 and the image coordinates acquired by the first image acquisition unit 121. As a result, the identification correct answer for the captured image acquired by the second image acquisition unit 122 can be converted into an identification correct answer for the first calculated captured image acquired by the first image acquisition unit 121. In this way, the captured image may be an image obtained by imaging the imaging object and the surrounding environment of the imaging object with a multi-view stereo camera.
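
The conversion of an identification correct answer from three-dimensional coordinates to image coordinates can be sketched as follows. The sketch assumes, purely for illustration, that the calibration yields a rigid transform (`rotation`, `translation`) and single-pinhole intrinsics (`fx`, `fy`, `cx`, `cy`) for the first camera; the disclosure does not state the calibration model, a multi-pinhole camera would need one such projection per pinhole, and the function names are hypothetical.

```python
from typing import List, Sequence, Tuple

Point3D = Tuple[float, float, float]
Matrix3 = Sequence[Sequence[float]]


def project_point(
    point: Point3D, rotation: Matrix3, translation: Sequence[float],
    fx: float, fy: float, cx: float, cy: float,
) -> Tuple[float, float]:
    """Map a 3D point from the stereo camera frame to pixel coordinates of the first camera."""
    # Rigid transform into the first camera's coordinate frame.
    cam = [
        sum(rotation[i][j] * point[j] for j in range(3)) + translation[i]
        for i in range(3)
    ]
    if cam[2] <= 0:
        raise ValueError("point is not in front of the camera")
    # Perspective projection with the assumed pinhole intrinsics.
    return (fx * cam[0] / cam[2] + cx, fy * cam[1] / cam[2] + cy)


def transfer_box(
    corners: List[Point3D], rotation: Matrix3, translation: Sequence[float],
    fx: float, fy: float, cx: float, cy: float,
) -> Tuple[float, float, float, float]:
    """Project the 3D corners of a labeled region and return (u_min, v_min, u_max, v_max)."""
    pixels = [project_point(p, rotation, translation, fx, fy, cx, cy) for p in corners]
    us = [u for u, _ in pixels]
    vs = [v for _, v in pixels]
    return (min(us), min(vs), max(us), max(vs))
```

The category label is carried over unchanged; only the region coordinates are re-expressed for the first calculated captured image.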

以上の説明では、識別正解として、例えば、人物、自動車、自転車又は信号等の被写体が属するカテゴリ情報と、画像上での被写体の平面的な位置及び領域と、画像上での被写体の奥行方向における位置を与えていた。例えば、識別システム１が、識別正解として奥行方向における位置（奥行情報）を識別することは、第二画像取得部１２２が取得したマルチビューステレオから求めた奥行方向における位置（奥行情報）を識別正解として与えるようにすることで実現できる。 In the above description, the identification correct answer consisted of, for example, category information to which a subject such as a person, an automobile, a bicycle, or a traffic signal belongs, the planar position and region of the subject on the image, and the position of the subject in the depth direction on the image. For example, the identification system 1 can be made to identify the position in the depth direction (depth information) by giving, as the identification correct answer, the position in the depth direction (depth information) obtained from the multi-view stereo images acquired by the second image acquisition unit 122.

また、識別部１０２は、テクスチャ情報識別部１０２１と奥行情報識別部１０２２とが並列関係である構成を有するのではなく、奥行情報識別部１０２２による奥行情報の抽出後に、テクスチャ情報識別部１０２１による識別を行うように構成されてもよい。 Further, instead of a configuration in which the texture information identification unit 1021 and the depth information identification unit 1022 are in a parallel relationship, the identification unit 102 may be configured so that the texture information identification unit 1021 performs identification after the depth information identification unit 1022 has extracted the depth information.

図１３は、識別部１０２の機能的な構成の別の一例を示す模式図である。 FIG. 13 is a schematic diagram illustrating another example of the functional configuration of the identification unit 102.

図１３に示すように、識別部１０２では、奥行情報識別部１０２２、テクスチャ情報識別部１０２１及び統合識別部１０２３が直列関係にあってもよい。奥行情報識別部１０２２は、ライトフィールド画像に対して奥行画像を生成する。テクスチャ情報識別部１０２１は、奥行情報識別部１０２２が生成した奥行画像を入力情報として、例えば、非特許文献１に記載されるようなニューラルネットワークを用いることによって、被写体の位置、領域及び被写体のカテゴリを識別する。統合識別部１０２３は、テクスチャ情報識別部１０２１の識別結果を出力する。最終的な識別結果は、テクスチャ情報識別部１０２１及び統合識別部１０２３が並列関係にある場合と同様に、ライトフィールド画像に含まれる物体の領域、当該領域の画像上での平面位置、及び当該領域の奥行位置等を含む。 As shown in FIG. 13, in the identification unit 102, the depth information identification unit 1022, the texture information identification unit 1021, and the integrated identification unit 1023 may be in a serial relationship. The depth information identification unit 1022 generates a depth image from the light field image. The texture information identification unit 1021 takes the depth image generated by the depth information identification unit 1022 as input and identifies the position and region of the subject and the category of the subject by using, for example, a neural network such as that described in Non-Patent Document 1. The integrated identification unit 1023 outputs the identification result of the texture information identification unit 1021. As in the case where the texture information identification unit 1021 and the integrated identification unit 1023 are in a parallel relationship, the final identification result includes the region of each object included in the light field image, the planar position of that region on the image, the depth position of that region, and the like.

また、識別部１０２は、撮像部１１に応じて、そのニューラルネットワークの構成を変えるようにしてもよい。撮像部１１がライトフィールドカメラである場合、奥行画像は、撮像部１１のマルチピンホールの位置及び大きさ等を用いて生成される。例えば、撮像部１１の種類又は製造ばらつき等によって、マルチピンホールの位置及び大きさが撮像部１１毎に異なる場合、撮像部１１毎にニューラルネットワークを構成することにより（言い換えると撮像部１１毎に個別に機械学習がなされることにより）、識別部１０２の識別精度を向上させることができる。なお、マルチピンホールの位置及び大きさの情報は、事前にカメラキャリブレーションを実施することで取得可能である。 Further, the identification unit 102 may change the configuration of its neural network according to the imaging unit 11. When the imaging unit 11 is a light field camera, the depth image is generated using the positions, sizes, and the like of the multi-pinholes of the imaging unit 11. For example, when the positions and sizes of the multi-pinholes differ from one imaging unit 11 to another because of the type of the imaging unit 11, manufacturing variation, or the like, configuring a neural network for each imaging unit 11 (in other words, performing machine learning individually for each imaging unit 11) can improve the identification accuracy of the identification unit 102. Information on the positions and sizes of the multi-pinholes can be acquired by performing camera calibration in advance.

以上のように、識別部１０２は、ライトフィールド画像を入力情報とし、当該ライトフィールド画像のテクスチャ情報及び奥行情報から識別処理を行う。それにより、識別部１０２は、従来の通常撮像画像を使用したテクスチャ画像のみに基づく識別処理と比べ、例えばどれだけ離れた位置にあるのかも識別できるため、より高精度の識別処理を可能にする。 As described above, the identification unit 102 takes a light field image as input and performs identification processing from the texture information and depth information of that light field image. As a result, compared with conventional identification processing based only on a texture image obtained from a normal captured image, the identification unit 102 can also identify, for example, how far away the subject is, which enables more accurate identification processing.

上述したように、識別部１０２を含む画像識別装置１０を備える実施の形態に係る識別システム１と、当該画像識別装置１０と学習装置１２とを備える実施の形態の変形例に係る識別システム１Ａとを例示した。しかしながら、例えば、識別部１０２は、学習装置１２を包含してもよく、この場合、識別システム１が学習装置１２を備えることになる。つまり、この場合、識別システム１は、識別システム１Ａと同等の機能を有する。 The above has illustrated the identification system 1 according to the embodiment, which includes the image identification device 10 including the identification unit 102, and the identification system 1A according to the modification of the embodiment, which includes the image identification device 10 and the learning device 12. However, for example, the identification unit 102 may include the learning device 12, in which case the identification system 1 includes the learning device 12. That is, in this case, the identification system 1 has functions equivalent to those of the identification system 1A.

以上のように、実施の形態及び変形例に係る識別システム１及び１Ａにおいて、画像識別装置１０は、ライトフィールド画像等の第２の計算撮像画像を用いて、当該画像内の被写体の識別を行う。さらに、画像識別装置１０は、一連の識別処理の過程において、第２の計算撮像画像を通常撮像画像に画像復元せず、第２の計算撮像画像に含まれるテクスチャ情報と、計算撮像画像に含まれる奥行情報とに基づき、第２の計算撮像画像内の被写体の識別を行う。よって、画像識別装置１０は、被写体の識別処理量を低減することができる。特に、識別処理の際に第２の計算撮像画像から通常撮像画像への画像復元を伴う手法と比較して、画像識別装置１０は、識別処理の大幅な高速化を可能にする。また、３次元レンジファインダ等を用いなくても奥行情報を取得できるため、低コスト化が可能となる。 As described above, in the identification systems 1 and 1A according to the embodiment and the modification, the image identification device 10 identifies the subject in a second calculated captured image, such as a light field image, using that image. Further, in the course of the series of identification processes, the image identification device 10 does not restore the second calculated captured image to a normal captured image; it identifies the subject in the second calculated captured image based on the texture information included in the second calculated captured image and the depth information included in the calculated captured image. The image identification device 10 can therefore reduce the amount of processing needed to identify the subject. In particular, compared with a technique that restores a normal captured image from the second calculated captured image during identification, the image identification device 10 makes the identification processing significantly faster. In addition, since depth information can be acquired without using a three-dimensional range finder or the like, the cost can be reduced.

また、第１の計算撮像画像の撮像に用いられるカメラ（例えば第一画像取得部１２１）の光軸と、撮像画像の撮像に用いられるカメラ（例えば第二画像取得部１２２）の光軸とは、略一致するようにしてもかまわない。図１４Ａはこれを説明するための模式図である。 Further, the optical axis of the camera used to capture the first calculated captured image (for example, the first image acquisition unit 121) and the optical axis of the camera used to capture the captured image (for example, the second image acquisition unit 122) may be made to substantially coincide. FIG. 14A is a schematic diagram for explaining this.

図１４Ａは、第二画像取得部１２２の光軸と第一画像取得部１２１の光軸とがおおよそ一致することを示す模式図である。 FIG. 14A is a schematic diagram showing that the optical axis of the second image acquisition unit 122 and the optical axis of the first image acquisition unit 121 are approximately the same.

この図において、第一画像取得部１２１及び第二画像取得部１２２として、それぞれ、そのハードウェアの例であるカメラを模式的に示している。また、光軸２３１は第一画像取得部１２１の光軸を示し、光軸２３２は第二画像取得部１２２の光軸を示している。各光軸をおおよそ一致させるためには、第一画像取得部１２１と第二画像取得部１２２とを接近させ、かつ、各光軸がほぼ平行になるように配置すればよい。 In this figure, as the first image acquisition unit 121 and the second image acquisition unit 122, cameras that are examples of the hardware are schematically shown. An optical axis 231 indicates the optical axis of the first image acquisition unit 121, and an optical axis 232 indicates the optical axis of the second image acquisition unit 122. In order to make the optical axes approximately coincide with each other, the first image acquisition unit 121 and the second image acquisition unit 122 may be brought close to each other and arranged so that the optical axes are substantially parallel to each other.

また、第二画像取得部１２２をステレオカメラとして構成する場合、第二画像取得部１２２を構成する２つのカメラのそれぞれの光軸と第一画像取得部１２１の光軸とがおおよそ一致するようにすればよい。図１４Ｂはこれを説明するための模式図である。 Further, when the second image acquisition unit 122 is configured as a stereo camera, the optical axis of each of the two cameras constituting the second image acquisition unit 122 may be made to approximately coincide with the optical axis of the first image acquisition unit 121. FIG. 14B is a schematic diagram for explaining this.

図１４Ｂは、第二画像取得部１２２を構成するステレオカメラの各光軸と第一画像取得部１２１の光軸とがおおよそ一致することを示す模式図である。 FIG. 14B is a schematic diagram showing that the optical axes of the stereo camera constituting the second image acquisition unit 122 and the optical axes of the first image acquisition unit 121 are approximately the same.

この図において、図１４Ａと同じ構成要素には同じ符号を付与し説明を省略する。この図において、光軸２３２ａ及び２３２ｂは第二画像取得部１２２を構成するステレオカメラの各光軸を示している。前述のように、本実施の形態の識別システム１又は１Ａは、第二画像取得部１２２が取得した撮像画像に対する識別正解を、第一画像取得部１２１が取得した第１の計算撮像画像に対する識別正解に変換させるが、各光軸をおおよそ一致させることで、変換に伴う誤差を小さくすることができ、より高精度の識別が実現できる。 In this figure, the same components as those in FIG. 14A are given the same reference signs, and their description is omitted. In this figure, optical axes 232a and 232b indicate the respective optical axes of the stereo camera constituting the second image acquisition unit 122. As described above, the identification system 1 or 1A according to the present embodiment converts the identification correct answer for the captured image acquired by the second image acquisition unit 122 into an identification correct answer for the first calculated captured image acquired by the first image acquisition unit 121. Making the optical axes approximately coincide reduces the error introduced by this conversion, so that more accurate identification can be realized.

また、第一画像取得部１２１と第二画像取得部１２２の光軸を一致させるために、ビームスプリッタ、プリズム又はハーフミラーなどを利用してもかまわない。 Further, in order to make the optical axes of the first image acquisition unit 121 and the second image acquisition unit 122 coincide with each other, a beam splitter, a prism, a half mirror, or the like may be used.

図１５は、第一画像取得部１２１の光軸と第二画像取得部１２２の光軸とを一致させるために、ビームスプリッタが利用されることを示す模式図である。 FIG. 15 is a schematic diagram showing that a beam splitter is used to match the optical axis of the first image acquisition unit 121 and the optical axis of the second image acquisition unit 122.

この図において、図１４Ａと同じ構成要素には同じ番号を付与し説明を省略する。ビームスプリッタ２４０により、被写体からの光線を二つに分離することができるため、分離した光線の一方を第一画像取得部１２１の光軸２３１と一致させ、もう一方を第二画像取得部１２２の光軸２３２と一致させることで、第一画像取得部１２１の光軸と第二画像取得部１２２の光軸とを一致させることが可能である。このように、第１の計算撮像画像の撮像に用いられるカメラ（例えば第一画像取得部１２１）の光軸と、撮像画像の撮像に用いられるカメラ（例えば第二画像取得部１２２）の光軸とは、ビームスプリッタ、プリズム又はハーフミラーを介することで一致する。前述のように、本実施の形態の識別システム１又は１Ａは、第二画像取得部１２２が取得した撮像画像に対する識別正解を、第一画像取得部１２１が取得した第１の計算撮像画像に対する識別正解に変換させるが、各光軸を一致させることで、変換に伴う誤差を小さくすることができ、より高精度の識別が実現できる。 In this figure, the same components as those in FIG. 14A are given the same numbers, and their description is omitted. Since the beam splitter 240 can split the light from the subject into two beams, the optical axis of the first image acquisition unit 121 and the optical axis of the second image acquisition unit 122 can be made to coincide by aligning one of the split beams with the optical axis 231 of the first image acquisition unit 121 and the other with the optical axis 232 of the second image acquisition unit 122. In this way, the optical axis of the camera used to capture the first calculated captured image (for example, the first image acquisition unit 121) and the optical axis of the camera used to capture the captured image (for example, the second image acquisition unit 122) coincide by way of a beam splitter, a prism, or a half mirror. As described above, the identification system 1 or 1A according to the present embodiment converts the identification correct answer for the captured image acquired by the second image acquisition unit 122 into an identification correct answer for the first calculated captured image acquired by the first image acquisition unit 121. Making the optical axes coincide reduces the error introduced by this conversion, so that more accurate identification can be realized.

以上、本開示の学習装置１２について、実施の形態に基づいて説明したが、本開示は、上記実施の形態に限定されるものではない。本開示の趣旨を逸脱しない限り、当業者が思いつく各種変形を本実施の形態に施したもの、及び、異なる実施の形態における構成要素を組み合わせて構築される形態も、本開示の範囲内に含まれる。 The learning device 12 of the present disclosure has been described above based on the embodiment, but the present disclosure is not limited to that embodiment. Forms obtained by applying various modifications conceived by those skilled in the art to the present embodiment, and forms constructed by combining components of different embodiments, are also included within the scope of the present disclosure as long as they do not depart from its gist.

例えば、上記実施の形態では、第２の計算撮像画像に含まれる撮像対象物及び撮像対象物の周辺環境の平面における位置、奥行方向における位置及びカテゴリ情報が識別されたが、これに限らない。例えば、撮像対象物及び撮像対象物の周辺環境の平面における位置、奥行方向における位置及びカテゴリ情報のいずれか１つ又は２つのみが識別されてもよい。つまり、撮像対象物及び撮像対象物の周辺環境の平面における位置、奥行方向における位置及びカテゴリ情報のいずれか１つ又は２つのみが機械学習されて、識別モデルが生成されてもよい。 For example, in the above embodiment, the imaging object and the position in the plane of the surrounding environment of the imaging object, the position in the depth direction, and the category information included in the second calculated captured image are identified, but the present invention is not limited to this. For example, only one or two of the imaging object and the position in the plane of the surrounding environment of the imaging object, the position in the depth direction, and the category information may be identified. That is, only one or two of the imaging object and the position in the plane of the surrounding environment of the imaging object, the position in the depth direction, and the category information may be machine-learned to generate the identification model.

また、例えば、上記実施の形態では、奥行方向における位置についても機械学習されたが、されなくてもよい。例えば、取得部１０１が第２の計算撮像画像を取得した段階において、第２の計算撮像画像に含まれる撮像対象物及び撮像対象物の周辺環境がそれぞれ複数重畳された画像を用いて、被写体の奥行方向における位置が計算されてもよい。つまり、識別モデルを用いずに、第２の計算撮像画像自体から直接奥行方向における位置が計算されてもよい。 Further, for example, in the above-described embodiment, the machine learning is performed on the position in the depth direction, but it may not be performed. For example, when the acquisition unit 101 acquires the second calculated captured image, an image of the subject is used by using an image in which a plurality of imaging objects and surrounding environments of the imaging target included in the second calculated captured image are respectively superimposed. The position in the depth direction may be calculated. That is, the position in the depth direction may be calculated directly from the second calculated captured image itself without using the identification model.

また、例えば、第二画像取得部１２２が取得した撮像画像に対する識別正解は、例えば人によって手動で与えられたが、これに限らない。例えば、第二画像取得部１２２が取得した撮像画像に対する識別正解を与えるための学習モデルを予め準備しておいて、当該学習モデルを用いて識別正解が与えられてもよい。 Moreover, for example, the correct identification for the captured image acquired by the second image acquisition unit 122 is manually given by a person, for example, but is not limited thereto. For example, a learning model for giving an identification correct answer to the captured image acquired by the second image acquisition unit 122 may be prepared in advance, and the identification correct answer may be given using the learning model.

また、例えば、本開示は、学習装置１２として実現できるだけでなく、学習装置１２を構成する各構成要素が行うステップ（処理）を含む学習方法として実現できる。 Further, for example, the present disclosure can be realized not only as the learning device 12 but also as a learning method including steps (processes) performed by each component constituting the learning device 12.

具体的には、当該学習方法は、図４に示すように、撮像対象物及び撮像対象物の周辺環境を含む第１の計算撮像画像であって、複数の第１の画素を有する第１の計算撮像画像を取得し（ステップＳ２）、撮像対象物及び撮像対象物の周辺環境を含む撮像画像であって、複数の第２の画素を有する撮像画像を取得し（ステップＳ３）、撮像画像に含まれる撮像対象物及び撮像対象物の周辺環境の識別結果を取得し（ステップＳ４）、複数の第１の画素及び複数の第２の画素の対応関係を参照して、撮像画像の識別結果に基づいて、第１の計算撮像画像を識別するための識別モデルを生成し（ステップＳ５）、第２の計算撮像画像を識別する画像識別装置１０に、識別モデルを出力する（ステップＳ６）。 Specifically, as shown in FIG. 4, the learning method acquires a first calculated captured image that includes an imaging object and the surrounding environment of the imaging object and has a plurality of first pixels (step S2); acquires a captured image that includes the imaging object and the surrounding environment of the imaging object and has a plurality of second pixels (step S3); acquires an identification result for the imaging object and the surrounding environment of the imaging object included in the captured image (step S4); generates, with reference to the correspondence between the plurality of first pixels and the plurality of second pixels, an identification model for identifying the first calculated captured image based on the identification result for the captured image (step S5); and outputs the identification model to the image identification device 10, which identifies a second calculated captured image (step S6).
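
The order of steps S2 to S6 can be written out as a short Python sketch. It is a hedged illustration of the control flow only: every callable passed to `run_learning` is a hypothetical placeholder, and the disclosure does not prescribe how acquisition, label conversion, training, or output are implemented.

```python
from typing import Any, Callable


def run_learning(
    load_calculated_image: Callable[[], Any],
    load_captured_image: Callable[[], Any],
    load_identification_result: Callable[[Any], Any],
    map_to_first_pixels: Callable[[Any], Any],
    train: Callable[[Any, Any], Any],
    send_to_identifier: Callable[[Any], None],
) -> Any:
    """Run steps S2 to S6 once and return the generated identification model."""
    calculated = load_calculated_image()            # S2: first calculated captured image
    captured = load_captured_image()                # S3: captured image of the same scene
    result = load_identification_result(captured)   # S4: identification result for the captured image
    # S5: refer to the first-pixel / second-pixel correspondence, then learn.
    converted = map_to_first_pixels(result)
    model = train(calculated, converted)
    send_to_identifier(model)                       # S6: output to the image identification device 10
    return model
```

Passing the steps in as callables keeps the sketch independent of any particular camera interface or learning framework.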

また、例えば、それらのステップは、コンピュータ（コンピュータシステム）によって実行されてもよい。そして、本開示は、それらの方法に含まれるステップを、コンピュータに実行させるためのプログラムとして実現できる。さらに、本開示は、そのプログラムを記録したＣＤ−ＲＯＭ等である非一時的なコンピュータ読み取り可能な記録媒体として実現できる。 Further, for example, these steps may be executed by a computer (computer system). The present disclosure can be realized as a program for causing a computer to execute the steps included in these methods. Furthermore, the present disclosure can be realized as a non-transitory computer-readable recording medium such as a CD-ROM or the like on which the program is recorded.

また、本開示において、システム、装置、部材又は部の全部又は一部、又は各図に示されるブロック図の機能ブロックの全部又は一部は、半導体装置、半導体集積回路（ＩＣ）、又はＬＳＩ（large scale integration）を含む一つ又は複数の電子回路によって実行されてもよい。 Further, in the present disclosure, all or part of a system, device, member, or unit, or all or part of the functional blocks in the block diagrams shown in the drawings, may be implemented by one or more electronic circuits including a semiconductor device, a semiconductor integrated circuit (IC), or an LSI (large scale integration).

ＬＳＩ又はＩＣは、一つのチップに集積されてもよいし、複数のチップを組み合わせて構成されてもよい。例えば、記憶素子以外の機能ブロックは、一つのチップに集積されてもよい。ここでは、ＬＳＩやＩＣと呼んでいるが、集積の度合いによって呼び方が変わり、システムＬＳＩ、ＶＬＳＩ（very large scale integration）、若しくはＵＬＳＩ（ultra large scale integration）と呼ばれるものであってもよい。ＬＳＩの製造後にプログラムされる、ＦｉｅｌｄＰｒｏｇｒａｍｍａｂｌｅＧａｔｅＡｒｒａｙ（FPGA）、又はＬＳＩ内部の接合関係の再構成又はＬＳＩ内部の回路区画のセットアップができるｒｅｃｏｎｆｉｇｕｒａｂｌｅｌｏｇｉｃｄｅｖｉｃｅも同じ目的で使うことができる。 The LSI or IC may be integrated on a single chip, or may be configured by combining a plurality of chips. For example, the functional blocks other than the memory element may be integrated on one chip. Here, the term “LSI” or “IC” is used, but the term changes depending on the degree of integration, and it may be called system LSI, VLSI (very large scale integration), or ULSI (ultra large scale integration). A field programmable gate array (FPGA), which is programmed after the manufacture of the LSI, or a reconfigurable logic device capable of reconfiguring the junction relationship inside the LSI or setting up a circuit partition inside the LSI can be used for the same purpose.

さらに、システム、装置、部材又は部の全部又は一部の機能又は操作は、上述したように、ソフトウェア処理によって実行することが可能である。この場合、ソフトウェアは少なくとも１つのＲＯＭ、光学ディスク、又はハードディスクドライブなどの非一時的記録媒体に記録され、ソフトウェアが処理装置（processor）によって実行されたときに、そのソフトウェアで特定された機能が処理装置（processor）及び周辺装置によって実行される。 Furthermore, the functions or operations of all or part of the system, apparatus, member, or unit can be executed by software processing as described above. In this case, the software is recorded on a non-transitory recording medium such as at least one ROM, optical disk, or hard disk drive, and when the software is executed by a processor, the function specified by the software is processed. It is executed by a processor and peripheral devices.

システム又は装置は、ソフトウェアが記録されている一つ又は複数の非一時的記録媒体、処理装置（processor）、及びハードウェアデバイスを備えていてもよい。 The system or apparatus may include one or more non-transitory recording media in which software is recorded, a processor, and a hardware device.

また、上記で用いた序数、数量等の数字は、全て本開示の技術を具体的に説明するために例示するものであり、本開示は例示された数字に制限されない。また、構成要素間の接続関係は、本開示の技術を具体的に説明するために例示するものであり、本開示の機能を実現する接続関係はこれに限定されない。 Further, the numbers such as the ordinal numbers and the quantities used in the above are examples for specifically explaining the technology of the present disclosure, and the present disclosure is not limited to the illustrated numbers. In addition, the connection relationship between the constituent elements is exemplified for specifically explaining the technology of the present disclosure, and the connection relationship for realizing the functions of the present disclosure is not limited thereto.

また、ブロック図における機能ブロックの分割は一例であり、複数の機能ブロックを１つの機能ブロックとして実現したり、１つの機能ブロックを複数に分割したり、一部の機能を他の機能ブロックに移してもよい。また、単一のハードウェア又はソフトウェアが、類似する機能を有する複数の機能ブロックの機能を並列又は時分割に処理してもよい。 In addition, the division into functional blocks in the block diagrams is an example: a plurality of functional blocks may be realized as one functional block, one functional block may be divided into a plurality of blocks, or some functions may be moved to another functional block. A single piece of hardware or software may also process the functions of a plurality of functional blocks having similar functions in parallel or by time division.

本開示の一態様に係る識別システム１Ａは、撮像対象の周辺環境の情報を含む第２の計算撮像画像を撮像する撮像部１１と、撮像部１１が撮像した第２の計算撮像画像から、識別器を利用して当該画像に含まれる被写体を検出し、検出結果を出力する画像識別装置１０と、識別器を生成する学習装置１２からなる識別システムである。学習装置１２は、第１の計算撮像画像を取得する第一画像取得部１２１と、撮像画像を取得する第二画像取得部１２２と、第二画像取得部１２２が取得した撮像画像に関する識別正解を取得する識別正解取得部１２３と、撮像画像に関する識別正解を利用して、第一画像取得部１２１が取得した第１の計算撮像画像に対する機械学習を行なうことで、識別器を取得する学習部１２４とを備えることを特徴とする。 The identification system 1A according to an aspect of the present disclosure is an identification system including: an imaging unit 11 that captures a second calculated captured image including information on the surrounding environment of an imaging target; an image identification device 10 that uses a classifier to detect a subject included in the second calculated captured image captured by the imaging unit 11 and outputs the detection result; and a learning device 12 that generates the classifier. The learning device 12 includes: a first image acquisition unit 121 that acquires a first calculated captured image; a second image acquisition unit 122 that acquires a captured image; an identification correct answer acquisition unit 123 that acquires an identification correct answer for the captured image acquired by the second image acquisition unit 122; and a learning unit 124 that obtains the classifier by performing machine learning on the first calculated captured image acquired by the first image acquisition unit 121 using the identification correct answer for the captured image.

撮像部１１及び第一画像取得部１２１は、マルチピンホールカメラ、ＣｏｄｅｄＡｐｅｒｔｕｒｅカメラ、ライトフィールドカメラ、又は、レンズレスカメラ、から構成される。 The imaging unit 11 and the first image acquisition unit 121 include a multi-pinhole camera, a coded aperture camera, a light field camera, or a lensless camera.

撮像部１１及び第一画像取得部１２１は、計算撮像画像として、人が見ても視覚的に認識できない画像を取得する。 The imaging unit 11 and the first image acquisition unit 121 acquire an image that cannot be visually recognized even when viewed by a person as a calculated captured image.

学習装置１２は、第一画像取得部１２１が取得する第１の計算撮像画像と、第二画像取得部１２２が取得する撮像画像の画像上での位置関係の対応を利用することで、第二画像取得部１２２が取得した撮像画像に関する識別正解を第一画像取得部１２１の識別正解として学習する。 The learning device 12 uses the correspondence of the positional relationship on the image of the first calculated captured image acquired by the first image acquisition unit 121 and the captured image acquired by the second image acquisition unit 122, so that the second The identification correct answer regarding the captured image acquired by the image acquisition unit 122 is learned as the identification correct answer of the first image acquisition unit 121.

第二画像取得部１２２は、画像情報に加え、奥行情報も取得できる画像を取得する。 The second image acquisition unit 122 acquires an image that can acquire depth information in addition to the image information.

第二画像取得部１２２は、マルチビューステレオカメラである。 The second image acquisition unit 122 is a multi-view stereo camera.

学習装置１２において、第一画像取得部１２１と第二画像取得部１２２の光軸がおおよそ一致する。 In the learning device 12, the optical axes of the first image acquisition unit 121 and the second image acquisition unit 122 are approximately the same.

学習装置１２は、さらにビームスプリッタを有し、ビームスプリッタを用いて光軸を一致させている。 The learning device 12 further includes a beam splitter, and the optical axes are matched using the beam splitter.

本開示の一態様に係る学習装置１２は、第１の計算撮像画像を取得する第一画像取得部１２１と、撮像画像を取得する第二画像取得部１２２と、第二画像取得部１２２が取得した撮像画像に関する識別正解を取得する識別正解取得部１２３と、撮像画像に関する識別正解を利用して、第一画像取得部１２１が取得した第１の計算撮像画像に対する機械学習を行なうことで、識別器を取得する学習部１２４とを備える。 The learning device 12 according to an aspect of the present disclosure includes: a first image acquisition unit 121 that acquires a first calculated captured image; a second image acquisition unit 122 that acquires a captured image; an identification correct answer acquisition unit 123 that acquires an identification correct answer for the captured image acquired by the second image acquisition unit 122; and a learning unit 124 that obtains a classifier by performing machine learning on the first calculated captured image acquired by the first image acquisition unit 121 using the identification correct answer for the captured image.

In the learning method according to an aspect of the present disclosure, a subject included in the first calculated captured image is detected from that image using a classifier, and the detection result is output. The first calculated captured image and a captured image are acquired, an identification correct answer for the captured image is acquired, and the classifier is generated by performing machine learning on the first calculated captured image using the identification correct answer for the captured image.
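
The training step of this method can be sketched under strong simplifying assumptions. The sketch below assumes the identification correct answers have already been aligned to the pixel grid of the first calculated captured image, and it substitutes a nearest-mean classifier on single pixel values for the machine-learned classifier. Neither the stand-in model nor the name `train_classifier` comes from the disclosure, and the data are made up.

```python
from collections import defaultdict


def train_classifier(first_images, label_maps):
    """Fit a nearest-mean classifier on pixel values of calculated captured images.

    first_images: list of 2-D lists of pixel values (first calculated captured images)
    label_maps:   list of 2-D lists of labels, assumed already aligned to the
                  pixel grid of the corresponding first image
    """
    sums = defaultdict(float)
    counts = defaultdict(int)
    for image, labels in zip(first_images, label_maps):
        for pixel_row, label_row in zip(image, labels):
            for value, label in zip(pixel_row, label_row):
                sums[label] += value
                counts[label] += 1
    means = {label: sums[label] / counts[label] for label in sums}

    def classify(value):
        # Return the label whose mean pixel value is closest to the input value.
        return min(means, key=lambda label: abs(value - means[label]))

    return classify


# Illustrative training pair: one 2x2 image and its aligned label map.
first_images = [[[0.1, 0.9], [0.2, 0.8]]]
label_maps = [[["background", "object"], ["background", "object"]]]
classifier = train_classifier(first_images, label_maps)
```

The returned `classifier` plays the role of the generated classifier that is then applied to a newly acquired calculated captured image, for example `classifier(0.7)`.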

The technique of the present disclosure is widely applicable to techniques for recognizing objects in calculated captured images. It is also widely applicable when the imaging device that captures the calculated captured image is mounted on a moving body that requires a high identification processing speed, and can be applied to, for example, automated driving technology for automobiles, robots, and perimeter monitoring camera systems.

DESCRIPTION OF SYMBOLS
1, 1A Identification system
10 Image identification device
11 Imaging unit
12 Learning device
101 Acquisition unit
102 Identification unit
103 Output unit
121 First image acquisition unit
122 Second image acquisition unit
123 Identification correct answer acquisition unit
124 Learning unit
201 First input circuit
202 First arithmetic circuit
203 First memory
204 Output circuit
211 Light field camera
211a Multi-pinhole mask
211aa Pinhole
211b Image sensor
221 Second input circuit
222 Third input circuit
223 Second arithmetic circuit
224 Second memory
231, 232, 232a, 232b Optical axes
240 Beam splitter
311 Coded aperture mask
1021 Texture information identification unit
1022 Depth information identification unit
1023 Integrated identification unit

Claims

A learning device comprising a memory and a processing circuit,
wherein the processing circuit:
(A) acquires, from the memory, a first calculated captured image including an imaging object and a surrounding environment of the imaging object, the first calculated captured image having a plurality of first pixels;
(B) acquires, from the memory, a captured image including the imaging object and the surrounding environment of the imaging object, the captured image having a plurality of second pixels;
(C) acquires an identification result of the imaging object and the surrounding environment of the imaging object included in the captured image;
(D) generates, with reference to a correspondence relationship between the plurality of first pixels and the plurality of second pixels, an identification model for identifying the first calculated captured image based on the identification result of the captured image; and
(E) outputs the identification model to an image identification device that identifies a second calculated captured image.

The learning device according to claim 1, wherein the identification result includes positions, within a plane, of the imaging object and the surrounding environment of the imaging object.

The learning device according to claim 1 or 2, wherein the identification result includes positions, in a depth direction, of the imaging object and the surrounding environment of the imaging object.

The learning device according to claim 1, wherein the identification result includes category information indicating the categories to which the imaging object and the surrounding environment of the imaging object belong.

The learning device according to any one of claims 1 to 4, wherein the first calculated captured image and the second calculated captured image are images that contain parallax information and in which the imaging object and the surrounding environment of the imaging object are each superimposed multiple times.

The learning device according to claim 5, wherein the first calculated captured image and the second calculated captured image are images obtained by imaging the imaging object and the surrounding environment of the imaging object with a multi-pinhole camera, a coded aperture camera, a light field camera, or a lensless camera.

The learning device according to claim 1, wherein the captured image is an image obtained by imaging the imaging object and the surrounding environment of the imaging object with a multi-view stereo camera.

8. The learning device according to claim 1, wherein an optical axis of a camera used to capture the first calculated captured image and an optical axis of a camera used to capture the captured image substantially coincide with each other.

The optical axis of the camera used for imaging the first calculated captured image and the optical axis of the camera used for capturing the captured image coincide with each other via a beam splitter, a prism, or a half mirror. The learning device described.

A learning method comprising:
(A) acquiring a first calculated captured image including an imaging object and a surrounding environment of the imaging object, the first calculated captured image having a plurality of first pixels;
(B) acquiring a captured image including the imaging object and the surrounding environment of the imaging object, the captured image having a plurality of second pixels;
(C) acquiring an identification result of the imaging object and the surrounding environment of the imaging object included in the captured image;
(D) generating, with reference to a correspondence relationship between the plurality of first pixels and the plurality of second pixels, an identification model for identifying the first calculated captured image based on the identification result of the captured image; and
(E) outputting the identification model to an image identification device that identifies a second calculated captured image.

A program for causing a computer to execute the learning method according to claim 10.