JPWO2020008684A1

JPWO2020008684A1 - Object recognition device, object recognition system, and program

Info

Publication number: JPWO2020008684A1
Application number: JP2020528688A
Authority: JP
Inventors: 由紀子柳川
Original assignee: Omron Corp
Current assignee: Omron Corp
Priority date: 2018-07-04
Filing date: 2019-03-12
Publication date: 2021-07-08
Anticipated expiration: 2039-03-12
Also published as: JP7131612B2; WO2020008684A1

Abstract

The object recognition device (200) includes an input unit (21) that inputs three-dimensional information including three-dimensional positions along the outer shape of at least a part of an object, and an arithmetic processing unit (22) that recognizes the object based on the three-dimensional information. Based on the three-dimensional information, the arithmetic processing unit (22) generates a plurality of pieces of two-dimensional information in which the solid represented by the three-dimensional positions is viewed from a plurality of directions, and recognizes the object based on the plurality of pieces of two-dimensional information.

Description

The present disclosure relates to an object recognition device, an object recognition system, and a program for recognizing an object.

Non-Patent Document 1 discloses VoxNet. VoxNet is a method of recognizing an object by image processing using a three-dimensional convolutional neural network (3D CNN). Specifically, VoxNet maps three-dimensional point cloud data obtained from LiDAR, RGBD sensors, and the like into a three-dimensional space of a predetermined size to generate three-dimensional information, and inputs that three-dimensional information into a three-dimensional convolutional neural network to recognize the object.

Daniel Maturana, Sebastian Scherer, "VoxNet: A 3D Convolutional Neural Network for Real-Time Object Recognition", Internet <URL: https://www.ri.cmu.edu/pub_files/2015/9/voxnet_maturana_scherer_iros15.pdf>

Image processing with a three-dimensional convolutional neural network, as in Non-Patent Document 1, involves a large amount of data and requires a large network. As a result, the processing load of the image processing was high and object recognition was slow.

An object of the present disclosure is to provide an object recognition device, an object recognition system, and a program that reduce the processing load of image processing and improve the processing speed of object recognition.

The object recognition device according to the present disclosure includes an input unit that inputs three-dimensional information including three-dimensional positions along the outer shape of at least a part of an object, and an arithmetic processing unit that recognizes the object based on the three-dimensional information. Based on the three-dimensional information, the arithmetic processing unit generates a plurality of pieces of two-dimensional information, each showing a two-dimensional drawing of the solid represented by the three-dimensional positions as viewed from one of a plurality of directions, and recognizes the object based on the plurality of pieces of two-dimensional information.

The object recognition system according to the present disclosure includes a sensor that measures the distance to an object and generates the three-dimensional information, and the object recognition device described above.

The program according to the present disclosure causes a computer to execute a step of inputting three-dimensional information including three-dimensional positions along the outer shape of at least a part of an object, a step of generating, based on the three-dimensional information, a plurality of pieces of two-dimensional information showing two-dimensional drawings of the solid represented by the three-dimensional positions as viewed from a plurality of directions, and a step of recognizing the object based on the plurality of pieces of two-dimensional information.

According to the object recognition device, the object recognition system, and the program of the present disclosure, the object is recognized based on a plurality of pieces of two-dimensional information showing two-dimensional drawings of the solid, represented by three-dimensional positions along the outer shape of at least a part of the object, as viewed from a plurality of directions. This reduces the processing load of image processing and therefore improves the processing speed of object recognition.

A diagram for explaining an application example of the object recognition system according to the present disclosure.
A block diagram illustrating the configuration of the object recognition system according to Embodiments 1 and 2.
A diagram for explaining distance measurement by the distance sensor according to Embodiments 1 and 2.
A flowchart showing an example of the object recognition processing by the object recognition device according to Embodiment 1.
A diagram showing an example of a distance image and a detected recognition target region according to Embodiment 1.
A diagram for explaining the three-dimensional space according to Embodiments 1 and 2.
A diagram for explaining the occupancy grid according to Embodiment 1.
A diagram showing an example of a plan view of the voxels in the occupied region of FIG. 7.
A diagram showing an example of a front view of the voxels in the occupied region of FIG. 7.
A diagram showing an example of a side view of the voxels in the occupied region of FIG. 7.
A diagram for explaining the generation of a composite drawing based on the three views of FIGS. 8A to 8C.
A diagram for explaining image processing by the convolutional neural network according to Embodiments 1 and 2.
A flowchart showing an example of the learning processing of the convolutional neural network according to Embodiment 1.
A diagram for explaining a comparison of the number of elements between the three views and VoxNet.
A flowchart showing an example of the object recognition processing by the object recognition device according to Embodiment 2.
A diagram for explaining the occupancy grid according to Embodiment 2.
A diagram showing a six-view drawing of the voxels in the occupied region of FIG. 14.

Embodiments of the object recognition system according to the present disclosure are described below with reference to the accompanying drawings. In each of the following embodiments, like components are given the same reference numerals.

(Application example)
An example to which the object recognition system according to the present disclosure can be applied is described with reference to FIG. 1. FIG. 1 is a diagram for explaining an application example of the object recognition system 1 according to the present disclosure.

The object recognition system 1 according to the present disclosure is applicable to, for example, in-vehicle use. In the example shown in FIG. 1, the object recognition system 1 is mounted on a vehicle 3. The vehicle 3 is, for example, a self-driving vehicle and includes a vehicle driving device 2 for performing automated driving. The object recognition system 1 recognizes, for example, an object 4 in the traveling direction of the vehicle 3. The object 4 is, for example, a car, a bus, a motorcycle, a bicycle, a pedestrian, a utility pole, a curb, or a guardrail.

The object recognition system 1 projects light in the traveling direction of the vehicle 3 and receives the light reflected by the object 4. Based on the time difference between projection and reception, the object recognition system 1 measures the distance from the object recognition system 1 to the object 4. Based on the measured distance, the object recognition system 1 generates sensing data including three-dimensional positions along the outer shape of the object 4.

Based on the sensing data, the object recognition system 1 generates a plurality of pieces of two-dimensional information showing two-dimensional drawings of the solid represented by the three-dimensional positions as viewed from a plurality of directions. The object recognition system 1 recognizes the object based on the two-dimensional drawings viewed from the plurality of directions. The object recognition system 1 outputs, for example, object information indicating the distance to the object 4, its direction, the recognition result, and the like to the vehicle driving device 2.

The vehicle driving device 2 includes, for example, a steering mechanism that drives the vehicle 3 by setting a traveling direction that avoids the object 4 on the road, based on the object information output from the object recognition system 1. Because the object recognition system 1 recognizes the object 4, the vehicle driving device 2 can perform automated driving while avoiding the object 4.

(Configuration example)
Embodiments are described below as configuration examples of the object recognition system 1.

(Embodiment 1)
The configuration and operation of the object recognition system 1 according to Embodiment 1 are described below.

1. Configuration
The configuration of the object recognition system 1 according to the present embodiment is described with reference to FIGS. 2 and 3. FIG. 2 is a block diagram illustrating the configuration of the object recognition system 1. FIG. 3 is a diagram for explaining distance measurement by the distance sensor 100.

The object recognition system 1 includes a distance sensor 100 and an object recognition device 200.

1.1 Configuration of the distance sensor
The distance sensor 100 includes a light projecting unit 11, a light receiving unit 12, a scanning unit 13, a sensor control unit 14, and an input/output interface unit 15. The distance sensor 100 is, for example, a LIDAR (Light Detection and Ranging, or Laser Imaging Detection and Ranging) device.

The light projecting unit 11 projects light to the outside. Specifically, the light projecting unit 11 emits a light beam to the outside under the control of the sensor control unit 14. The light projecting unit 11 includes, for example, a light source composed of one or more light source elements and a light source driving circuit that pulse-drives the light source. The light source element is, for example, a semiconductor laser (LD) that emits laser light. The light source element may instead be an LED or the like. The light source elements are arranged, for example, in a single-row array in the vertical direction Y shown in FIG. 3, and the light projecting unit 11 projects light toward a light projecting region R11.

The light receiving unit 12 receives light from the outside. The light receiving unit 12 includes a plurality of light receiving elements. When a light receiving element receives light, it generates a light reception signal corresponding to the amount of light received. The plurality of light receiving elements are arranged, for example, in a single-row array along the vertical direction Y. Each light receiving element corresponds to, for example, one pixel of the distance image and separately receives light incident from the range corresponding to the vertical angle of view of one pixel. The light receiving element is, for example, a SPAD (single-photon avalanche photodiode). The light receiving element may instead be a PD (photodiode) or an APD (avalanche photodiode).

The scanning unit 13 includes, for example, a mirror, a rotation mechanism that rotates the mirror around a rotation axis along the vertical direction Y, and a scanning drive circuit that drives the rotation mechanism. The scanning drive circuit rotationally drives the mirror under the control of the sensor control unit 14. The scanning unit 13 thereby changes the projection direction little by little at fixed time intervals, gradually moving the optical path along which the light travels. For example, as shown in FIG. 3, the scanning unit 13 shifts the light projecting region R11 in the horizontal direction X.

The sensor control unit 14 can be implemented with a semiconductor element or the like. The sensor control unit 14 can be composed of, for example, a microcomputer, a CPU, an MPU, a GPU, a DSP, an FPGA, or an ASIC. The functions of the sensor control unit 14 may be implemented by hardware alone or by a combination of hardware and software. The sensor control unit 14 realizes predetermined functions by reading data and programs stored in a storage unit in the distance sensor 100 and performing various arithmetic operations.

The sensor control unit 14 controls the timing of light projection by the light projecting unit 11. Based on the projection timing and the light reception signal obtained from the light receiving unit 12, the sensor control unit 14 generates, for each pixel, light reception waveform data indicating the amount of light received as a function of the time elapsed since projection. The sensor control unit 14 calculates the distance for each pixel based on the light reception waveform. For example, based on the light reception waveform, the sensor control unit 14 measures the time of flight of the light from when it is projected from the light projecting unit 11 until it is reflected and received by the light receiving unit 12. Based on the measured time of flight, the sensor control unit 14 calculates the distance from a reference point 5 to the outer shape of the object that reflected the light. The reference point 5 is, for example, the light exit port of the light projecting unit 11. The sensor control unit 14 generates a distance image based on the distance measured for each pixel.
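The time-of-flight calculation described above can be sketched as follows. This is a minimal illustration, not the sensor's actual firmware: the function names are hypothetical, and taking the strongest sample of the light reception waveform as the echo arrival time is an assumption, since the text does not say how the time of flight is extracted from the waveform.

```python
from typing import Sequence

# Speed of light in vacuum, in metres per second (exact by SI definition).
SPEED_OF_LIGHT_M_PER_S = 299_792_458.0


def distance_from_time_of_flight(flight_time_s: float) -> float:
    """Return the one-way distance for a measured round-trip flight time."""
    if flight_time_s < 0:
        raise ValueError("flight time must be non-negative")
    # The light travels to the object and back, so halve the path length.
    return SPEED_OF_LIGHT_M_PER_S * flight_time_s / 2.0


def peak_time(waveform: Sequence[float], sample_period_s: float) -> float:
    """Hypothetical helper: treat the strongest sample as the echo arrival.

    Assumes sample 0 coincides with the moment of light projection.
    """
    peak_index = max(range(len(waveform)), key=lambda i: waveform[i])
    return peak_index * sample_period_s


if __name__ == "__main__":
    # A 200 ns round trip corresponds to roughly 30 m.
    print(distance_from_time_of_flight(200e-9))
```

Running one such calculation per pixel yields the per-pixel distances from which the distance image is assembled.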

The sensor control unit 14 performs distance measurement while scanning a projection plane R10, which corresponds to the angle of view of the distance image, in the horizontal direction X, and generates the distance image. The resolution of the distance image, that is, the angle of view per pixel, is, for example, 1.0 to 1.6 degrees in the horizontal direction X and 0.3 to 1.2 degrees in the vertical direction Y. By repeatedly scanning the projection plane R10, distance images can be generated sequentially at a desired frame rate. The sensor control unit 14 outputs, for example, the generated distance image as sensing data to the object recognition device 200 via the input/output interface unit 15.

The input/output interface unit 15 includes a circuit that communicates with external devices in accordance with a predetermined communication standard. The predetermined communication standards include, for example, LAN, Wi-Fi®, Bluetooth®, USB, and HDMI®.

1.2 Configuration of the object recognition device
The object recognition device 200 is an information processing device such as a PC or any of various information terminals. The object recognition device 200 includes an input/output interface unit 21, an arithmetic processing unit 22, and a storage unit 23.

The input/output interface unit 21 includes, for example, a device interface and a network interface. The device interface is a circuit (module) for connecting external devices such as the distance sensor 100 to the object recognition device 200. The device interface is an example of an acquisition unit that communicates in accordance with a predetermined communication standard. The predetermined standards include USB, HDMI®, IEEE1395, Wi-Fi®, Bluetooth®, and the like. The network interface is a circuit (module) for connecting the object recognition device 200 to a communication network via a wireless or wired communication line. The network interface is an example of an acquisition unit that performs communication conforming to a predetermined communication standard. The predetermined communication standards include IEEE802.3, IEEE802.11a/11b/11g/11ac, and the like. The input/output interface unit 21 is an example of an input unit that inputs three-dimensional information including three-dimensional positions along the outer shape of at least a part of an object.

The arithmetic processing unit 22 includes a CPU or GPU that realizes predetermined functions in cooperation with software, and controls the overall operation of the object recognition device 200. The arithmetic processing unit 22 reads data and programs stored in the storage unit 23 and performs various arithmetic operations to realize various functions. The functions of the arithmetic processing unit 22 may be implemented by hardware alone or by a combination of hardware and software. The arithmetic processing unit 22 executes a program that constructs the convolutional neural network described later. The program is stored in the storage unit 23. The program that constructs the convolutional neural network may be provided over any of various communication networks or may be stored on a portable recording medium.

The arithmetic processing unit 22 may be a hardware circuit, such as a dedicated electronic circuit designed to realize predetermined functions or a reconfigurable electronic circuit. Besides a CPU or GPU, the arithmetic processing unit 22 may be composed of various semiconductor integrated circuits such as an MPU, a GPGPU, a TPU, a microcomputer, a DSP, an FPGA, or an ASIC.

As its functional configuration, the arithmetic processing unit 22 includes a region detection unit 22a, an occupancy grid generation unit 22b, a 2D drawing generation unit 22c, and an object recognition unit 22d. The region detection unit 22a detects a region of the distance image in which one object exists as the recognition target region for that object. The occupancy grid generation unit 22b generates occupancy grid data indicating an occupancy grid in which the object in the detected recognition target region is represented as a set of voxels. The 2D drawing generation unit 22c generates two-dimensional drawings of the voxel set based on the occupancy grid data. In the present embodiment, the 2D drawing generation unit 22c generates a three-view drawing. The three-view drawing is an example of the plurality of pieces of two-dimensional information. The object recognition unit 22d executes image processing with a convolutional neural network based on the three-view drawing to recognize the type of the object.

The storage unit 23 stores the parameters, data, control programs, and the like needed to realize predetermined functions. For example, the storage unit 23 stores the program for the convolutional neural network and its parameters, both during and after training. The storage unit 23 can be implemented by, for example, a hard disk (HDD), SSD, RAM, DRAM, SRAM, ferroelectric memory, flash memory, magnetic disk, or a combination of these. The storage unit 23 may store various kinds of information temporarily. The storage unit 23 may be configured, for example, to function as a work area for the arithmetic processing unit 22.

The object recognition device 200 may include an operation unit, which is a user interface operated by a user. The operation unit is composed of, for example, a keyboard, a touch pad, a touch panel, buttons, switches, or a combination of these. The object recognition device 200 may include a display unit composed of a liquid crystal display or an organic EL display. The object recognition device 200 may include a speaker that outputs sound.

2. Operation
2.1 Object recognition processing
The operation of the object recognition system 1 configured as described above, as it relates to object recognition processing, is described with reference to FIGS. 4 to 10.

FIG. 4 is a flowchart illustrating the operation of the arithmetic processing unit 22 of the object recognition device 200.

The region detection unit 22a acquires the sensing data generated by the distance sensor 100 via the input/output interface unit 21 (S101). In the present embodiment, the sensing data is a distance image indicating the distance from the reference point 5 to the outer shape of an object. FIG. 5 shows an example of a distance image 30. The distance image 30 indicates the distance in the depth direction Z for each of the pixels arranged in the horizontal direction X and the vertical direction Y. That is, each pixel of the distance image 30 has a pixel value indicating a distance in the depth direction Z. In the distance image 30, the distance in the depth direction Z is identified by, for example, the color of each pixel. In one example, the nearer the distance, the redder the pixel (for example, the pixel value approaches RGB = (255, 0, 0)), and the farther the distance, the bluer the pixel (for example, the pixel value approaches RGB = (0, 0, 255)). The distance image 30 is an example of three-dimensional information including three-dimensional positions along the outer shape of at least a part of an object.
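As a small illustration of the red-to-blue encoding in the example above, the following sketch maps a distance to an RGB pixel value. The linear interpolation, the clamping, and the `max_distance_m` parameter are assumptions made for illustration; the text only states that near pixels approach red and far pixels approach blue.

```python
def distance_to_rgb(distance_m: float, max_distance_m: float) -> tuple:
    """Map a distance to an (R, G, B) tuple: near is red, far is blue.

    Assumes a linear blend between pure red at 0 m and pure blue at
    max_distance_m (a hypothetical sensor range limit).
    """
    if max_distance_m <= 0:
        raise ValueError("max_distance_m must be positive")
    # Clamp the normalized distance into [0, 1].
    ratio = min(max(distance_m / max_distance_m, 0.0), 1.0)
    red = round(255 * (1.0 - ratio))
    blue = round(255 * ratio)
    return (red, 0, blue)


if __name__ == "__main__":
    for d in (0.0, 25.0, 100.0):
        print(d, distance_to_rgb(d, max_distance_m=100.0))
```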

Based on the sensing data, the region detection unit 22a detects a region in which one object exists as a recognition target region (S102). The recognition target region can be detected using known techniques. For example, in the distance image 30, the region detection unit 22a determines that pixels belong to the same object when the difference between the pixel value of a pixel and the pixel values of its eight surrounding pixels is equal to or less than a predetermined threshold, and thereby detects a recognition target region 35.
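One way to read the eight-neighbor rule above is as region growing over the distance image. The sketch below is an assumption about how such a rule could be applied, not the method the text prescribes (it only says known techniques can be used): the distance image is a list of rows of depth values, and `label_regions` is a hypothetical name.

```python
from collections import deque


def label_regions(depth, threshold):
    """Label 8-connected regions of a depth image.

    Two adjacent pixels (including diagonals) join the same region when
    their depth values differ by at most `threshold`. Returns a grid of
    integer labels starting at 1.
    """
    rows, cols = len(depth), len(depth[0])
    labels = [[0] * cols for _ in range(rows)]
    next_label = 0
    for r0 in range(rows):
        for c0 in range(cols):
            if labels[r0][c0]:
                continue  # already assigned to a region
            next_label += 1
            labels[r0][c0] = next_label
            queue = deque([(r0, c0)])
            while queue:
                r, c = queue.popleft()
                for dr in (-1, 0, 1):
                    for dc in (-1, 0, 1):
                        nr, nc = r + dr, c + dc
                        if (dr, dc) == (0, 0):
                            continue
                        if not (0 <= nr < rows and 0 <= nc < cols):
                            continue
                        if labels[nr][nc]:
                            continue
                        if abs(depth[nr][nc] - depth[r][c]) <= threshold:
                            labels[nr][nc] = next_label
                            queue.append((nr, nc))
    return labels


if __name__ == "__main__":
    image = [[1.0, 1.1, 9.0],
             [1.0, 1.2, 9.1],
             [5.0, 5.0, 9.0]]
    for row in label_regions(image, threshold=0.5):
        print(row)
```

Each label then corresponds to one candidate recognition target region; the pixels sharing a label can be cropped out for the later steps.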

The occupancy grid generation unit 22b defines a three-dimensional space of a predetermined size (S103). FIG. 6 schematically shows a three-dimensional space 40. The occupancy grid generation unit 22b defines, for example, a cubic three-dimensional space 40 composed of 32 × 32 × 32 voxels 41 in a virtual coordinate system (x, y, z).

Based on the detected recognition target region 35 and the defined three-dimensional space 40, the occupancy grid generation unit 22b generates occupancy grid data indicating an occupancy grid (S104). FIG. 7 schematically shows an example of an occupancy grid 45. The occupancy grid 45 is a set of voxels 41 in which an occupied region 46 and an unoccupied region 47 are distinguished within the three-dimensional space 40. The occupied region 46 indicates where the object exists, and the unoccupied region 47 indicates where no object exists. The occupancy grid generation unit 22b, for example, applies a coordinate transformation to the coordinate system of the distance image 30 to associate each pixel in the recognition target region 35 with a voxel 41 of the three-dimensional space 40. The occupancy grid generation unit 22b determines which voxel 41 of the three-dimensional space 40 each pixel (that is, each distance value) in the recognition target region 35 corresponds to, and assigns a flag to each voxel 41, thereby generating the occupancy grid data indicating the occupancy grid 45. The flag is, for example, a binary value indicating whether or not the voxel is part of the object. The solid composed of the voxels 41 in the occupied region 46 corresponds to at least a part of the object.
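The voxel assignment can be sketched as below. The coordinate transformation itself is not specified in the text, so this example assumes the pixels of the recognition target region have already been converted to 3D points and simply scales their bounding box uniformly into the grid. `build_occupancy_grid` is a hypothetical name, and representing the binary flags as a set of occupied indices is a design choice for brevity.

```python
def build_occupancy_grid(points, grid_size=32):
    """Return the set of occupied voxel indices (ix, iy, iz).

    points    -- iterable of (x, y, z) positions on the object surface
    grid_size -- voxels per side of the cubic space (32 in the example)

    A voxel's binary flag is "occupied" exactly when its index is in the
    returned set.
    """
    points = list(points)
    if not points:
        return set()
    mins = [min(p[axis] for p in points) for axis in range(3)]
    maxs = [max(p[axis] for p in points) for axis in range(3)]
    # Use one common scale for all axes so the object keeps its proportions
    # (an assumption; fall back to 1.0 for a single-point input).
    extent = max(hi - lo for lo, hi in zip(mins, maxs)) or 1.0
    occupied = set()
    for point in points:
        index = tuple(
            min(int((point[axis] - mins[axis]) / extent * grid_size),
                grid_size - 1)
            for axis in range(3)
        )
        occupied.add(index)
    return occupied


if __name__ == "__main__":
    cloud = [(0.0, 0.0, 0.0), (0.5, 0.0, 0.0), (1.0, 1.0, 1.0)]
    print(sorted(build_occupancy_grid(cloud, grid_size=4)))
```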

The 2D drawing generation unit 22c generates a three-view drawing of the solid formed by the voxels 41 in the occupied area 46 (S105). For example, the 2D drawing generation unit 22c generates two-dimensional drawings of the voxels 41 in the occupied area 46 as viewed from directions A, B, and C in FIG. 7. FIG. 8A is a plan view 50A seen from direction A. FIG. 8B is a front view 50B seen from direction B. FIG. 8C is a side view 50C seen from direction C. The plan view 50A, the front view 50B, and the side view 50C are collectively referred to as the three views 50A, 50B, and 50C. The 2D drawing generation unit 22c generates the three views 50A, 50B, and 50C so that each pixel value corresponds to the distance, in directions A, B, and C shown in FIG. 7, from the planes 44a, 44b, and 44c of the cube forming the three-dimensional space 40 to each voxel in the occupied area 46. In the examples of FIGS. 8A to 8C, the shorter the distance, the lighter the color (for example, the pixel value approaches 255), and the longer the distance, the darker the color (for example, the pixel value approaches 0). The mapping may be reversed, so that shorter distances are darker and longer distances are lighter. In FIGS. 8A to 8C, pixels where no object exists are shown in white, but they may be shown in black. The three views 50A, 50B, and 50C each have pixel values expressed in, for example, one of the R, G, and B channels. In one example, the plan view 50A is red, the front view 50B is green, and the side view 50C is blue.
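The distance-dependent pixel values of step S105 can be sketched as a projection of the voxel grid along each axis. The sketch below rests on several assumptions that the disclosure does not fix: the grid axes are ordered (x, y, z), each reference plane lies at index 0 of its axis, and the scaling is chosen so that the nearest voxel stays slightly below 255 and remains distinguishable from a white background. The names `depth_view` and `three_views` are hypothetical.

```python
import numpy as np

def depth_view(grid, axis, background=255):
    """Project a cubic boolean occupancy grid along one axis into a depth-coded image."""
    n = grid.shape[axis]
    occupied = grid.any(axis=axis)
    # Index of the first occupied voxel seen from the reference plane at index 0.
    first = np.argmax(grid, axis=axis)
    # Nearer voxels get lighter values; the farthest voxel maps to 0.
    values = np.rint(255.0 * (n - 1 - first) / n).astype(np.uint8)
    return np.where(occupied, values, np.uint8(background)).astype(np.uint8)

def three_views(grid):
    """Return (plan, front, side) views, assuming grid axes are ordered (x, y, z)."""
    plan = depth_view(grid, axis=2)   # direction A: looking along z
    front = depth_view(grid, axis=1)  # direction B: looking along y
    side = depth_view(grid, axis=0)   # direction C: looking along x
    return plan, front, side

grid = np.zeros((4, 4, 4), dtype=bool)
grid[1, 2, 0] = True   # touches the z = 0 reference plane
grid[1, 2, 3] = True   # hidden behind the voxel above in the plan view
grid[3, 0, 2] = True
plan, front, side = three_views(grid)
```

Only the first occupied voxel along each line of sight contributes to a pixel, which is why the hidden voxel at `[1, 2, 3]` does not change the plan view.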

The 2D drawing generation unit 22c combines the three views 50A, 50B, and 50C to generate one composite drawing (S106). FIG. 9 shows an example of a composite drawing 50D generated from the three views 50A, 50B, and 50C. The 2D drawing generation unit 22c determines the pixel value of each pixel in the composite drawing 50D based on the pixel values of the corresponding pixels in the three views 50A, 50B, and 50C. Thus, for example, in the composite drawing 50D, the shorter the distance, the lighter the color, and the longer the distance, the darker the color. The color of each pixel in the composite drawing 50D is based on the red, green, and blue of the plan view 50A, the front view 50B, and the side view 50C.
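One plausible reading of step S106 is that each view fills one color channel of the composite drawing. The sketch below assumes exactly that (plan view to red, front view to green, side view to blue); the disclosure only states that the composite pixel values are based on the three views, so the channel stacking and the name `compose_views` are assumptions.

```python
import numpy as np

def compose_views(plan, front, side):
    """Stack three equally sized grayscale views into one (H, W, 3) RGB image."""
    if not (plan.shape == front.shape == side.shape):
        raise ValueError("all three views must have the same shape")
    # Red <- plan view, green <- front view, blue <- side view.
    return np.stack([plan, front, side], axis=-1).astype(np.uint8)

# Made-up uniform views, just to show the channel layout.
plan = np.full((4, 4), 255, dtype=np.uint8)
front = np.full((4, 4), 128, dtype=np.uint8)
side = np.zeros((4, 4), dtype=np.uint8)
composite = compose_views(plan, front, side)
```

With this layout the composite has the same shape as an ordinary RGB image, so it can be fed to a two-dimensional CNN without changes to the input layer.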

The object recognition unit 22d inputs the composite drawing 50D into a trained convolutional neural network and recognizes the type of the object (S107). FIG. 10 is a diagram for explaining image processing by the convolutional neural network 60. The convolutional neural network 60 has been trained in advance to recognize objects, using composite drawings showing objects such as cars, buses, pedestrians, utility poles, and curbs. The training method of the convolutional neural network 60 is described later. The program and parameters that make up the trained convolutional neural network 60 are stored in, for example, the storage unit 23. By executing image processing using the convolutional neural network 60, the object recognition unit 22d calculates, from the composite drawing 50D, the probabilities that the object is a car, a bus, a pedestrian, a utility pole, a curb, a guardrail, and so on. The convolutional neural network 60 includes, for example, in order from the input side to the output side, convolutional layers L1 and L2, fully connected layers L3 and L4, and an output layer L5. In the example of FIG. 10 there are two convolutional layers L1 and L2 and two fully connected layers L3 and L4, but the number of layers is not limited to two. A pooling layer may be provided after the convolutional layers L1 and L2.

The composite drawing 50D is input to the first convolutional layer L1. Each of the convolutional layers L1 and L2 performs a convolution operation using its own filters. The filters of the convolutional layers L1 and L2 are defined by two-dimensional arrays of weighting coefficients. The output layer L5 outputs the object recognition result, for example, a vector indicating the probabilities that the object is a car, a bus, a pedestrian, a utility pole, a curb, a guardrail, and so on.
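The operation performed by one filter of a convolutional layer can be sketched in plain NumPy. This is a simplified single-channel illustration without padding, bias, or activation, not the network of FIG. 10; as in most CNN libraries, the filter is applied without flipping (cross-correlation). The name `conv2d_valid` is hypothetical.

```python
import numpy as np

def conv2d_valid(image, kernel):
    """Slide one 2D filter over a 2D image (no padding, stride 1)."""
    kh, kw = kernel.shape
    out_h = image.shape[0] - kh + 1
    out_w = image.shape[1] - kw + 1
    out = np.zeros((out_h, out_w))
    for i in range(out_h):
        for j in range(out_w):
            # Weighted sum of the image patch under the filter.
            out[i, j] = np.sum(image[i:i + kh, j:j + kw] * kernel)
    return out

image = np.arange(16, dtype=float).reshape(4, 4)  # made-up 4 x 4 input
kernel = np.ones((2, 2))                          # made-up 2 x 2 weighting coefficients
out = conv2d_valid(image, kernel)
```

During training it is the entries of `kernel` that are adjusted, which matches the description of the filters as two-dimensional arrays of weighting coefficients.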

The object recognition unit 22d outputs the recognition result (S108). For example, the object recognition unit 22d identifies the object with the highest of the probabilities output from the output layer L5 as the object shown in the composite drawing 50D, and outputs object information indicating the type of the identified object to the vehicle driving device 2 via the input/output interface unit 21. If the object recognition device 200 has a display unit, the object type obtained as the recognition result may be displayed on its screen. If the object recognition device 200 has a speaker, the object type may be output as audio from the speaker.
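Selecting the most probable object type in step S108 amounts to an argmax over the output vector. The sketch below assumes the output layer produces raw scores that are normalized with a softmax, which the disclosure does not state; the scores are made up and the names `LABELS`, `softmax`, and `most_likely_label` are hypothetical.

```python
import numpy as np

LABELS = ["car", "bus", "pedestrian", "utility pole", "curb", "guardrail"]

def softmax(scores):
    """Turn raw scores into probabilities that sum to 1."""
    shifted = scores - np.max(scores)  # subtract the maximum for numerical stability
    exp = np.exp(shifted)
    return exp / exp.sum()

def most_likely_label(probabilities, labels=LABELS):
    """Return the label whose probability is highest."""
    return labels[int(np.argmax(probabilities))]

scores = np.array([0.2, 0.1, 2.5, 0.3, 0.0, -1.0])  # made-up output-layer scores
probs = softmax(scores)
label = most_likely_label(probs)
```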

2.2 Learning process
FIG. 11 shows the learning process of the convolutional neural network 60. For example, before executing the object recognition process, the arithmetic processing unit 22 performs the learning process shown in FIG. 11 to train the convolutional neural network 60.

The arithmetic processing unit 22 acquires data indicating three-view drawings for training and the correct labels corresponding to them (S201). For example, the arithmetic processing unit 22 acquires, in advance, training data indicating three-view drawings and their correct labels via the input/output interface unit 21 and stores the data in the storage unit 23. In step S201, the arithmetic processing unit 22 reads the training data from the storage unit 23. The correct labels are, for example, car, bus, pedestrian, utility pole, curb, and guardrail.

The arithmetic processing unit 22 combines the three views to generate a composite drawing (S202). The arithmetic processing unit 22 then inputs the composite drawing into the convolutional neural network 60 to recognize the type of the object (S203).

The arithmetic processing unit 22 adjusts the parameters of the convolutional neural network 60 based on the recognition result and the correct label (S204). For example, following the error backpropagation method, the arithmetic processing unit 22 adjusts the weighting coefficients of the filters of the convolutional layers L1 and L2 and the weighting coefficients between the neurons of the fully connected layers L3 and L4.

The arithmetic processing unit 22 determines whether a predetermined number of training iterations has been completed (S205). Steps S201 to S204 are repeated until the predetermined number of iterations is reached, at which point the learning process shown in FIG. 11 ends. The arithmetic processing unit 22 stores the program and parameters corresponding to the trained convolutional neural network 60 in the storage unit 23.
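The loop structure of steps S203 to S205 can be illustrated with a much smaller model. The sketch below deliberately swaps the convolutional neural network 60 for a single-layer softmax classifier trained by plain gradient descent, so it shows only the forward pass, parameter update, and fixed iteration count, not the network of the disclosure. The data are made up and all names are hypothetical.

```python
import numpy as np

def train(images, labels, n_classes, n_iterations=200, lr=0.5, seed=0):
    """Fit a softmax classifier on flattened composite drawings with gradient descent."""
    rng = np.random.default_rng(seed)
    x = images.reshape(len(images), -1).astype(float) / 255.0
    w = rng.normal(scale=0.01, size=(x.shape[1], n_classes))
    onehot = np.eye(n_classes)[labels]
    losses = []
    for _ in range(n_iterations):                 # S205: fixed number of iterations
        scores = x @ w                            # S203: forward pass
        scores -= scores.max(axis=1, keepdims=True)
        probs = np.exp(scores)
        probs /= probs.sum(axis=1, keepdims=True)
        losses.append(-np.mean(np.log(probs[np.arange(len(x)), labels] + 1e-12)))
        grad = x.T @ (probs - onehot) / len(x)    # S204: gradient of the cross-entropy loss
        w -= lr * grad
    return w, losses

# Two made-up 4x4x3 "composite drawings": one mostly red, one mostly blue.
images = np.zeros((2, 4, 4, 3), dtype=np.uint8)
images[0, :, :, 0] = 200
images[1, :, :, 2] = 200
labels = np.array([0, 1])
weights, losses = train(images, labels, n_classes=2)
```

For a real CNN the gradient in the S204 step would be obtained by backpropagation through all layers, typically with a deep learning framework rather than hand-written NumPy.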

In the example of FIG. 11, the three-view drawing is acquired in step S201 and the composite drawing is generated in step S202, but the composite drawing may instead be acquired in step S201. In that case, step S202 is omitted.

2.3 Comparison of the number of elements between the three-view drawing and VoxNet
FIG. 12 compares the number of elements used for object recognition, showing the number of elements (voxels) in conventional VoxNet and the number of elements (pixels) in the three-view drawing of the present disclosure. With 32 elements per side, VoxNet has 32 × 32 × 32 = 32768 elements, while the three-view drawing has 32 × 32 × 3 = 3072, about 1/10 of the VoxNet count. With 64 elements per side, VoxNet has 64 × 64 × 64 = 262144 elements, while the three-view drawing has 64 × 64 × 3 = 12288, about 1/20. With 128 elements per side, VoxNet has 128 × 128 × 128 = 2097152 elements, while the three-view drawing has 128 × 128 × 3 = 49152, about 1/40. Because the three-view drawing has far fewer elements, image processing by a two-dimensional CNN on the three-view images has a lower processing load than image processing by a three-dimensional CNN, and the processing speed of object recognition improves accordingly.
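The counts in FIG. 12 follow directly from the side length n: a voxel grid has n × n × n elements and a three-view drawing has n × n × 3, so the ratio is n/3. A short check of the arithmetic (the function name `element_counts` is hypothetical):

```python
def element_counts(n):
    """Return (voxel_count, three_view_pixel_count) for a side length of n elements."""
    return n ** 3, n * n * 3

for n in (32, 64, 128):
    voxels, pixels = element_counts(n)
    print(f"n={n}: voxels={voxels}, three-view pixels={pixels}, ratio=1/{voxels / pixels:.1f}")
```

The exact ratios are 1/10.7, 1/21.3, and 1/42.7, which the text rounds to about 1/10, 1/20, and 1/40.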

When the distance sensor 100 measures distance in the direction from the reference point 5 toward the projection surface R10, it can measure only the front side of the object (the side of the projection surface R10 where the reference point 5 is located) and cannot measure the back side of the object (the side of the projection surface R10 opposite the reference point 5). That is, the occupied area 46 does not correspond to the whole object. Therefore, converting the occupancy grid 45 into a three-view drawing does not significantly reduce the amount of distance information about the object, and the object can be recognized accurately.

3. Summary
The object recognition system 1 according to this embodiment includes the distance sensor 100 and the object recognition device 200. The distance sensor 100 measures the distance to an object and generates sensing data including three-dimensional positions along the outer shape of at least a part of the object. The object recognition device 200 according to this embodiment includes the input/output interface unit 21 and the arithmetic processing unit 22. The input/output interface unit 21 receives the sensing data. The arithmetic processing unit 22 recognizes the object based on the sensing data. Specifically, based on the sensing data, the arithmetic processing unit 22 generates a plurality of two-dimensional drawings of the solid represented by the three-dimensional positions as viewed from a plurality of directions, and recognizes the object based on the plurality of two-dimensional drawings.

Because two-dimensional drawings are used for object recognition, the amount of data used for object recognition is reduced. As a result, the processing load of image processing decreases and the processing speed of object recognition improves.

The arithmetic processing unit 22 recognizes the object by executing image processing with the convolutional neural network 60 based on the generated two-dimensional drawings. A two-dimensional convolutional neural network can be smaller in scale than a three-dimensional convolutional neural network. For example, the number of layers and the number of neurons can be reduced.

In this embodiment, the sensing data is a distance image indicating the distance from the reference point 5 to the outer shape of the object. The arithmetic processing unit 22 generates a set of voxels representing the solid based on the distance image. Specifically, the arithmetic processing unit 22 defines the three-dimensional space 40, which can be divided into a plurality of voxels, associates the three-dimensional positions along the outer shape of the object with the three-dimensional space, and represents the solid by those voxels, among the plurality of voxels, that the object occupies. This solid corresponds to the voxels of the occupied area 46.
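How a distance image yields three-dimensional positions depends on the sensor geometry, which this passage does not specify. As one possible illustration, the sketch below assumes that each column of the distance image corresponds to a known azimuth angle and each row to a known elevation angle, and converts the measured ranges to x, y, z positions relative to the reference point. The function name and the angle convention are assumptions.

```python
import numpy as np

def distance_image_to_points(distance, azimuth, elevation):
    """Convert a distance image to an (N, 3) array of x, y, z positions.

    distance  : (H, W) ranges from the reference point; non-finite or <= 0 means no return
    azimuth   : (W,) horizontal angle of each column, in radians
    elevation : (H,) vertical angle of each row, in radians
    """
    distance = np.asarray(distance, dtype=float)
    az, el = np.meshgrid(azimuth, elevation)  # both (H, W)
    valid = np.isfinite(distance) & (distance > 0)
    r = distance[valid]
    x = r * np.cos(el[valid]) * np.cos(az[valid])
    y = r * np.cos(el[valid]) * np.sin(az[valid])
    z = r * np.sin(el[valid])
    return np.column_stack([x, y, z])

# Made-up 2 x 2 distance image with two valid returns.
distance = np.array([[2.0, np.inf], [0.0, 1.0]])
points = distance_image_to_points(distance, azimuth=[0.0, np.pi / 2], elevation=[0.0, 0.0])
```

The resulting positions could then be associated with voxels of the three-dimensional space to form the set of occupied voxels.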

In this embodiment, the sensing data includes three-dimensional positions along the outer shape of a part of the object, and the plurality of two-dimensional drawings form a three-view drawing of the solid viewed from three orthogonal axis directions. Each of the three views has pixel values corresponding to the distance from its reference position, namely the plane 44a, 44b, or 44c, to the solid. The arithmetic processing unit 22 combines the three views to generate a composite drawing and recognizes the object based on the composite drawing. Because the three-view drawing carries roughly the same amount of distance information as the three-dimensional information, object recognition using the three-view drawing achieves about the same accuracy as object recognition using three-dimensional voxels.

(Embodiment 2)
In Embodiment 1, the object recognition device 200 performs object recognition using a three-view drawing generated from distance measurement within the projection surface R10 by a single distance sensor 100. In this embodiment, object recognition is performed using a six-view drawing. The six-view drawing is an example of the plurality of pieces of two-dimensional information.

FIG. 13 is a flowchart showing an example of the object recognition process performed by the object recognition device 200 according to Embodiment 2. Steps S302 to S304, S307, and S308 in FIG. 13 are the same as steps S102 to S104, S107, and S108 in FIG. 4 of Embodiment 1.

In this embodiment, the area detection unit 22a acquires a plurality of pieces of sensing data via the input/output interface unit 21 (S301). Each piece of sensing data includes three-dimensional positions along the outer shape of a part of the object, and together the pieces of sensing data include three-dimensional positions along the entire outer shape of the object. The pieces of sensing data are, for example, distance images generated from distance measurement from a plurality of reference points 5. The reference points 5 are placed, for example, at positions facing each other. In one example, the area detection unit 22a acquires sensing data from each of a plurality of distance sensors arranged at different positions. In another example, the area detection unit 22a may acquire a plurality of pieces of sensing data generated by a single distance sensor measuring distance from different positions.

The area detection unit 22a detects, in each distance image, the area where the object exists as a recognition target area (S302). The occupancy grid generation unit 22b defines the three-dimensional space 40 (S303). The occupancy grid generation unit 22b converts the local coordinates of each recognition target area into world coordinates and generates occupancy grid data representing a single occupancy grid 45 (S304). FIG. 14 is a diagram for explaining the occupancy grid 45 in Embodiment 2. As in Embodiment 1, the occupancy grid 45 is a set of voxels 41 in which the occupied area 46 and the unoccupied area 47 are distinguished within the three-dimensional space 40.
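The local-to-world conversion in step S304 can be sketched as a rigid transform per sensor followed by voxelization into one shared grid. The sketch assumes each sensor's pose is known as a rotation matrix and a translation vector, which the disclosure does not detail; the sensor layout and all names are hypothetical.

```python
import numpy as np

def to_world(points_local, rotation, translation):
    """Apply a rigid transform: p_world = R @ p_local + t for every row."""
    pts = np.asarray(points_local, dtype=float)
    return pts @ np.asarray(rotation, dtype=float).T + np.asarray(translation, dtype=float)

def merged_occupancy_grid(clouds, poses, origin, edge_length, n):
    """Voxelize several local point sets into one shared (n, n, n) boolean grid."""
    grid = np.zeros((n, n, n), dtype=bool)
    for points_local, (rotation, translation) in zip(clouds, poses):
        world = to_world(points_local, rotation, translation)
        idx = np.floor((world - np.asarray(origin, dtype=float)) * n / edge_length).astype(int)
        idx = idx[np.all((idx >= 0) & (idx < n), axis=1)]  # keep points inside the space
        grid[idx[:, 0], idx[:, 1], idx[:, 2]] = True
    return grid

# Two hypothetical sensors facing each other along x, 1 m apart.
identity = np.eye(3)
flip_xy = np.diag([-1.0, -1.0, 1.0])      # 180-degree rotation about z
clouds = [np.array([[0.3, 0.5, 0.5]]), np.array([[0.3, -0.5, 0.5]])]
poses = [(identity, np.zeros(3)), (flip_xy, np.array([1.0, 0.0, 0.0]))]
grid = merged_occupancy_grid(clouds, poses, origin=(0, 0, 0), edge_length=1.0, n=4)
```

Each sensor sees only the side of the object facing it, and the merged grid contains both sides.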

The 2D drawing generation unit 22c generates a six-view drawing of the solid formed by the voxels 41 in the occupied area 46 (S305). FIG. 15 shows the six views of the voxels 41 in the occupied area 46 as seen from the directions of arrows a to f in FIG. 14. Specifically, (a) of FIG. 15 is a front view seen from the direction of arrow a in FIG. 14. (b) of FIG. 15 is a rear view seen from the direction of arrow b in FIG. 14. (c) of FIG. 15 is a left side view seen from the direction of arrow c in FIG. 14. (d) of FIG. 15 is a right side view seen from the direction of arrow d in FIG. 14. (e) of FIG. 15 is a plan view (top view) seen from the direction of arrow e in FIG. 14. (f) of FIG. 15 is a bottom view seen from the direction of arrow f in FIG. 14. The six views shown in (a) to (f) of FIG. 15 have pixel values corresponding to the distance from each face of the cube representing the three-dimensional space 40 to each voxel in the occupied area 46. In the examples of (a) to (f) of FIG. 15, the shorter the distance, the lighter the color (for example, the pixel value approaches 255), and the longer the distance, the darker the color (for example, the pixel value approaches 0). However, the mapping may be reversed, so that shorter distances are darker and longer distances are lighter. In (a) to (f) of FIG. 15, pixels where no object exists are shown in white, but they may be shown in black.
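The six views of step S305 can be sketched by projecting the voxel grid from both ends of each axis. As assumptions not fixed by the disclosure, the grid axes are ordered (x, y, z), the assignment of views to axes is arbitrary, and the views from opposite sides are not mirrored, which a true rear, right, or top view would be. The names `depth_view` and `six_views` are hypothetical.

```python
import numpy as np

def depth_view(grid, axis, from_far_side=False, background=255):
    """Depth-coded projection along one axis, seen from index 0 or from the far face."""
    n = grid.shape[axis]
    g = np.flip(grid, axis=axis) if from_far_side else grid
    occupied = g.any(axis=axis)
    first = np.argmax(g, axis=axis)  # first occupied voxel along the line of sight
    values = np.rint(255.0 * (n - 1 - first) / n).astype(np.uint8)
    return np.where(occupied, values, np.uint8(background)).astype(np.uint8)

def six_views(grid):
    """Return a dict of six depth-coded views for a grid with axes (x, y, z)."""
    return {
        "front": depth_view(grid, axis=1),
        "back": depth_view(grid, axis=1, from_far_side=True),
        "left": depth_view(grid, axis=0),
        "right": depth_view(grid, axis=0, from_far_side=True),
        "top": depth_view(grid, axis=2, from_far_side=True),
        "bottom": depth_view(grid, axis=2),
    }

grid = np.zeros((4, 4, 4), dtype=bool)
grid[0, 1, 3] = True  # one voxel touching the x = 0 face and the top face
views = six_views(grid)
```

The same voxel appears light in the views whose reference face it touches and dark in the opposite views, which is the information a six-view drawing adds over a three-view drawing.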

The 2D drawing generation unit 22c combines the six views to generate one composite drawing (S306). The object recognition unit 22d inputs the composite drawing into the trained convolutional neural network 60 and recognizes the type of the object (S307). The object recognition unit 22d outputs the recognition result (S308).

As described above, in this embodiment the plurality of pieces of sensing data include three-dimensional positions along the entire outer shape of the object. The object recognition device 200 generates a six-view drawing of the solid formed by the voxels in the occupied area 46, viewed from the positive and negative directions of three orthogonal axes, and recognizes the object using the six-view drawing. This enables object recognition based on the entire outer shape of the object, so the object can be recognized accurately.

(Other embodiments)
The above embodiments describe an example in which the sensing data is a distance image and the object recognition device 200 generates the occupancy grid 45 based on the distance image. However, the sensing data is not limited to a distance image. The sensing data only needs to include three-dimensional positions along the outer shape of at least a part of the object. For example, the sensing data may be three-dimensional point cloud information indicating the distance, measured by the distance sensor 100, to the outer shape of the object. Such three-dimensional point cloud information includes, for example, x, y, and z coordinates.

In the above embodiments, the 2D drawing generation unit 22c generates a three-view drawing or a six-view drawing of the solid formed by the voxels 41 in the occupied area 46, but any number of two-dimensional drawings of two or more may be generated.

The above embodiments describe an example in which the arithmetic processing unit 22 of the object recognition device 200 performs the learning process shown in FIG. 11, but the learning process of the convolutional neural network 60 may be performed by a device other than the object recognition device 200. For example, the convolutional neural network 60 may be built using a computer cluster or cloud computing.

The above embodiments describe an example in which the distance sensor 100 and the object recognition device 200 are mounted on the vehicle 3, but they may instead be mounted on a self-propelled robot, an AGV (Automated Guided Vehicle), or the like. The object recognition device 200 also does not have to be mounted on the vehicle 3 or the like. The object recognition device 200 according to the present disclosure may be any of various information processing devices. For example, the object recognition device 200 may be one or more server devices such as an ASP server. For example, the object recognition device 200 may acquire sensing data from the distance sensor 100 via a communication network and execute image processing with the convolutional neural network 60. The object recognition device 200 may also transmit information indicating the object recognition result to the vehicle driving device 2 via the communication network.

In the above embodiments, the distance sensor 100 and the object recognition device 200 are separate devices, but they may be a single device. For example, the object recognition device 200 may be provided inside the distance sensor 100 so that the distance sensor 100 has the same functions as the object recognition device 200.

The above embodiments describe an example in which the object recognition unit 22d recognizes the type of the object, but recognition is not limited to identifying the type of the object. Recognition includes extracting feature quantities of a target. For example, when the target is a car, the recognition performed by the object recognition unit 22d includes extracting "rectangular parallelepiped" and "wheel" as feature quantities of the car.

(Additional notes)
Various embodiments of the present disclosure have been described above, but the present disclosure is not limited to the above content, and various modifications can be made within a scope in which the technical idea remains substantially the same. Various aspects of the present disclosure are noted below.

An object recognition device according to a first aspect of the present disclosure includes an input unit (21) that inputs three-dimensional information including three-dimensional positions along the outer shape of at least a part of an object, and an arithmetic processing unit (22) that recognizes the object based on the three-dimensional information. Based on the three-dimensional information, the arithmetic processing unit (22) generates a plurality of pieces of two-dimensional information showing two-dimensional drawings of a solid represented by the three-dimensional positions as viewed from a plurality of directions, and recognizes the object based on the plurality of pieces of two-dimensional information.

In a second aspect, in the object recognition device of the first aspect, the arithmetic processing unit recognizes the object by executing image processing with a convolutional neural network based on the generated plurality of pieces of two-dimensional information.

In a third aspect, in the object recognition device of the first or second aspect, the three-dimensional information is a distance image indicating the distance from a reference point to the outer shape of at least a part of the object.

In a fourth aspect, in the object recognition device of any one of the first to third aspects, the arithmetic processing unit generates a set of voxels representing the solid based on the three-dimensional information.

In a fifth aspect, in the object recognition device of the fourth aspect, the arithmetic processing unit defines a three-dimensional space that can be divided into a plurality of voxels, associates the three-dimensional positions with the three-dimensional space, and represents the solid by those voxels, among the plurality of voxels, that the object occupies.

In a sixth aspect, in the object recognition device of any one of the first to fifth aspects, the plurality of pieces of two-dimensional information have pixel values corresponding to the distance from their respective reference positions to the solid.

In a seventh aspect, in the object recognition device of any one of the first to sixth aspects, the three-dimensional information includes three-dimensional positions along the outer shape of a part of the object, and the plurality of pieces of two-dimensional information form a three-view drawing of the solid viewed from each of three orthogonal axis directions.

In an eighth aspect, in the object recognition device of any one of the first to sixth aspects, the three-dimensional information includes three-dimensional positions along the entire outer shape of the object, and the plurality of pieces of two-dimensional information form a six-view drawing of the solid viewed from each of the positive and negative directions of three orthogonal axes.

In a ninth aspect, in the object recognition device of any one of the first to eighth aspects, the arithmetic processing unit combines the plurality of pieces of two-dimensional information to generate a composite drawing and recognizes the object based on the composite drawing.

An object recognition system according to the present disclosure includes a sensor that measures the distance to an object and generates the three-dimensional information, and the object recognition device according to any one of the first to ninth aspects.

A program according to the present disclosure causes a computer to execute: a step of inputting three-dimensional information including three-dimensional positions along the outer shape of at least a part of an object; a step of generating, based on the three-dimensional information, a plurality of pieces of two-dimensional information showing two-dimensional drawings of a solid represented by the three-dimensional positions as viewed from a plurality of directions; and a step of recognizing the object based on the plurality of pieces of two-dimensional information.

The object recognition device and the object recognition system of the present disclosure are applicable to, for example, self-driving cars, self-propelled robots, and AGVs.

1 Object recognition system
2 Vehicle driving device
3 Vehicle
11 Light projecting unit
12 Light receiving unit
13 Scanning unit
14 Sensor control unit
15, 21 Input/output interface unit
22 Arithmetic processing unit
22a Area detection unit
22b Occupancy grid generation unit
22c 2D drawing generation unit
22d Object recognition unit
23 Storage unit
100 Distance sensor
200 Object recognition device

Claims

An object recognition device comprising:
an input unit that inputs three-dimensional information including three-dimensional positions along the outer shape of at least a part of an object; and
an arithmetic processing unit that recognizes the object based on the three-dimensional information,
wherein the arithmetic processing unit
generates, based on the three-dimensional information, a plurality of pieces of two-dimensional information showing two-dimensional drawings of a solid represented by the three-dimensional positions as viewed from a plurality of directions, and
recognizes the object based on the plurality of pieces of two-dimensional information.

The object recognition device according to claim 1, wherein the arithmetic processing unit recognizes the object by executing image processing with a convolutional neural network based on the generated plurality of pieces of two-dimensional information.

The object recognition device according to claim 1 or 2, wherein the three-dimensional information is a distance image showing the distance from a reference point to the outer shape of at least a part of the object.

The object recognition device according to any one of claims 1 to 3, wherein the arithmetic processing unit generates, based on the three-dimensional information, an aggregate of voxels representing the solid.

The object recognition device according to claim 4, wherein the arithmetic processing unit defines a three-dimensional space that can be divided into a plurality of voxels, associates the three-dimensional positions with the three-dimensional space, and represents the solid by the voxels, among the plurality of voxels, that are occupied by the object.

The object recognition device according to any one of claims 1 to 5, wherein each of the plurality of pieces of two-dimensional information has pixel values corresponding to the distance from a respective reference position to the solid.

The object recognition device according to any one of claims 1 to 6, wherein
the three-dimensional information includes three-dimensional positions along the outer shape of a part of the object, and
the plurality of pieces of two-dimensional information is a three-view drawing of the solid as viewed along each of three orthogonal axes.

The object recognition device according to any one of claims 1 to 6, wherein
the three-dimensional information includes three-dimensional positions along the entire outer shape of the object, and
the plurality of pieces of two-dimensional information is a six-view drawing of the solid as viewed from each of the positive and negative directions of three orthogonal axes.

The object recognition device according to any one of claims 1 to 8, wherein the arithmetic processing unit combines the plurality of pieces of two-dimensional information to generate a composite diagram, and recognizes the object based on the composite diagram.

An object recognition system comprising:
a sensor that measures the distance to an object and generates the three-dimensional information; and
the object recognition device according to any one of claims 1 to 9.

A program that causes a computer to execute:
a step of inputting three-dimensional information including three-dimensional positions along the outer shape of at least a part of an object;
a step of generating, based on the three-dimensional information, a plurality of pieces of two-dimensional information representing the solid represented by the three-dimensional positions as viewed from a plurality of directions; and
a step of recognizing the object based on the plurality of pieces of two-dimensional information.