JP2018173707A - Person estimation system and estimation program

Info

Publication number: JP2018173707A
Application number: JP2017069865A
Authority: JP
Inventors: Drazen Brscic; Takayuki Kanda
Original assignee: ATR Advanced Telecommunications Research Institute International
Current assignee: ATR Advanced Telecommunications Research Institute International
Priority date: 2017-03-31
Filing date: 2017-03-31
Publication date: 2018-11-08
Anticipated expiration: 2037-03-31
Also published as: JP6919882B2

Abstract

PROBLEM TO BE SOLVED: To provide a person estimation system that can suppress erroneous detection as much as possible.

SOLUTION: After the current location of a mobile robot is identified, an estimation program extracts three-dimensional distance image data of moving objects by comparing the three-dimensional distance image data from a three-dimensional laser rangefinder mounted on the robot with three-dimensional environment map data (S1); generates a three-dimensional point cloud of each moving object by clustering that data (S3); generates two-dimensional distance image data of each moving object by projecting its three-dimensional point cloud (S5); and applies convolution to the two-dimensional distance image data in a neural network (S7).

SELECTED DRAWING: Figure 8

Description

The present invention relates to a person estimation system and an estimation program, and in particular to a person estimation system and an estimation program that estimate, for example, the position of a person using three-dimensional scan data acquired by a three-dimensional laser rangefinder mounted on a mobile robot.

For robots and other mobile machines to operate in environments shared with people, it is important for them to know their own position as well as the positions and characteristics of the people around them. Using an ordinary camera as a robot sensor can raise privacy concerns, whereas a three-dimensional range sensor (three-dimensional laser rangefinder), which measures only distance, raises no such concern and is therefore well suited to coexistence with people.

Patent Document 1 discloses a technique for detecting a person with a three-dimensional laser rangefinder and estimating that person's position.

Patent Document 1: Japanese Patent No. 5953484 [G01S 17/88]

The background art of Patent Document 1 assumes that every moving object is a person, so the erroneous detections that follow from this assumption cannot be avoided.

Therefore, a main object of the present invention is to provide a novel person estimation system and estimation program.

Another object of the present invention is to provide a person estimation system and an estimation program that can suppress erroneous detection as much as possible.

To solve the above problems, the present invention adopts the following configuration. The reference numerals and supplementary explanations in parentheses indicate correspondence with the embodiments described later, to aid understanding of the present invention, and do not limit the present invention in any way.

A first invention is a person estimation system comprising: a moving robot; a three-dimensional laser rangefinder mounted on the robot; an extraction unit that compares three-dimensional distance image data from the three-dimensional laser rangefinder with three-dimensional environment map data to extract three-dimensional distance image data of a moving object; a clustering unit that clusters the three-dimensional distance image data of the moving object to generate a three-dimensional point cloud of the moving object; a projection unit that generates two-dimensional distance image data of the moving object by projecting the three-dimensional point cloud; and a deep neural network that receives the two-dimensional distance image data as input, wherein the deep neural network discriminates between persons and objects.

In the first invention, the person estimation system includes a moving robot (10: reference numerals exemplify the corresponding parts in the embodiment; the same applies hereinafter), and a three-dimensional laser rangefinder (26) is mounted on this robot. The extraction unit (60, 94, S1) compares the three-dimensional distance image data from the three-dimensional laser rangefinder with the three-dimensional environment map data to extract the three-dimensional distance image data of a moving object. The clustering unit (60, 94, S3) clusters the extracted three-dimensional distance image data of the moving object to generate a three-dimensional point cloud of the moving object. The projection unit (60, 94, S5) generates two-dimensional distance image data of the moving object by projecting the three-dimensional point cloud, and this two-dimensional distance image data is input to the deep neural network (S7). The deep neural network discriminates between persons and objects on the basis of the two-dimensional distance image data. The deep neural network can also identify characteristics of a person (for example, gender and orientation).
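
The patent does not name the algorithm used by the clustering unit. As a minimal sketch, assume plain Euclidean (single-linkage) clustering: points connected by chains of neighbours closer than a threshold form one object. The function name and the parameters `max_gap` and `min_points` are hypothetical, not taken from the patent.

```python
import math
from typing import List, Tuple

Point = Tuple[float, float, float]


def euclidean_clusters(points: List[Point], max_gap: float = 0.3,
                       min_points: int = 3) -> List[List[Point]]:
    """Group points linked by neighbour chains closer than max_gap (metres).

    Quadratic breadth-first search: fine for a sketch, too slow for full scans.
    """
    unvisited = set(range(len(points)))
    clusters = []
    while unvisited:
        seed = unvisited.pop()
        queue = [seed]
        members = [seed]
        while queue:
            i = queue.pop()
            # Collect every still-unvisited point within max_gap of point i.
            near = [j for j in unvisited
                    if math.dist(points[i], points[j]) <= max_gap]
            for j in near:
                unvisited.remove(j)
                queue.append(j)
                members.append(j)
        # Very small groups are treated as noise and dropped.
        if len(members) >= min_points:
            clusters.append([points[k] for k in sorted(members)])
    return clusters
```

On real scans a spatial index (for example a k-d tree) would replace the quadratic neighbour search; the grouping rule stays the same.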

According to the first invention, extracting the three-dimensional distance image only of objects that are not in the three-dimensional environment map minimizes the risk of misrecognizing part of the environment (a fixed object) as a moving object. Moreover, techniques such as deep learning make it possible to implement a recognition program that determines, from the shape of a three-dimensional moving object, whether it is a person or an object, so that objects can be ignored and only persons detected.
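
The patent states only that the range data are compared with a three-dimensional environment map. One common way to make that comparison, assumed here rather than taken from the patent, is to store the map as a set of occupied voxels and keep only scan points whose voxel (plus a small margin) is not occupied in the map. The sketch assumes the scan has already been transformed into the map frame using the robot's estimated pose (the abstract says the current location is identified first); the voxel size, `margin`, and all function names are hypothetical.

```python
import math
from typing import Iterable, List, Set, Tuple

Point = Tuple[float, float, float]
Voxel = Tuple[int, int, int]


def voxel_of(p: Point, size: float) -> Voxel:
    # Integer index of the cube (edge length `size`) that contains p.
    return (math.floor(p[0] / size), math.floor(p[1] / size),
            math.floor(p[2] / size))


def build_map(static_points: Iterable[Point], size: float) -> Set[Voxel]:
    # Hypothetical map format: the set of voxels occupied by fixed structures.
    return {voxel_of(p, size) for p in static_points}


def extract_moving(scan: Iterable[Point], env_map: Set[Voxel], size: float,
                   margin: int = 1) -> List[Point]:
    """Keep scan points with no occupied map voxel within `margin` voxels.

    The margin tolerates small localization and registration errors.
    """
    moving = []
    r = range(-margin, margin + 1)
    for p in scan:
        vx, vy, vz = voxel_of(p, size)
        near_static = any((vx + dx, vy + dy, vz + dz) in env_map
                          for dx in r for dy in r for dz in r)
        if not near_static:
            moving.append(p)
    return moving
```

A scan point lying on a mapped wall is discarded, while a point in free space survives and is passed on to clustering.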

A second invention is the person estimation system according to the first invention, wherein the deep neural network is a convolutional neural network.

According to the second invention, because a convolutional neural network is used as the deep neural network (DNN), person estimation can be performed easily.
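
The patent does not disclose the network architecture or its weights. The basic operation of one convolutional layer can nevertheless be illustrated: a small kernel slides over the two-dimensional distance image, and a non-linearity (here ReLU) is applied to each output. Strictly this is a cross-correlation, which is what CNN libraries compute under the name convolution. The kernel used below is a hypothetical edge detector, not a trained filter.

```python
from typing import List

Image = List[List[float]]


def conv2d_valid(image: Image, kernel: Image, bias: float = 0.0) -> Image:
    """Slide `kernel` over `image` without padding and apply ReLU."""
    kh, kw = len(kernel), len(kernel[0])
    out_h = len(image) - kh + 1
    out_w = len(image[0]) - kw + 1
    out = []
    for r in range(out_h):
        row = []
        for c in range(out_w):
            s = bias
            for i in range(kh):
                for j in range(kw):
                    s += image[r + i][c + j] * kernel[i][j]
            row.append(max(0.0, s))  # ReLU non-linearity
        out.append(row)
    return out


# A depth step between columns 1 and 2 (e.g. a silhouette edge) lights up
# only where the horizontal-gradient kernel straddles the step.
depth = [[1, 1, 3, 3], [1, 1, 3, 3], [1, 1, 3, 3]]
edges = conv2d_valid(depth, [[-1, 1]])
```

A real CNN stacks many such learned filters with pooling and fully connected layers before the person/object decision.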

A third invention is the person estimation system according to the first or second invention, wherein the projection unit sets a projection plane perpendicular to a line from the three-dimensional laser rangefinder and determines the value of each pixel on the basis of the average distance, from the projection plane, of all points that fall into that pixel.

According to the third invention, the clustered three-dimensional point cloud can be projected efficiently as a two-dimensional distance image.
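
A minimal sketch of this projection, under several assumptions not stated in the patent: the cluster is given in the sensor frame (sensor at the origin, z up); the "line from the rangefinder" is taken as the horizontal line from the sensor to the cluster centroid, so the projection plane is vertical and image rows correspond to height; the plane passes through the cluster point nearest the sensor, so all distances are non-negative; and the image size, pixel pitch, and the fill value for empty pixels are hypothetical. Each pixel value is the mean distance from the plane of the points falling into it, as the third invention describes.

```python
import math
from typing import List, Sequence, Tuple

Point = Tuple[float, float, float]


def project_cluster(points: Sequence[Point], size: int = 8, pitch: float = 0.1,
                    empty: float = -1.0) -> List[List[float]]:
    """Project a cluster to a size x size image of mean distances to the plane."""
    n = len(points)
    cx = sum(p[0] for p in points) / n
    cy = sum(p[1] for p in points) / n
    cz = sum(p[2] for p in points) / n
    norm = math.hypot(cx, cy)
    if norm == 0:
        raise ValueError("cluster centroid lies directly above/below the sensor")
    dx, dy = cx / norm, cy / norm        # horizontal viewing direction d
    ux, uy = -dy, dx                     # in-plane horizontal axis u (v is +z)
    # Distance of each point along d; the plane sits at the nearest point.
    depths = [p[0] * dx + p[1] * dy for p in points]
    front = min(depths)
    centre_u = cx * ux + cy * uy
    sums = [[0.0] * size for _ in range(size)]
    counts = [[0] * size for _ in range(size)]
    for p, depth in zip(points, depths):
        col = math.floor((p[0] * ux + p[1] * uy - centre_u) / pitch) + size // 2
        row = math.floor((cz - p[2]) / pitch) + size // 2   # row 0 = highest
        if 0 <= row < size and 0 <= col < size:
            sums[row][col] += depth - front
            counts[row][col] += 1
    return [[sums[r][c] / counts[r][c] if counts[r][c] else empty
             for c in range(size)] for r in range(size)]
```

Centring the image on the cluster centroid makes the result independent of where the object stands, which suits a classifier that expects fixed-size inputs.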

A fourth invention is an estimation program executed by a computer of a person estimation system that includes a movable robot equipped with a three-dimensional laser rangefinder. The estimation program causes the computer to function as: an extraction unit that compares three-dimensional distance image data from the three-dimensional laser rangefinder with three-dimensional environment map data to extract three-dimensional distance image data of a moving object; a clustering unit that clusters the three-dimensional distance image data of the moving object to generate a three-dimensional point cloud of the moving object; a projection unit that generates two-dimensional distance image data of the moving object by projecting the three-dimensional point cloud; and a deep neural network that receives the two-dimensional distance image data as input, wherein the deep neural network discriminates between persons and objects.

A fifth invention is an estimation method executed by a computer of a person estimation system that includes a movable robot equipped with a three-dimensional laser rangefinder. The method comprises: an extraction step of comparing three-dimensional distance image data from the three-dimensional laser rangefinder with three-dimensional environment map data to extract three-dimensional distance image data of a moving object; a clustering step of clustering the three-dimensional distance image data of the moving object to generate a three-dimensional point cloud of the moving object; a projection step of generating two-dimensional distance image data of the moving object by projecting the three-dimensional point cloud; and a step of inputting the two-dimensional distance image data into a deep neural network.
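
The order of the steps can be summarised as a small driver that takes each step as a function. This is a sketch of the control flow only: the function name `estimate_persons`, the string label `"person"`, and the toy step implementations in the usage example are hypothetical and stand in for the real extraction, clustering, projection, and DNN stages.

```python
from typing import Callable, Iterable, List, Tuple

Point = Tuple[float, float, float]
Cluster = List[Point]
Image = List[List[float]]


def estimate_persons(scan: Iterable[Point],
                     extract: Callable[[Iterable[Point]], List[Point]],
                     cluster: Callable[[List[Point]], List[Cluster]],
                     project: Callable[[Cluster], Image],
                     classify: Callable[[Image], str]) -> List[Cluster]:
    """Run the steps in order and keep only clusters classified as persons."""
    moving = extract(scan)                 # extraction step (S1)
    persons = []
    for c in cluster(moving):              # clustering step (S3)
        image = project(c)                 # projection step (S5)
        if classify(image) == "person":    # DNN input and decision (S7)
            persons.append(c)
    return persons


# Toy stand-ins: "moving" means x > 1, each point is its own cluster,
# the image is just the point height, and tall things count as persons.
found = estimate_persons(
    [(0.0, 0.0, 0.0), (2.0, 0.0, 1.7), (3.0, 0.0, 0.5)],
    extract=lambda s: [p for p in s if p[0] > 1],
    cluster=lambda pts: [[p] for p in pts],
    project=lambda c: [[c[0][2]]],
    classify=lambda img: "person" if img[0][0] > 1.0 else "object",
)
```

Objects that survive map subtraction but are classified as non-persons (for example a pushed cart) are simply dropped, which is how the method avoids treating every moving object as a person.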

According to the present invention, part of the environment is prevented from being misidentified as a moving object, so erroneous detection can be suppressed as much as possible.

The above object, other objects, features, and advantages of the present invention will become more apparent from the following detailed description of the embodiments given with reference to the drawings.

FIG. 1 is an illustrative view showing an example of a robot used in a person estimation system according to an embodiment of the present invention.
FIG. 2 is a block diagram showing an example of the electrical configuration of the robot of the FIG. 1 embodiment.
FIG. 3 is an illustrative view showing an example of the movement space of the robot of the FIG. 1 embodiment.
FIG. 4 is a block diagram showing an example of a remote control device for the robot of the FIG. 1 embodiment.
FIG. 5 is an illustrative view showing an example of the memory map of the robot's memory in the FIG. 2 embodiment.
FIG. 6 is an illustrative view showing part of the movement space of the robot of the FIG. 1 embodiment.
FIG. 7 is an illustrative view showing an example of actual scan data from the omnidirectional laser rangefinder mounted on the robot.
FIG. 8 is a flowchart showing an example of the operation of the estimation system using the robot of the FIG. 1 embodiment.
FIG. 9 is an illustrative view showing an example of the three-dimensional point cloud of a moving object obtained in the FIG. 8 embodiment by clustering the three-dimensional distance image data of portions not in the map.
FIG. 10 is an illustrative view showing an example of a two-dimensional distance image obtained in the FIG. 8 embodiment by projecting the three-dimensional point cloud of the moving object shown in FIG. 9.
FIG. 11 is an illustrative view showing an example of a method for projecting the three-dimensional point cloud of the moving object shown in FIG. 9.

FIG. 1 shows an example of a robot 10 used in an estimation system according to an embodiment of the present invention. In this embodiment, the estimation system uses a three-dimensional range sensor (three-dimensional omnidirectional laser rangefinder) mounted on the robot 10 to estimate simultaneously the self-position of the robot 10 and the position, orientation, or attributes of people.

The robot 10 includes a carriage 12, on whose underside two wheels 14 and one driven wheel (caster) 16 are provided for moving the robot 10. The two wheels 14 are driven independently by wheel motors 82 (see FIG. 2), so the carriage 12, that is, the robot 10, can be moved in any direction: forward, backward, left, or right.
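
Two independently driven wheels plus a passive caster form what is usually called a differential-drive base. The sketch below shows its standard kinematics, converting a body velocity command into the two wheel speeds and back. The wheel radius and track width are hypothetical values, not taken from the patent. Note that such a base reaches sideways directions by turning rather than by translating sideways directly.

```python
from typing import Tuple

WHEEL_RADIUS = 0.1  # m, hypothetical
TRACK_WIDTH = 0.4   # m between the two drive wheels, hypothetical


def wheel_speeds(v: float, omega: float) -> Tuple[float, float]:
    """Body linear speed v (m/s) and yaw rate omega (rad/s) -> angular
    speeds (rad/s) of the left and right drive wheels."""
    left = (v - omega * TRACK_WIDTH / 2) / WHEEL_RADIUS
    right = (v + omega * TRACK_WIDTH / 2) / WHEEL_RADIUS
    return left, right


def body_velocity(left: float, right: float) -> Tuple[float, float]:
    """Inverse of wheel_speeds: wheel angular speeds -> (v, omega)."""
    v = WHEEL_RADIUS * (left + right) / 2
    omega = WHEEL_RADIUS * (right - left) / TRACK_WIDTH
    return v, omega
```

Equal wheel speeds drive the robot straight; opposite speeds spin it in place.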

A cylindrical sensor mounting panel 18 is provided on the carriage 12, and a number of distance sensors 20 are mounted on it. These distance sensors 20 measure the distance to objects around the robot 10 (people, obstacles, and so on) using, for example, infrared light or ultrasonic waves.

A body 22 stands upright on the sensor mounting panel 18. A further distance sensor 20 of the type described above is provided at the upper front center of the body 22 (the position corresponding to a person's chest) and mainly measures the distance to a person in front of the robot 10. The body 22 also has a support column 24 extending from approximately the center of the upper end of its side, and an omnidirectional laser rangefinder 26 is mounted on top of the support column 24. The omnidirectional laser rangefinder 26 is, for example, a three-dimensional laser rangefinder with a vertical detection angle (vertical field of view) of 40° (from +30° to −10°) relative to the horizontal. This three-dimensional omnidirectional laser rangefinder (hereinafter sometimes simply the "laser rangefinder") 26 makes, for example, one revolution every 0.1 seconds and measures distances up to about 100 m. In the experiments, a Velodyne HDL-32E imaging LiDAR unit (trade name) was used as the laser rangefinder 26.
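
Each return from a spinning three-dimensional rangefinder is a range measured at a known azimuth (the rotation angle) and elevation (the vertical angle of the laser that fired). The sketch converts one return into Cartesian coordinates in the sensor frame, using the figures in this paragraph (+30° to −10° vertical field of view, about 100 m range) as validity limits. The axis convention (x forward, y left, z up) and all names are assumptions. At one revolution per 0.1 s, the sensor delivers 10 full scans per second.

```python
import math
from typing import Optional, Tuple

MAX_RANGE_M = 100.0                        # "up to about 100 m"
ELEV_MIN_DEG, ELEV_MAX_DEG = -10.0, 30.0   # vertical field of view from the text
SCAN_PERIOD_S = 0.1                        # one revolution per 0.1 s


def to_cartesian(rng: float, azimuth_deg: float,
                 elevation_deg: float) -> Optional[Tuple[float, float, float]]:
    """Convert one range return to (x, y, z) in the sensor frame,
    or return None if it lies outside the sensor's stated limits."""
    if not 0.0 < rng <= MAX_RANGE_M:
        return None
    if not ELEV_MIN_DEG <= elevation_deg <= ELEV_MAX_DEG:
        return None
    az, el = math.radians(azimuth_deg), math.radians(elevation_deg)
    horiz = rng * math.cos(el)   # length of the ray's horizontal projection
    return (horiz * math.cos(az), horiz * math.sin(az), rng * math.sin(el))


scans_per_second = 1.0 / SCAN_PERIOD_S
```

Collecting the converted points of one revolution yields the three-dimensional distance image data that the later steps compare with the environment map.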

Although not shown, an omnidirectional camera may additionally be provided on the support column 24. The omnidirectional camera captures the surroundings of the robot 10 and is distinct from the eye cameras 50 described later. For this omnidirectional camera, a camera using a solid-state image sensor such as a CCD or CMOS sensor can be adopted.

Upper arms 30R and 30L are attached by shoulder joints 28R and 28L, respectively, at the upper ends of both sides of the body 22 (the positions corresponding to a person's shoulders). Although not shown, each of the shoulder joints 28R and 28L has three orthogonal degrees of freedom. That is, the shoulder joint 28R can control the angle of the upper arm 30R about each of three orthogonal axes. One axis of the shoulder joint 28R (the yaw axis) is parallel to the longitudinal direction (axis) of the upper arm 30R, and the other two axes (the pitch and roll axes) are orthogonal to it from different directions. Likewise, the shoulder joint 28L can control the angle of the upper arm 30L about each of three orthogonal axes; its yaw axis is parallel to the longitudinal direction (axis) of the upper arm 30L, and its pitch and roll axes are orthogonal to it from different directions.

Elbow joints 32R and 32L are provided at the distal ends of the upper arms 30R and 30L, respectively. Although not shown, each of the elbow joints 32R and 32L has one degree of freedom and can control the angle of the forearm 34R or 34L about that axis (the pitch axis).

Hands 36R and 36L, corresponding to human hands, are provided at the tips of the forearms 34R and 34L, respectively. Although not shown in detail, the hands 36R and 36L can open and close, so the robot 10 can grip or pinch an object with them. The shape of the hands 36R and 36L is not limited to that of the embodiment, however; they may be given a shape and function closely resembling those of a human hand.

Although not shown, contact sensors 38 (shown collectively in FIG. 2) are provided on the front of the carriage 12, on the parts corresponding to the shoulders including the shoulder joints 28R and 28L, and on the upper arms 30R and 30L, the forearms 34R and 34L, and the hands 36R and 36L. The contact sensor 38 on the front of the carriage 12 detects contact of a person or another obstacle with the carriage 12. Thus, if the robot 10 touches an obstacle while moving, it detects the contact and can immediately stop driving the wheels 14, bringing the robot 10 to an abrupt stop. The other contact sensors 38 detect whether the respective parts have been touched.

A neck joint 40 is provided at the upper center of the body 22 (the position corresponding to a person's neck), and a head 42 is mounted on it. Although not shown, the neck joint 40 has three degrees of freedom, and its angle can be controlled about each of the three axes. One axis (the yaw axis) points straight up (vertically upward) from the robot 10, and the other two axes (the pitch and roll axes) are orthogonal to it in different directions.

The head 42 has a speaker 44 at the position corresponding to a human mouth. The robot 10 uses the speaker 44 to communicate by voice with the people around it. Microphones 46R and 46L are provided at the positions corresponding to human ears; hereinafter, the right microphone 46R and the left microphone 46L may be referred to collectively as the microphones 46. The microphones 46 pick up ambient sound, in particular the voices of the people with whom the robot communicates.

In addition, a right eyeball portion 48R and a left eyeball portion 48L are provided at the positions corresponding to human eyes; they contain a right eye camera 50R and a left eye camera 50L, respectively. Hereinafter, the right eyeball portion 48R and the left eyeball portion 48L may be referred to collectively as the eyeball portions 48, and the right eye camera 50R and the left eye camera 50L collectively as the eye cameras 50.

The eye cameras 50 capture the face or other parts of a person approaching the robot 10, or other objects, and take in the corresponding video signal. In this embodiment, the robot 10 detects the gaze direction (vector) of each of the person's left and right eyes from the video signal of the eye cameras 50.

A camera of the same kind as the three-dimensional laser rangefinder 26 described above may also be used as the eye camera 50. For example, the eye camera 50 is fixed inside the eyeball portion 48, and the eyeball portion 48 is attached at a predetermined position in the head 42 via an eyeball support (not shown). Although not shown, the eyeball support has two degrees of freedom, and its angle can be controlled about each of these axes. For example, one of the two axes is the axis pointing toward the top of the head 42 (the yaw axis), and the other is the axis orthogonal both to the first axis and to the direction in which the front (face) of the head 42 faces (the pitch axis). Rotating the eyeball support about these two axes displaces the tip (front) of the eyeball portion 48, and hence of the eye camera 50, which moves the camera axis, that is, the gaze direction. The installation positions of the speaker 44, the microphones 46, and the eye cameras 50 are not limited to those described and may be chosen as appropriate.

Thus, the robot 10 of this embodiment has a total of 17 degrees of freedom: independent two-axis drive of the wheels 14 (2), three degrees of freedom for each shoulder joint 28 (6 for left and right), one for each elbow joint 32 (2 for left and right), three for the neck joint 40, and two for each eyeball support (4 for left and right).

FIG. 2 is a block diagram showing the electrical configuration of the robot 10. Referring to FIG. 2, the robot 10 includes one or more computers 60. The computer 60 is connected via a bus 62 to a memory 64, a motor control board 66, a sensor input/output board 68, and an audio input/output board 70.

Although not shown, the memory 64 includes a ROM, an HDD, and a RAM. As described later, various programs and data are stored in advance in the ROM and HDD, and the memory is also used as a buffer area or working area for the computer 60.

The motor control board 66 is built, for example, from a DSP and controls the motors for each axis, such as those of the arms, the neck joint 40, and the eyeball portions 48. Specifically, on receiving control data from the computer 60, the motor control board 66 controls the rotation angles of the two motors that set the angles of the two axes of the right eyeball portion 48R (shown collectively as the "right eyeball motor 72" in FIG. 2). Likewise, on receiving control data from the computer 60, it controls the rotation angles of the two motors that set the angles of the two axes of the left eyeball portion 48L (shown collectively as the "left eyeball motor 74" in FIG. 2).

On receiving control data from the computer 60, the motor control board 66 also controls the rotation angles of a total of four motors (shown collectively as the "right arm motor 76" in FIG. 2): three motors that set the angles of the three orthogonal axes of the shoulder joint 28R and one motor that sets the angle of the elbow joint 32R. Likewise, it controls the rotation angles of a total of four motors (shown collectively as the "left arm motor 78" in FIG. 2): three motors for the three orthogonal axes of the shoulder joint 28L and one motor for the elbow joint 32L.

Furthermore, on receiving control data from the computer 60, the motor control board 66 controls the rotation angles of the three motors that set the angles of the three orthogonal axes of the neck joint 40 (shown collectively as the "head motor 80" in FIG. 2), and the rotation angles of the two motors that drive the wheels 14 (shown collectively as the "wheel motor 82" in FIG. 2).
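
Tallying the motor groups driven by the motor control board 66 confirms that there is one motor per degree of freedom, matching the 17 degrees of freedom listed for the robot 10. The dictionary keys are illustrative labels, not identifiers from the patent; the counts are those stated in the text.

```python
# Motor groups of FIG. 2 with the motor counts given in the description.
MOTOR_GROUPS = {
    "right_eyeball_motor_72": 2,  # two axes of right eyeball portion 48R
    "left_eyeball_motor_74": 2,   # two axes of left eyeball portion 48L
    "right_arm_motor_76": 4,      # 3 shoulder axes (28R) + 1 elbow axis (32R)
    "left_arm_motor_78": 4,       # 3 shoulder axes (28L) + 1 elbow axis (32L)
    "head_motor_80": 3,           # three axes of neck joint 40
    "wheel_motor_82": 2,          # two independently driven wheels 14
}

total = sum(MOTOR_GROUPS.values())
print(total)  # 17
```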

Like the motor control board 66, the sensor input/output board 68 is built from a DSP; it takes in the signals from each sensor and passes them to the computer 60. For example, data on the reflection time from each distance sensor 20 is input to the computer 60 through the sensor input/output board 68.
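
For a sensor that reports round-trip (reflection) time, distance follows from d = v t / 2, because the signal travels to the object and back. The propagation speed v is about 343 m/s for ultrasound in air at roughly 20 °C, and the speed of light for an infrared time-of-flight sensor. This is a sketch only: the patent does not say where this conversion happens (on the DSP or on the computer 60), and many infrared distance sensors use triangulation rather than time of flight.

```python
SPEED_OF_SOUND_M_S = 343.0         # dry air at about 20 degrees C
SPEED_OF_LIGHT_M_S = 299_792_458.0


def echo_distance(round_trip_s: float,
                  speed_m_s: float = SPEED_OF_SOUND_M_S) -> float:
    """Distance to the reflector; the signal covers the path twice."""
    if round_trip_s < 0:
        raise ValueError("round-trip time must be non-negative")
    return speed_m_s * round_trip_s / 2.0
```

For example, an ultrasonic echo arriving after 10 ms corresponds to an object about 1.7 m away.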

The distance image signal from the three-dimensional laser rangefinder 26 undergoes predetermined processing on the sensor input/output board 68 as needed and is then input to the computer 60 as distance image data.

Video signals from the eye cameras 50 are likewise input to the computer 60. Signals from the contact sensors 38 described above (shown collectively as the "contact sensor 38" in FIG. 2) are supplied to the computer 60 via the sensor input/output board 68.

The audio input/output board 70 is likewise built from a DSP. Speech according to voice synthesis data supplied by the computer 60 is output from the speaker 44, and audio input from the microphones 46 is supplied to the computer 60 via the audio input/output board 70.

The computer 60 is also connected to a communication LAN board 84 via the bus 62. The communication LAN board 84 is built, for example, from a DSP. It passes transmission data from the computer 60 to a wireless communication module 86, which sends the data over a network to a remote control device (not shown) or the like. The communication LAN board 84 also receives data via the wireless communication module 86 and passes the received data to the computer 60.

The robot 10 moves, for example, along a movement path 88b between the entrances/exits 88a of a movement space 88 shown in FIG. 3, and provides services such as guidance to people present in the movement space 88, for example a commercial facility.

Although not shown, several three-dimensional omnidirectional laser rangefinders similar to the one on the robot 10 are also installed in the movement space 88 as environment-side sensors.

The robot 10 moving in the movement space 88 is basically capable of autonomous movement, but in case it becomes unable to move autonomously, it can also be controlled by a remote control device 90 such as the one shown in FIG. 4.

Referring to FIG. 4, the remote control device 90 of this embodiment can communicate with the robot 10 over a network 92 to control the actions and movement of the robot 10 as a mobile body.

The remote control device 90 includes a computer 94. When an operator uses the operation keys or joystick (not shown) on a console 96 connected to the computer 94, the computer 94 controls the wheels 14 of the robot 10, that is, the movement of the robot 10, in accordance with that input.
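
The patent does not specify how joystick deflection is turned into motion. A common scheme, assumed here, maps the forward/backward axis to a linear speed and the left/right axis to a turn rate, with clamping and a small dead zone so that a stick resting near centre does not creep the robot. The speed limits, dead-zone width, and function name are hypothetical.

```python
from typing import Tuple

MAX_LINEAR_M_S = 0.8      # hypothetical speed limit
MAX_ANGULAR_RAD_S = 1.0   # hypothetical turn-rate limit
DEADZONE = 0.05           # ignore small stick offsets around centre


def joystick_to_velocity(forward: float, turn: float) -> Tuple[float, float]:
    """Map stick axes in [-1, 1] to a (linear m/s, angular rad/s) command."""
    def shape(axis: float) -> float:
        axis = max(-1.0, min(1.0, axis))        # clamp out-of-range readings
        return 0.0 if abs(axis) < DEADZONE else axis
    return shape(forward) * MAX_LINEAR_M_S, shape(turn) * MAX_ANGULAR_RAD_S
```

The resulting command would then be converted into the two wheel speeds of the robot's drive base before being sent over the network.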

A memory 98 and a display 100 are connected to the computer 94. The memory 98 stores the programs for controlling the robot and other necessary data, such as two-dimensional and three-dimensional map data of the movement space 88 (FIG. 3). So that the robot 10 can be moved safely, the display 100 shows the operator the movement state of the robot 10 and images captured by an omnidirectional camera (not shown).

A wireless communication module 102 is attached to the computer 94, allowing the computer 94 to communicate wirelessly with the computer 60 over the network 92 via the wireless communication module 86 (FIG. 2) of the robot 10.

The memory 64 of the computer 60 of the robot 10 in FIG. 1 includes a program storage area 104 and a data storage area 106, as shown, for example, in FIG. 5.

In addition to necessary programs such as an OS, the program storage area 104 stores in advance a movement control program 108 for controlling the movement of the robot 10 and an estimation program 110, described in detail later. The estimation program 110 estimates the characteristics of people moving in the movement space 88 and, for this purpose, includes a deep neural network (for example, a convolutional neural network).

The data storage area 106 includes a three-dimensional distance image data storage area 112 for temporarily storing the three-dimensional distance image data (data sets) obtained from the laser rangefinder 26 (FIGS. 1 and 2), and a three-dimensional map data storage area 114 for storing the data of a three-dimensional map (three-dimensional environment map) of the movement space 88 acquired from the memory 98 of the remote control device 90.

The data storage area 106 further contains a three-dimensional point cloud data storage area 116 for storing the three-dimensional point clouds of moving objects, obtained during processing by the estimation program 110 as described later, and a two-dimensional distance image data storage area 118 for storing the two-dimensional distance image data of the moving objects.

The robot 10 acquires scan data with the laser rangefinder 28 while moving through a place such as that shown in FIG. 3. The place illustrated in FIG. 6 is part of the second floor of a commercial facility that the inventors used for experiments with the estimation system. In the experiment, the robot 10 traveled about 530 m through this area over about 25 minutes in a single run and output position data (a data set) for about 182 million points.

FIG. 7 shows typical three-dimensional scan data 120 acquired by the laser rangefinder 28; the arc-shaped lines in FIG. 7 schematically indicate the laser rangefinder 28 scanning. As FIG. 7 shows, the scan data include data not only from static objects such as walls and floors but also from dynamic objects such as people. Although this cannot be seen in FIG. 7, the actual scan data are color-coded by height, with low positions shown in red and high positions in blue, and are stored as a three-dimensional distance image in the three-dimensional distance image data storage area 112 (FIG. 5) of the memory 64 of the robot 10.

In step S1 shown in FIG. 8, the robot first identifies its own position using this three-dimensional distance image data and the three-dimensional environment map data stored in advance in the three-dimensional map data storage area 114 of the memory 64. Specifically, the current position (coordinates) of the robot 10 can be determined by calculating, on the principle of triangulation, the distances between the robot 10 and objects recorded as fixed (stationary) objects in the three-dimensional environment map, such as walls (indicated by "124" in the scan data 122 of FIG. 7).
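The triangulation principle mentioned for step S1 can be illustrated with a minimal sketch. It assumes a simplified 2D case in which the map coordinates of three fixed landmarks (for example, wall corners) are known and each measured range has already been associated with the correct landmark; matching scan data to map features, which a real localizer must do, is not shown. The function name `trilaterate` is hypothetical, and this is not presented as the embodiment's actual localization algorithm.

```python
from typing import Sequence, Tuple


def trilaterate(landmarks: Sequence[Tuple[float, float]],
                distances: Sequence[float]) -> Tuple[float, float]:
    """Estimate a 2D position from ranges to three known landmarks (hypothetical helper).

    landmarks: map coordinates [(x1, y1), (x2, y2), (x3, y3)] of fixed objects
    distances: measured ranges [d1, d2, d3] from the robot to those landmarks
    """
    (x1, y1), (x2, y2), (x3, y3) = landmarks
    d1, d2, d3 = distances
    # Subtracting circle 1 from circles 2 and 3 cancels the x^2 and y^2 terms,
    # leaving the linear system [a11 a12; a21 a22] [x y]^T = [b1 b2]^T.
    a11, a12 = 2 * (x2 - x1), 2 * (y2 - y1)
    a21, a22 = 2 * (x3 - x1), 2 * (y3 - y1)
    b1 = d1**2 - d2**2 + x2**2 - x1**2 + y2**2 - y1**2
    b2 = d1**2 - d3**2 + x3**2 - x1**2 + y3**2 - y1**2
    det = a11 * a22 - a12 * a21
    if abs(det) < 1e-12:
        raise ValueError("landmarks are collinear; the position is not unique")
    # Solve the 2x2 system with Cramer's rule
    x = (b1 * a22 - a12 * b2) / det
    y = (a11 * b2 - a21 * b1) / det
    return x, y
```

With landmarks at (0, 0), (10, 0) and (0, 10) and ranges 5, √65 and √45, the sketch returns (3, 4) up to floating-point rounding. With noisy ranges to more than three landmarks, the same linearization would normally be solved by least squares instead.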

Alternatively, the calculation for determining the current position of the robot 10 in step S1 can be performed using the plurality of three-dimensional omnidirectional laser rangefinders (not shown) distributed throughout the movement space 88 (FIG. 3).

After the current position of the robot 10 has been determined, comparing the three-dimensional distance image data obtained from the three-dimensional omnidirectional laser rangefinder 26 of the robot 10 with the three-dimensional environment map data makes it possible to extract the three-dimensional distance image data that are not recorded in the map. The objects extracted in this way are movable objects in the movement space (people or other things that can move): because everything recorded in the three-dimensional environment map is a fixed object such as a wall, anything absent from the map must be a temporary, moving object.
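How the comparison with the map is carried out is not specified here. One minimal way to sketch it, assuming the scan has already been transformed into map coordinates using the pose found in step S1, is to store the static map as a set of occupied voxels and keep only the scan points whose voxel, and every neighboring voxel, is empty in the map. The one-voxel neighborhood is an assumed tolerance for small localization errors; the voxel size and all function names are hypothetical.

```python
from typing import Iterable, List, Set, Tuple

Point = Tuple[float, float, float]
Voxel = Tuple[int, int, int]


def voxel_of(p: Point, size: float) -> Voxel:
    """Integer index of the cubic voxel (edge length `size`) containing point p."""
    return (int(p[0] // size), int(p[1] // size), int(p[2] // size))


def build_static_map(map_points: Iterable[Point], size: float) -> Set[Voxel]:
    """Occupied voxels of the pre-recorded 3D environment map (walls, floor, ...)."""
    return {voxel_of(p, size) for p in map_points}


def extract_moving_points(scan: Iterable[Point], occupied: Set[Voxel], size: float) -> List[Point]:
    """Keep scan points (already in map coordinates) that the static map does not explain.

    A point counts as static if its voxel or any of the 26 neighboring voxels is
    occupied in the map; the neighborhood absorbs small localization errors.
    """
    moving = []
    for p in scan:
        vx, vy, vz = voxel_of(p, size)
        near_map = any(
            (vx + dx, vy + dy, vz + dz) in occupied
            for dx in (-1, 0, 1) for dy in (-1, 0, 1) for dz in (-1, 0, 1)
        )
        if not near_map:
            moving.append(p)
    return moving
```

For a wall mapped at x = 0 with 10 cm voxels, a scan point 2 cm from the wall is discarded as static, while a point 2 m away is kept as part of a moving object.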

The three-dimensional distance image data representing such moving objects are then clustered (step S3) by a method such as that described in Japanese Patent No. 5953484, cited in the background art, dividing the data into three-dimensional point clouds of individual moving objects as shown in FIG. 9.
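The clustering method of Japanese Patent No. 5953484 is not reproduced here. As a stand-in, the following sketch uses generic Euclidean (distance-threshold) clustering: points closer than a given radius are linked, and each connected group becomes one object's point cloud. A spatial hash whose cell size equals the radius limits the neighbor search to the 27 surrounding cells. The radius, the `min_size` filter and the function names are illustrative assumptions.

```python
from collections import deque
from typing import Dict, List, Tuple

Point = Tuple[float, float, float]
Cell = Tuple[int, int, int]


def _cell(p: Point, size: float) -> Cell:
    return (int(p[0] // size), int(p[1] // size), int(p[2] // size))


def euclidean_clusters(points: List[Point], radius: float, min_size: int = 1) -> List[List[Point]]:
    """Split points into clusters; points within `radius` of each other (transitively) share one."""
    # Spatial hash with cell edge = radius: any neighbor within `radius`
    # lies in one of the 27 cells surrounding a point's own cell.
    grid: Dict[Cell, List[int]] = {}
    for i, p in enumerate(points):
        grid.setdefault(_cell(p, radius), []).append(i)

    r2 = radius * radius
    visited = [False] * len(points)
    clusters: List[List[Point]] = []
    for seed in range(len(points)):
        if visited[seed]:
            continue
        visited[seed] = True
        queue, members = deque([seed]), []
        while queue:  # breadth-first flood fill from the seed point
            i = queue.popleft()
            members.append(points[i])
            cx, cy, cz = _cell(points[i], radius)
            for dx in (-1, 0, 1):
                for dy in (-1, 0, 1):
                    for dz in (-1, 0, 1):
                        for j in grid.get((cx + dx, cy + dy, cz + dz), []):
                            if not visited[j] and sum((a - b) ** 2 for a, b in zip(points[i], points[j])) <= r2:
                                visited[j] = True
                                queue.append(j)
        if len(members) >= min_size:  # drop tiny clusters such as isolated noise points
            clusters.append(members)
    return clusters
```

With a 0.3 m radius, three points spaced 0.1 m apart near the origin and two points near (5, 5, 0) form two clusters, of sizes 3 and 2.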

One conceivable approach would be to use such a three-dimensional point cloud directly as the input to an estimation algorithm such as a neural network, whose outputs would be the characteristics of the object corresponding to each point cloud (whether it is a person, the person's gender, body orientation at that moment, and so on). Given enough point-cloud and feature data, the neural network could be trained. For example, collecting many scans of people whose gender is known would yield training data for gender estimation. Training data can also be generated with a simulator.

In this embodiment, however, the three-dimensional point clouds are not fed directly into the neural network.

Instead, in step S5, a two-dimensional distance image such as that shown in FIG. 10 is first created by projection.

Specifically, as shown in FIG. 11, a rectangular plane perpendicular to the line extending from the three-dimensional rangefinder 26 is used as the projection plane. The width and height of the projection plane are arbitrary (for example, 1 m × 2 m). The center of the projection plane along the X axis is aligned with the center of the point cloud, so that the image of the object lies in the middle. The distance of the projection plane from the three-dimensional rangefinder 26 is set equal to the average distance of the point cloud, so that, seen from the three-dimensional rangefinder 26, roughly half of the points lie in front of the plane and half behind it. The position of the projection plane along the Y axis need not be chosen very strictly, but the same Y-axis position is preferably used for all point clouds so that they can be compared; for example, the top edge of the projection plane is placed at a height of 2 m so that a person's head fits within it. The projection plane is then divided into a grid of pixels of constant width and height.
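The geometry of the projection plane can be sketched as follows. The sketch assumes a frame in which the rangefinder sits at horizontal position (0, 0) and z is height above the floor, and it takes "the line extending from the rangefinder" to be the horizontal line toward the point cloud's centroid, which makes the projection plane vertical. "Average distance" is interpreted as the mean depth along that line. Each point is expressed as (u, v, d): horizontal offset within the plane, downward offset from the plane's top edge at 2 m, and signed distance from the plane (negative toward the rangefinder). These interpretations and the function name are assumptions.

```python
import math
from typing import List, Tuple

Point = Tuple[float, float, float]


def to_plane_coords(points: List[Point], top: float = 2.0) -> List[Tuple[float, float, float]]:
    """Express each point as (u, v, d) relative to a vertical projection plane.

    Assumed frame: rangefinder at horizontal position (0, 0); z is height above the floor.
    u: horizontal offset within the plane (0 at the point cloud's center)
    v: downward offset from the plane's top edge, which is placed at height `top`
    d: signed distance from the plane, negative toward the rangefinder
    """
    n = len(points)
    cx = sum(p[0] for p in points) / n
    cy = sum(p[1] for p in points) / n
    horiz = math.hypot(cx, cy)
    if horiz == 0.0:
        raise ValueError("the cloud centroid is directly above the rangefinder")
    # Plane normal: horizontal unit vector from the rangefinder toward the centroid
    nx, ny = cx / horiz, cy / horiz
    # Plane X axis: horizontal unit vector perpendicular to the normal.
    # The centroid lies on the normal, so its u coordinate is 0 (object centered).
    tx, ty = -ny, nx
    depths = [p[0] * nx + p[1] * ny for p in points]
    plane_depth = sum(depths) / n  # plane placed at the cloud's mean depth
    return [
        (p[0] * tx + p[1] * ty, top - p[2], depth - plane_depth)
        for p, depth in zip(points, depths)
    ]
```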

All points of the three-dimensional point cloud are then projected onto the projection plane, and the value of each pixel is determined by calculating the average distance from the plane of all the points that fall into that pixel. For example, an average distance of 0 m maps to 0.5, 0.5 m toward the laser rangefinder 26 maps to 0, and 0.5 m in the opposite direction maps to 1. Pixels into which no point falls are given the value 0 (zero).
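The pixel-value rule can be sketched as a rasterization of the points' plane coordinates (u, v, d), where u = 0 is the horizontal center of the plane, v = 0 its top edge, and d the signed distance from the plane (negative toward the rangefinder). The mapping d ↦ d + 0.5 reproduces the stated examples (−0.5 m → 0, 0 m → 0.5, +0.5 m → 1). Clipping to [0, 1] for points farther than 0.5 m from the plane and the default grid resolution (about 3 cm pixels) are assumptions, and the function name is hypothetical.

```python
from typing import List, Tuple


def distance_image(coords: List[Tuple[float, float, float]], width: float = 1.0, height: float = 2.0,
                   cols: int = 32, rows: int = 64) -> List[List[float]]:
    """Rasterize plane coordinates (u, v, d) into a rows x cols 2D distance image."""
    px_w, px_h = width / cols, height / rows
    sums = [[0.0] * cols for _ in range(rows)]
    counts = [[0] * cols for _ in range(rows)]
    for u, v, d in coords:
        c = int((u + width / 2) // px_w)  # u = 0 is the plane's horizontal center
        r = int(v // px_h)                # v = 0 is the plane's top edge
        if 0 <= r < rows and 0 <= c < cols:  # points outside the plane are dropped
            sums[r][c] += d
            counts[r][c] += 1
    image = [[0.0] * cols for _ in range(rows)]  # pixels with no points stay 0
    for r in range(rows):
        for c in range(cols):
            if counts[r][c]:
                mean_d = sums[r][c] / counts[r][c]
                # -0.5 m (toward sensor) -> 0, 0 m -> 0.5, +0.5 m (away) -> 1, clipped to [0, 1]
                image[r][c] = min(1.0, max(0.0, mean_d + 0.5))
    return image
```

For example, two points in the top-left pixel at 0 m and +0.2 m from the plane give that pixel the value 0.6, while pixels that no point falls into keep the value 0.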

FIG. 10 shows an example of a two-dimensional distance image obtained by projecting a three-dimensional point cloud in this way.

Then, in step S7, a convolutional neural network is applied. Its input is the two-dimensional distance image created in step S5, so an ordinary convolutional neural network of the kind used for camera images can be used, which avoids a complicated network configuration.
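Because the input is a single-channel image, it can be processed with the same operations as a grayscale camera image. The sketch below shows, in plain Python, what one convolutional layer computes on such an image: a 2D cross-correlation (the operation CNN layers actually implement), a ReLU, and global max pooling, producing one feature per kernel. The network architecture of the embodiment is not specified; the hand-written edge kernels stand in for learned weights, and in practice a deep-learning framework would be used. All names are hypothetical.

```python
from typing import List

Image = List[List[float]]


def conv2d_valid(image: Image, kernel: Image, bias: float = 0.0) -> Image:
    """Single-channel 2D cross-correlation as computed by CNN layers (no padding, stride 1)."""
    kh, kw = len(kernel), len(kernel[0])
    out_h, out_w = len(image) - kh + 1, len(image[0]) - kw + 1
    return [
        [
            bias + sum(kernel[i][j] * image[r + i][c + j] for i in range(kh) for j in range(kw))
            for c in range(out_w)
        ]
        for r in range(out_h)
    ]


def relu(x: Image) -> Image:
    return [[max(0.0, v) for v in row] for row in x]


def global_max_pool(x: Image) -> float:
    return max(max(row) for row in x)


def tiny_cnn_features(distance_image: Image, kernels: List[Image]) -> List[float]:
    """One convolutional layer (one output channel per kernel) + ReLU + global max pooling."""
    return [global_max_pool(relu(conv2d_valid(distance_image, k))) for k in kernels]
```

On a 4 × 4 distance image containing a vertical depth edge, a vertical-edge kernel responds with 2.0 and a horizontal-edge kernel with 0.0; a trained network stacks many such layers and ends in output layers for the person/object decision and the person's characteristics.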

The convolutional neural network that receives the two-dimensional distance image can distinguish people from objects and extract characteristics of a person, such as body orientation and gender.

As described above, this embodiment uses recent machine learning techniques such as deep neural networks (DNNs), which makes it possible to improve the accuracy and robustness of this process. In addition to distinguishing people from objects, it reduces false detections and can estimate not only a person's position but also various other characteristics. Furthermore, feeding the feature estimation results back into the process can improve person detection accuracy, and using the person tracking results can make the robot's self-localization more robust.

Conventional person detection systems based on three-dimensional range sensors (three-dimensional omnidirectional laser rangefinders) generally assume that every moving object is a person, which is not necessarily true. For a moving robot in particular, errors in self-localization make it quite likely that part of the environment will be mistaken for a moving object. In contrast, in this embodiment the three-dimensional distance images of objects absent from the three-dimensional environment map are extracted only after the current position of the robot 10 has been determined, which keeps misrecognition of parts of the environment (fixed objects) as moving objects to a minimum. Moreover, techniques such as deep learning make it possible to build a recognition program that determines from the shape of a three-dimensional moving object whether it is a person or a thing, so that things can be ignored and only people detected and tracked.

Furthermore, with conventional methods it was difficult to detect a person's characteristics from the three-dimensional distance image data of a three-dimensional range sensor. As above, however, machine learning methods such as deep learning make it possible to estimate body orientation from the three-dimensional distance image data of a person captured by the sensor. When the person is sufficiently close to the sensor, face orientation can also be estimated. Beyond orientation, other characteristics of a person can be estimated as well, for example gender, height, whether the person is carrying luggage, hairstyle, and type of clothing (for example, skirt or trousers). The accuracy of these estimates, however, depends on factors such as the resolution of the three-dimensional range sensor and the distance to the person.

In a person tracking system, a tracked person may become invisible when other people or objects block the view, so when the person is detected again the system must decide whether it is the same person. Conventional person detection algorithms using three-dimensional data could not use a person's characteristics, which made this re-identification step difficult. Using the results of the feature identification described above makes the judgment of whether two detections are the same person more accurate. As a result, person detection and tracking become more robust and more accurate.
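The re-identification step can be sketched as nearest-neighbor matching of feature vectors. The sketch assumes that each lost track keeps a vector of estimated characteristics (for example, height in meters and scores for gender or for carrying luggage), and that a new detection is assigned to the closest lost track only if the Euclidean distance is within a threshold. The feature choice, the unweighted distance, the threshold, and the function name are assumptions; a practical tracker would also gate candidates by position and by the time since the track was lost.

```python
import math
from typing import Dict, Optional, Sequence


def reidentify(new_features: Sequence[float],
               lost_tracks: Dict[int, Sequence[float]],
               max_distance: float) -> Optional[int]:
    """Return the ID of the lost track with the nearest feature vector, or None if none is close enough."""
    best_id, best_d = None, math.inf
    for track_id, feats in lost_tracks.items():
        d = math.dist(new_features, feats)  # Euclidean distance (Python 3.8+)
        if d < best_d:
            best_id, best_d = track_id, d
    # Reject the best candidate if it is still too dissimilar (likely a new person)
    return best_id if best_d <= max_distance else None
```

With a threshold of 0.3, a detection with features (1.62, 0.85, 1.0) is matched to a lost track stored as (1.60, 0.9, 1.0), whereas (1.2, 0.5, 0.5) matches no track and is treated as a new person.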

In conventional methods, the usual approach is for the robot to detect features of the environment and use them for self-localization. This works reliably where there are few or no people, but when many people surround the robot they occlude the sensor and reduce the number of environmental features that can be detected, so self-localization may become difficult and unstable. This phenomenon is particularly common with service robots. If, however, the person detection and re-identification results of the embodiment described above are used for self-localization in addition to the environmental features, the self-localization of the robot 10 becomes more stable.

In the embodiment described above, a convolutional neural network was used as the deep neural network (DNN) because it was considered well suited to the problem the embodiment addresses, but other deep neural networks may be used instead.

Furthermore, the embodiment above was described as having the computer 60 of the robot 10 (FIG. 2) execute all the steps of FIG. 8. However, the computer 94 of the remote control device 90 shown in FIG. 4 may execute all or some of the steps. In either case, a person estimation system is formed.

DESCRIPTION OF SYMBOLS
10 … robot
26 … three-dimensional omnidirectional laser rangefinder
60, 94 … computer
64, 98 … memory

Claims

A human estimation system comprising:
a movable robot;
a three-dimensional laser rangefinder mounted on the robot;
an extraction unit that compares three-dimensional distance image data from the three-dimensional laser rangefinder with three-dimensional environment map data to extract three-dimensional distance image data of a moving object;
a clustering unit that clusters the three-dimensional distance image data of the moving object to generate a three-dimensional point cloud of the moving object;
a projection unit that projects the three-dimensional point cloud to generate two-dimensional distance image data of the moving object; and
a deep neural network that receives the two-dimensional distance image data,
wherein the deep neural network distinguishes between persons and objects.

The human estimation system according to claim 1, wherein the deep neural network is a convolutional neural network.

The human estimation system according to claim 1 or 2, wherein the projection unit sets a projection plane perpendicular to a line from the three-dimensional laser rangefinder and determines the value of each pixel based on the average distance from the projection plane of all points falling into that pixel.

An estimation program executed by a computer of a human estimation system including a movable robot equipped with a three-dimensional laser rangefinder, the estimation program causing the computer to function as:
an extraction unit that compares three-dimensional distance image data from the three-dimensional laser rangefinder with three-dimensional environment map data to extract three-dimensional distance image data of a moving object;
a clustering unit that clusters the three-dimensional distance image data of the moving object to generate a three-dimensional point cloud of the moving object;
a projection unit that projects the three-dimensional point cloud to generate two-dimensional distance image data of the moving object; and
a deep neural network that receives the two-dimensional distance image data,
wherein the deep neural network distinguishes between persons and objects and identifies characteristics of a person.

An estimation method executed by a computer of a human estimation system including a movable robot equipped with a three-dimensional laser rangefinder, the method comprising:
an extraction step of comparing three-dimensional distance image data from the three-dimensional laser rangefinder with three-dimensional environment map data to extract three-dimensional distance image data of a moving object;
a clustering step of clustering the three-dimensional distance image data of the moving object to generate a three-dimensional point cloud of the moving object;
a projection step of projecting the three-dimensional point cloud to generate two-dimensional distance image data of the moving object; and
a step of inputting the two-dimensional distance image data into a deep neural network.