JP2023135615A

JP2023135615A - Bird's eye view data generation device, learning device, bird's eye view data generation program, bird's eye view data generation method, and robot

Info

Publication number: JP2023135615A
Application number: JP2023022034A
Authority: JP
Inventors: Mai Kurose (Nishimura); Shohei Nobuhara; Ko Nishino
Original assignee: Omron Corp; Kyoto University; Omron Tateisi Electronics Co
Current assignee: Omron Corp; Kyoto University
Priority date: 2022-03-15
Filing date: 2023-02-15
Publication date: 2023-09-28
Anticipated expiration: 2043-02-15
Also published as: JP7438515B2

Abstract

PROBLEM TO BE SOLVED: To make it possible to generate bird's-eye view data representing a movement trajectory on the ground of an observation moving object equipped with an observation device, and a movement trajectory on the ground of each moving object, from two-dimensional observation information observed from the viewpoint of the observation moving object in a dynamic environment, even in a situation in which no static landmark is detected.

SOLUTION: A bird's-eye view data generation device includes: an acquisition unit 22 that acquires time-series data of two-dimensional observation information representing at least one moving object observed from the viewpoint of an observation moving object equipped with an observation device in a dynamic environment; and a generation unit 26 that, using a trained model for estimating the movement of the observation moving object on the ground and the movement of each of the moving objects on the ground, generates, from the time-series data of the two-dimensional observation information, bird's-eye view data representing the movement trajectory of the observation moving object on the ground and the movement trajectory of each of the moving objects on the ground, as obtained when the observation moving object is observed from a bird's-eye view position.

SELECTED DRAWING: Figure 1

Description

An application for an exception to loss of novelty has been filed.

The present invention relates to a bird's-eye view data generation device, a learning device, a bird's-eye view data generation program, a bird's-eye view data generation method, and a robot.

Conventionally, a technique is known for estimating the distribution of a person's position from a bird's-eye view based on the human skeleton observed in video shot from a first-person view (Non-Patent Document 1).

A technique is also known in which moving objects are added to the optimization targets of static-landmark-based self-position estimation (Simultaneous Localization and Mapping: SLAM) and sequential optimization is performed (Non-Patent Document 2).

A technique for estimating position using GNSS (Global Navigation Satellite System) is also known (Non-Patent Document 3).

Furthermore, a technique for estimating the shooting position of a first-person video within a bird's-eye view video is known (Patent Document 1). This technique performs the estimation by matching motion features extracted from both the bird's-eye view and the first-person view.

Non-Patent Document 1: "MonoLoco: Monocular 3D Pedestrian Localization and Uncertainty Estimation", Internet search <URL: https://arxiv.org/abs/1906.06059>, Jun 2019
Non-Patent Document 2: "CubeSLAM: Monocular 3D Object SLAM", Internet search <URL: https://arxiv.org/abs/1806.00557>, Jun 2018
Non-Patent Document 3: "Current Status and Prospects of Field Robotics", Internet search <URL: https://committees.jsce.or.jp/opcet_sip/system/files/0130_01.pdf>

Patent Document 1: JP 2021-77287 A

However, the technique described in Non-Patent Document 1 cannot recover the motion of the observing camera or the movement trajectories of surrounding moving objects.

The technique described in Non-Patent Document 2 can be applied only in environments where static landmarks can be stably observed together with the moving objects. In addition, its motion model for moving objects is limited to simple rigid-body motion, so it cannot handle the motion of moving objects that interact with one another.

The technique described in Non-Patent Document 3 targets only the recovery of the position of the GNSS-equipped device itself and cannot recover the positions of surrounding moving objects. In addition, in environments where occlusion by high-rise buildings or the like occurs, reception of GPS (Global Positioning System) radio waves becomes unstable and the recovered position becomes inaccurate.

The technique described in Patent Document 1 cannot be applied when bird's-eye view video is not available.

The present invention has been made in view of the above points. Its object is to provide a bird's-eye view data generation device, a learning device, a bird's-eye view data generation program, a bird's-eye view data generation method, and a robot that can generate bird's-eye view data representing the movement trajectory of an observation moving object on the ground and the movement trajectory of each moving object on the ground from two-dimensional observation information observed from the viewpoint of the observation moving object, which is equipped with an observation device, in a dynamic environment, even in a situation where no static landmark is detected.

A first aspect of the disclosure is a bird's-eye view data generation device including: an acquisition unit that acquires time-series data of two-dimensional observation information representing at least one moving object observed from the viewpoint of an observation moving object equipped with an observation device in a dynamic environment; and a generation unit that, using a trained model that estimates the movement of the observation moving object on the ground and the movement of each of the moving objects on the ground, generates, from the time-series data of the two-dimensional observation information, bird's-eye view data representing a movement trajectory of the observation moving object on the ground and a movement trajectory of each of the moving objects on the ground, as obtained when the observation moving object is observed from a bird's-eye view position.

In the first aspect, the generation unit may use a trained model that estimates the movement of the observation moving object on the ground and the distribution of the movement of each of the moving objects on the ground to generate, from the time-series data of the two-dimensional observation information, bird's-eye view data representing a movement trajectory that expresses the position distribution of the observation moving object on the ground and a movement trajectory that expresses the position distribution of each of the moving objects on the ground, as obtained when the observation moving object is observed from a bird's-eye view position.

In the first aspect, the device may further include a tracking unit that tracks each of the moving objects in the time-series data of the two-dimensional observation information and acquires the position and size of each of the moving objects on the two-dimensional observation information at each time. The generation unit may then generate the bird's-eye view data using the trained model, which takes as input the position and size of each of the moving objects on the two-dimensional observation information at each time and estimates the movement of the observation moving object on the ground and the movement of each of the moving objects on the ground.

In the first aspect, the trained model may include: a first encoder that takes as input the position and size of each of the moving objects at a target time and outputs a vector; a second encoder that takes as input the movement of the observation moving object on the ground and the movement of each of the moving objects on the ground obtained for the immediately preceding time, and outputs a vector; and a decoder that takes as input the vector output by the first encoder and the vector output by the second encoder, and outputs the movement of the observation moving object on the ground and the movement of each of the moving objects on the ground for the target time.

A second aspect of the disclosure is a learning device including: an acquisition unit that acquires, as training data, a combination of (i) time-series data of the position and size, at each time, of each of the moving objects on two-dimensional observation information representing at least one moving object observed from the viewpoint of an observation moving object equipped with an observation device in a dynamic environment, and (ii) time-series data of the movement of the observation moving object on the ground and the movement of each of the moving objects on the ground; and a learning unit that, based on the training data, trains a model that takes as input the position and size of each of the moving objects on the two-dimensional observation information at each time and estimates the movement of the observation moving object on the ground and the movement of each of the moving objects on the ground.

In the second aspect, the model may include: a first encoder that takes as input the position and size of each of the moving objects at a target time and outputs a vector; a second encoder that takes as input the movement of the observation moving object on the ground and the movement of each of the moving objects on the ground obtained for the immediately preceding time, and outputs a vector; and a decoder that takes as input the vector output by the first encoder and the vector output by the second encoder, and outputs the movement of the observation moving object on the ground and the movement of each of the moving objects on the ground for the target time.

A third aspect of the disclosure is a bird's-eye view data generation device including: an acquisition unit that acquires time-series data of two-dimensional observation information representing at least one moving object observed from the viewpoint of an observation moving object equipped with an observation device in a dynamic environment; and a generation unit that, using a trained model that predicts the movement of the observation moving object on the ground and the movement of each of the moving objects on the ground, generates, from the time-series data of the two-dimensional observation information, a prediction result of bird's-eye view data representing a movement trajectory of the observation moving object on the ground and a movement trajectory of each of the moving objects on the ground, as obtained when the observation moving object is observed from a bird's-eye view position.

In the third aspect, the generation unit may use a trained model that predicts the movement of the observation moving object on the ground and the distribution of the movement of each of the moving objects on the ground to generate, from the time-series data of the two-dimensional observation information, a prediction result of bird's-eye view data representing a movement trajectory that expresses the position distribution of the observation moving object on the ground and a movement trajectory that expresses the position distribution of each of the moving objects on the ground, as obtained when the observation moving object is observed from a bird's-eye view position.

A fourth aspect of the disclosure is a bird's-eye view data generation program for causing a computer to execute processing including: an acquisition step of acquiring time-series data of two-dimensional observation information representing at least one moving object observed from the viewpoint of an observation moving object equipped with an observation device in a dynamic environment; and a generation step of generating, from the time-series data of the two-dimensional observation information and using a trained model that estimates the movement of the observation moving object on the ground and the movement of each of the moving objects on the ground, bird's-eye view data representing a movement trajectory of the observation moving object on the ground and a movement trajectory of each of the moving objects on the ground, as obtained when the observation moving object is observed from a bird's-eye view position.

A fifth aspect of the disclosure is a bird's-eye view data generation method in which a computer executes processing including: an acquisition step of acquiring time-series data of two-dimensional observation information representing at least one moving object observed from the viewpoint of an observation moving object equipped with an observation device in a dynamic environment; and a generation step of generating, from the time-series data of the two-dimensional observation information and using a trained model that estimates the movement of the observation moving object on the ground and the movement of each of the moving objects on the ground, bird's-eye view data representing a movement trajectory of the observation moving object on the ground and a movement trajectory of each of the moving objects on the ground, as obtained when the observation moving object is observed from a bird's-eye view position.

A sixth aspect of the disclosure is a robot including: an acquisition unit that acquires time-series data of two-dimensional observation information representing at least one moving object observed from the viewpoint of the robot, which is equipped with an observation device, in a dynamic environment; a generation unit that, using a trained model that estimates the movement of the robot on the ground and the movement of each of the moving objects on the ground, generates, from the time-series data of the two-dimensional observation information, bird's-eye view data representing a movement trajectory of the robot on the ground and a movement trajectory of each of the moving objects on the ground, as obtained when the robot is observed from a bird's-eye view position; an autonomous traveling unit that causes the robot to travel autonomously; and a control unit that uses the bird's-eye view data to control the autonomous traveling unit so that the robot moves to a destination.

A seventh aspect of the disclosure is a bird's-eye view data generation program for causing a computer to execute processing including: an acquisition step of acquiring time-series data of two-dimensional observation information representing at least one moving object observed from the viewpoint of an observation moving object equipped with an observation device in a dynamic environment; and a generation step of generating, from the time-series data of the two-dimensional observation information and using a trained model that predicts the movement of the observation moving object on the ground and the movement of each of the moving objects on the ground, a prediction result of bird's-eye view data representing a movement trajectory of the observation moving object on the ground and a movement trajectory of each of the moving objects on the ground, as obtained when the observation moving object is observed from a bird's-eye view position.

An eighth aspect of the disclosure is a bird's-eye view data generation method in which a computer executes processing including: an acquisition step of acquiring time-series data of two-dimensional observation information representing at least one moving object observed from the viewpoint of an observation moving object equipped with an observation device in a dynamic environment; and a generation step of generating, from the time-series data of the two-dimensional observation information and using a trained model that predicts the movement of the observation moving object on the ground and the movement of each of the moving objects on the ground, a prediction result of bird's-eye view data representing a movement trajectory of the observation moving object on the ground and a movement trajectory of each of the moving objects on the ground, as obtained when the observation moving object is observed from a bird's-eye view position.

A ninth aspect of the disclosure is a robot including: an acquisition unit that acquires time-series data of two-dimensional observation information representing at least one moving object observed from the viewpoint of the robot, which is equipped with an observation device, in a dynamic environment; a generation unit that, using a trained model that predicts the movement of the robot on the ground and the movement of each of the moving objects on the ground, generates, from the time-series data of the two-dimensional observation information, a prediction result of bird's-eye view data representing a movement trajectory of the robot on the ground and a movement trajectory of each of the moving objects on the ground, as obtained when the robot is observed from a bird's-eye view position; an autonomous traveling unit that causes the robot to travel autonomously; and a control unit that uses the prediction result of the bird's-eye view data to control the autonomous traveling unit so that the robot moves to a destination.

According to the present invention, even in a situation where no static landmark is detected, bird's-eye view data representing the movement trajectory of the observation moving object on the ground and the movement trajectory of each moving object on the ground can be generated from two-dimensional observation information observed from the viewpoint of the observation moving object, which is equipped with an observation device, in a dynamic environment.

FIG. 1 is a diagram showing a schematic configuration of a robot according to a first embodiment.
FIG. 2 is a diagram showing an example of an image captured by a camera.
FIG. 3 is a diagram showing an example of the result of detecting persons in an image.
FIG. 4 is a diagram showing an example of a trained model.
FIG. 5 is a diagram showing an example of bird's-eye view data.
FIG. 6 is a block diagram showing the hardware configuration of the bird's-eye view data generation device and the learning device according to the first and second embodiments.
FIG. 7 is a diagram showing a schematic configuration of the learning device according to the first and second embodiments.
FIG. 8 is a flowchart showing the flow of learning processing by the learning device according to the first and second embodiments.
FIG. 9 is a flowchart showing the flow of bird's-eye view data generation processing by the bird's-eye view data generation device according to the first and second embodiments.
FIG. 10 is a diagram showing a schematic configuration of an information processing terminal according to the second embodiment.
FIG. 11 is a diagram showing an example of bird's-eye view data.
FIG. 12 is a diagram showing another example of bird's-eye view data.
FIG. 13 is a diagram showing an example of the result of detecting persons in an image.

An example of an embodiment of the present invention is described below with reference to the drawings. In the drawings, identical or equivalent components and parts are given the same reference numerals. The dimensional ratios in the drawings may be exaggerated for convenience of explanation and may differ from the actual ratios.

[First embodiment]
FIG. 1 is a diagram showing a schematic configuration of a robot 100 according to a first embodiment of the present invention. As shown in FIG. 1, the robot 100 includes a camera 10, a bird's-eye view data generation device 20, a notification unit 50, and an autonomous traveling unit 60. The bird's-eye view data generation device 20 includes an acquisition unit 22, a tracking unit 24, a generation unit 26, a model storage unit 27, and a control unit 28. Note that the robot 100 is an example of the observation moving object, and the camera 10 is an example of the observation device.

The camera 10 photographs the surroundings of the robot 100 at predetermined intervals while the robot 100 moves from a start point to a destination, and outputs the captured images to the acquisition unit 22 of the bird's-eye view data generation device 20. Note that an image is an example of the two-dimensional observation information.

For example, the camera 10 captures an image representing at least one person observed from the viewpoint of the robot 100 in a dynamic environment (see FIG. 2).

The camera 10 may be a perspective-projection RGB camera, a fisheye camera, or a 360-degree camera.

The acquisition unit 22 acquires time-series data of the images captured by the camera 10.

The tracking unit 24 tracks each person in the acquired time-series data of images and acquires the position and size of each person on the image at each time.

For example, as shown in FIG. 3, for each person on the image, a bounding box representing the person is detected and tracked, and the center position of the person on the image (the center position of the bounding box) and the height of the person on the image (the height of the bounding box) are acquired at each time.
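The per-time observation described above can be illustrated with a minimal sketch. The `BoundingBox` layout (pixel corner coordinates) and the function name `box_to_observation` are hypothetical and introduced only for illustration; the source states only that the center position and height of the bounding box are acquired at each time.

```python
from typing import NamedTuple


class BoundingBox(NamedTuple):
    """Axis-aligned box in pixel coordinates (hypothetical layout)."""
    x_min: float
    y_min: float
    x_max: float
    y_max: float


def box_to_observation(box: BoundingBox) -> tuple[float, float, float]:
    """Return (center_x, center_y, height) of one tracked person's box."""
    center_x = (box.x_min + box.x_max) / 2.0
    center_y = (box.y_min + box.y_max) / 2.0
    height = box.y_max - box.y_min
    return center_x, center_y, height


# One track: the same person's bounding box at two successive times.
track = [
    BoundingBox(100.0, 50.0, 140.0, 170.0),
    BoundingBox(104.0, 52.0, 146.0, 178.0),
]
observations = [box_to_observation(box) for box in track]
```

Each element of `observations` is the position and size of one person at one time, which is the kind of input the later paragraphs pass to the trained model.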

The generation unit 26 uses a trained model that estimates the movement of the robot 100 on the ground and the movement of each person on the ground to generate bird's-eye view data from the position and size of each person on the image at each time, as acquired from the time-series data of the images. The bird's-eye view data represents the movement trajectory of the robot 100 on the ground and the movement trajectory of each person on the ground, as obtained when the robot 100 is observed from a bird's-eye view position.

Specifically, the generation unit 26 generates the bird's-eye view data using a trained model that takes as input the position and size of each person on the image at each time and estimates the movement of the robot 100 on the ground and the movement of each person on the ground.
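This passage does not state how the estimated per-time movements are turned into a movement trajectory. One plausible reading, shown here purely as an assumption, is that each estimated movement is a ground-plane displacement (dx, dy) for one time step and that a trajectory is obtained by accumulating these displacements from a starting position. The function name `integrate_motion` is hypothetical.

```python
from itertools import accumulate


def integrate_motion(start, displacements):
    """Accumulate per-step ground displacements into a list of positions.

    Assumption: `start` is an (x, y) position and each displacement is a
    (dx, dy) pair for one time step. This is an illustrative reading, not
    a method stated in the source.
    """
    xs = accumulate([start[0]] + [dx for dx, _ in displacements])
    ys = accumulate([start[1]] + [dy for _, dy in displacements])
    return list(zip(xs, ys))


# Toy values: the robot moves forward, then forward and slightly sideways.
robot_trajectory = integrate_motion((0.0, 0.0), [(1.0, 0.0), (1.0, 0.5)])
```

Applying the same accumulation to the robot and to every tracked person would give one trajectory per agent on a common ground plane, which is what the bird's-eye view data is described as representing.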

Here, the trained model includes: a first encoder that takes as input the position and size of each person at a target time and outputs a vector; a second encoder that takes as input the movement of the robot 100 on the ground and the movement of each person on the ground obtained for the immediately preceding time, and outputs a vector; and a decoder that takes as input the vector output by the first encoder and the vector output by the second encoder, and outputs the movement of the robot 100 on the ground and the movement of each person on the ground for the target time.

More specifically, as shown in FIG. 4, the trained model 70 includes a first encoder 72, a second encoder 74, and a decoder 76.

The first encoder 72 takes as input the position and size of each person observed by the robot 100 from a first-person view, computes self-attention among the persons, and outputs the resulting vector.

Specifically, a vector representing the position and size of each person on the image at time t is input to a multilayer perceptron (MLP) 720, and the resulting vector is used as the input vector of the first encoder 72.

The multi-head self-attention layer 722 of the first encoder 72 receives the input vector of the first encoder 72 as each of the Query, Key, and Value, computes self-attention, and outputs a vector.
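The idea of feeding the same vectors in as Query, Key, and Value can be sketched as follows. This is a deliberately reduced illustration in plain Python: it uses a single head and omits the learned projection matrices that a multi-head layer would apply, and the function names and toy vectors are hypothetical.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def self_attention(vectors):
    """Single-head scaled dot-product self-attention.

    The same vectors serve as Query, Key, and Value. Learned projections
    and multiple heads are omitted (simplifying assumption).
    """
    dim = len(vectors[0])
    outputs = []
    for query in vectors:
        scores = [
            sum(q * k for q, k in zip(query, key)) / math.sqrt(dim)
            for key in vectors
        ]
        weights = softmax(scores)
        outputs.append(
            [sum(w * value[i] for w, value in zip(weights, vectors))
             for i in range(dim)]
        )
    return outputs


# Toy per-person feature vectors (for example, MLP outputs for two persons).
person_features = [[1.0, 0.0], [0.0, 1.0]]
attended = self_attention(person_features)
```

Each output row is a weighted mix of all persons' vectors, so information about every observed person can influence the representation of each individual person.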

The first normalization layer 724 of the first encoder 72 adds the input vector of the first encoder 72 and the output vector of the multi-head self-attention layer 722, normalizes the sum, and outputs a vector.
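The add-then-normalize step can be sketched as below. The source says only that normalization is performed after the addition; this sketch assumes layer normalization over each vector's components, without a learned scale or shift, and the function name `add_and_normalize` is hypothetical.

```python
import math


def add_and_normalize(inputs, sublayer_outputs, eps=1e-5):
    """Add a residual connection, then normalize each vector.

    Assumption: layer normalization to zero mean and unit variance per
    vector, with no learned scale or shift parameters.
    """
    result = []
    for x, y in zip(inputs, sublayer_outputs):
        summed = [a + b for a, b in zip(x, y)]
        mean = sum(summed) / len(summed)
        variance = sum((s - mean) ** 2 for s in summed) / len(summed)
        result.append([(s - mean) / math.sqrt(variance + eps) for s in summed])
    return result


# Toy values: one encoder input vector and one attention output vector.
normalized = add_and_normalize([[1.0, 2.0, 3.0]], [[0.5, 0.5, 0.5]])
```

The residual addition keeps the original per-person information available to later layers even when the attention output is small.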

The feedforward neural network 726 takes the output vector of the first normalization layer 724 as input and outputs a vector.

The second normalization layer 728 adds the output vector of the first normalization layer 724 and the output vector of the feedforward neural network 726, normalizes the sum, and outputs a vector, which becomes the output vector of the first encoder 72. This output vector represents an embedding of the first-person view.

The second encoder 74 takes as input the movement of the robot 100 on the ground and the movement of each person on the ground obtained for the immediately preceding time, encodes the position and velocity of each person relative to the position of the robot 100, and outputs the resulting vector.

Specifically, from the movement of the robot 100 on the ground and the movement of each person on the ground obtained for time t-1, a vector representing the movement of each person on the ground relative to the position of the robot 100 is computed. This vector is input to a multilayer perceptron 740, and the resulting vector is used as the input vector of the second encoder 74.
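A minimal sketch of the robot-relative representation follows. It assumes that each state is a tuple (x, y, vx, vy) on the ground plane and that both position and velocity are differenced against the robot's state; the source specifies neither the state layout nor whether the robot's heading is taken into account, so the function name `relative_motion` and these choices are illustrative only.

```python
def relative_motion(robot_state, person_states):
    """Express each person's ground state relative to the robot.

    Assumption: every state is (x, y, vx, vy) in a shared ground frame,
    and rotation into the robot's heading is ignored.
    """
    rx, ry, rvx, rvy = robot_state
    return [
        (px - rx, py - ry, pvx - rvx, pvy - rvy)
        for px, py, pvx, pvy in person_states
    ]


# Toy values for time t-1: one robot and two persons.
robot = (1.0, 2.0, 0.5, 0.0)
persons = [(3.0, 2.0, 0.5, 1.0), (1.0, 5.0, 0.0, 0.0)]
relative = relative_motion(robot, persons)
```

Vectors of this kind would then be passed through the multilayer perceptron to obtain the input of the second encoder.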

The multi-head self-attention layer 742 of the second encoder 74 receives the input vector of the second encoder 74 as each of the Query, Key, and Value, computes self-attention, and outputs a vector.

The normalization layer 744 of the second encoder 74 adds the input vector of the second encoder 74 and the output vector of the multi-head self-attention layer 742, normalizes the sum, and outputs a vector. This vector represents an embedding of the bird's-eye view.

デコーダ７６は、第１エンコーダ７２の出力ベクトルと第２エンコーダ７４の出力ベクトルとの間で、クロスアテンションをとり、クロスアテンションの結果から得られたベクトルを出力する。このベクトルは、ロボット１００の地面上の動き、及び人物の各々の地面上の動きをマルチヘッドで予測した結果を表している。 The decoder 76 performs cross-attention between the output vector of the first encoder 72 and the output vector of the second encoder 74, and outputs a vector obtained as a result of the cross-attention. This vector represents the result of multi-head prediction of the movement of the robot 100 on the ground and the movement of each person on the ground.

具体的には、第１エンコーダ７２の出力ベクトルと第２エンコーダ７４の出力ベクトルを、デコーダ７６の入力とする。 Specifically, the output vector of the first encoder 72 and the output vector of the second encoder 74 are input to the decoder 76.

デコーダ７６のマルチヘッドクロスアテンション層７６０が、第１エンコーダ７２の出力ベクトルを、Key、Valueの各々として受け付け、第２エンコーダ７４の出力ベクトルを、Queryとして受け付け、クロスアテンションをとってベクトルを出力する。 The multi-head cross-attention layer 760 of the decoder 76 receives the output vector of the first encoder 72 as each of Key and Value, receives the output vector of the second encoder 74 as Query, performs cross-attention, and outputs a vector.

デコーダ７６の第１正規化層７６２は、第２エンコーダ７４の出力ベクトルと、マルチヘッドクロスアテンション層７６０の出力ベクトルとを加算した後に、正規化を行い、ベクトルを出力する。 The first normalization layer 762 of the decoder 76 adds the output vector of the second encoder 74 and the output vector of the multi-head cross-attention layer 760, performs normalization, and outputs the vector.

順伝播型ニューラルネットワーク７６４は、第１正規化層７６２の出力ベクトルを入力とし、ベクトルを出力する。 The forward propagation neural network 764 inputs the output vector of the first normalization layer 762 and outputs the vector.

第２正規化層７６６は、第１正規化層７６２の出力ベクトルと、順伝播型ニューラルネットワーク７６４の出力ベクトルとを加算した後に、正規化を行い、ベクトルを出力し、これをデコーダ７６の出力ベクトルとする。 The second normalization layer 766 adds the output vector of the first normalization layer 762 and the output vector of the forward propagation neural network 764, performs normalization, and outputs a vector, which is taken as the output vector of the decoder 76.
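
The data flow of the decoder 76 (cross-attention, add and normalize, feed-forward, add and normalize) can be illustrated with a small sketch. This is a simplified stand-in under stated assumptions, not the patented network: it uses a single attention head, omits all learned projection weights, and replaces the forward propagation neural network 764 with a plain ReLU. All function names are hypothetical.

```python
import math


def layer_norm(x, eps=1e-5):
    """Normalize one vector to zero mean and (approximately) unit variance."""
    mean = sum(x) / len(x)
    var = sum((v - mean) ** 2 for v in x) / len(x)
    return [(v - mean) / math.sqrt(var + eps) for v in x]


def softmax(scores):
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def attention(queries, keys, values):
    """Scaled dot-product attention for lists of equal-length vectors."""
    d = len(keys[0])
    out = []
    for q in queries:
        scores = [sum(a * b for a, b in zip(q, k)) / math.sqrt(d) for k in keys]
        w = softmax(scores)
        out.append([sum(wi * v[j] for wi, v in zip(w, values))
                    for j in range(len(values[0]))])
    return out


def add_and_norm(residual, update):
    """Residual addition followed by layer normalization, row by row."""
    return [layer_norm([r + u for r, u in zip(rv, uv)])
            for rv, uv in zip(residual, update)]


def feed_forward(rows):
    # Assumption: a weight-free ReLU stands in for network 764.
    return [[max(0.0, v) for v in row] for row in rows]


def decoder_block(first_person_emb, birds_eye_emb):
    # Layer 760: Query from the second encoder, Key and Value from the first.
    attended = attention(birds_eye_emb, first_person_emb, first_person_emb)
    hidden = add_and_norm(birds_eye_emb, attended)        # layer 762
    return add_and_norm(hidden, feed_forward(hidden))     # layers 764 and 766
```

Because the Query comes from the bird's-eye embedding, the output has one row per Query row, which matches the residual connection from the second encoder's output described in the text.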

順伝播型ニューラルネットワーク７６８は、デコーダ７６の出力ベクトルを入力とし、時刻ｔのロボット１００の動きを表すベクトルを出力する。 The forward propagation neural network 768 inputs the output vector of the decoder 76 and outputs a vector representing the movement of the robot 100 at time t.

また、順伝播型ニューラルネットワーク７７０は、デコーダ７６の出力ベクトルを入力とし、時刻ｔの人物の各々の動きを表すベクトルを出力する。 Further, the forward propagation neural network 770 takes the output vector of the decoder 76 as input and outputs a vector representing the movement of each person at time t.

ここで、動きを表すベクトルは、例えば、一時刻前に対する相対位置及び相対速度を表すベクトルである。なお、動きを表すベクトルは、一時刻前に対する相対位置を表すベクトル、又は一時刻前に対する相対速度を表すベクトルであってもよい。 Here, the vector representing movement is, for example, a vector representing relative position and relative velocity with respect to one time ago. Note that the vector representing movement may be a vector representing a relative position with respect to one time ago, or a vector representing relative velocity with respect to one time ago.

本実施形態では、生成部２６は、画像上の人物の各々の時刻ｔの位置及び大きさを表すベクトル、並びに時刻ｔ－１について得られた、ロボット１００の地面上の動きを表すベクトル、及び人物の各々の地面上の動きを表すベクトルから、学習済みモデル７０を用いて、時刻ｔにおける、ロボット１００の地面上の動きを表すベクトル、及び人物の各々の地面上の動きを表すベクトルを求めることを、各時刻ｔについて繰り返すことにより、俯瞰データを生成する。 In this embodiment, the generation unit 26 generates the bird's-eye view data by repeating the following for each time t: using the trained model 70, it obtains a vector representing the movement of the robot 100 on the ground and a vector representing the movement of each person on the ground at time t from the vector representing the position and size of each person on the image at time t, together with the vector representing the movement of the robot 100 on the ground and the vector representing the movement of each person on the ground obtained for time t-1.

生成部２６は、例えば、図５に示すような俯瞰データを生成する。図５は、黒丸をつないだ線でロボット１００の地面上の移動軌跡を示し、破線で人物の地面上の移動軌跡を示す例を示している。 The generation unit 26 generates bird's-eye view data as shown in FIG. 5, for example. FIG. 5 shows an example in which a line connecting black circles indicates the locus of movement of the robot 100 on the ground, and a broken line indicates the locus of movement of the person on the ground.

制御部２８は、俯瞰データを用いて、ロボット１００が目的地に移動するように自律走行部６０を制御する。例えば、制御部２８は、ロボット１００の移動方向及び速度を指定し、指定された移動方向及び速度で移動するように自律走行部６０を制御する。 The control unit 28 uses the bird's-eye view data to control the autonomous traveling unit 60 so that the robot 100 moves to the destination. For example, the control unit 28 specifies the moving direction and speed of the robot 100, and controls the autonomous traveling unit 60 to move in the specified moving direction and speed.

また、制御部２８は、俯瞰データを用いて、介入行動が必要と判断した場合には、「道を空けてください」等のメッセージを音声出力したり、警告音を鳴らすよう報知部５０を制御する。 In addition, when the control unit 28 determines from the bird's-eye view data that an intervention action is necessary, it controls the notification unit 50 to output a voice message such as "Please make way" or to sound a warning.

次に、ロボット１００の俯瞰データ生成装置２０のハードウェア構成について説明する。 Next, the hardware configuration of the bird's-eye view data generation device 20 of the robot 100 will be described.

図６に示すように、俯瞰データ生成装置２０は、ＣＰＵ（ＣｅｎｔｒａｌＰｒｏｃｅｓｓｉｎｇＵｎｉｔ）６１、ＲＯＭ（ＲｅａｄＯｎｌｙＭｅｍｏｒｙ）６２、ＲＡＭ（ＲａｎｄｏｍＡｃｃｅｓｓＭｅｍｏｒｙ）６３、ストレージ６４、及び通信インタフェース（Ｉ/Ｆ）６５を有する。各構成は、バス６６を介して相互に通信可能に接続されている。 As shown in FIG. 6, the bird's-eye view data generation device 20 has a CPU (Central Processing Unit) 61, a ROM (Read Only Memory) 62, a RAM (Random Access Memory) 63, a storage 64, and a communication interface (I/F) 65. The components are communicably connected to one another via a bus 66.

本実施形態では、ストレージ６４には、俯瞰データ生成プログラムが格納されている。ＣＰＵ６１は、中央演算処理ユニットであり、各種プログラムを実行したり、各構成を制御したりする。すなわち、ＣＰＵ６１は、ストレージ６４からプログラムを読み出し、ＲＡＭ６３を作業領域としてプログラムを実行する。ＣＰＵ６１は、ストレージ６４に記録されているプログラムに従って、上記各構成の制御及び各種の演算処理を行う。 In this embodiment, the storage 64 stores an overhead view data generation program. The CPU 61 is a central processing unit that executes various programs and controls each component. That is, the CPU 61 reads the program from the storage 64 and executes the program using the RAM 63 as a work area. The CPU 61 controls each of the above components and performs various arithmetic operations according to programs recorded in the storage 64.

ＲＯＭ６２は、各種プログラム及び各種データを格納する。ＲＡＭ６３は、作業領域として一時的にプログラム又はデータを記憶する。ストレージ６４は、ＨＤＤ（ＨａｒｄＤｉｓｋＤｒｉｖｅ）又はＳＳＤ（ＳｏｌｉｄＳｔａｔｅＤｒｉｖｅ）により構成され、オペレーティングシステムを含む各種プログラム、及び各種データを格納する。 The ROM 62 stores various programs and various data. The RAM 63 temporarily stores programs or data as a work area. The storage 64 is configured with an HDD (Hard Disk Drive) or an SSD (Solid State Drive), and stores various programs including an operating system and various data.

通信インタフェース６５は、他の機器と通信するためのインタフェースであり、例えば、イーサネット（登録商標）、ＦＤＤＩ又はＷｉ－Ｆｉ（登録商標）等の規格が用いられる。 The communication interface 65 is an interface for communicating with other devices, and uses, for example, a standard such as Ethernet (registered trademark), FDDI, or Wi-Fi (registered trademark).

上記学習済みモデル７０は、図７に示す学習装置１２０によって予め学習される。以下、この学習装置１２０について説明する。 The trained model 70 is trained in advance by the learning device 120 shown in FIG. This learning device 120 will be explained below.

図７は、本発明の第１実施形態に係る学習装置１２０の概略構成を示す図である。図７に示すように、学習装置１２０は、教師データ記憶部１２２、取得部１２４、学習部１２６、及びモデル記憶部１２８を備える。 FIG. 7 is a diagram showing a schematic configuration of the learning device 120 according to the first embodiment of the present invention. As shown in FIG. 7, the learning device 120 includes a teacher data storage section 122, an acquisition section 124, a learning section 126, and a model storage section 128.

教師データ記憶部１２２には、動的な環境においてロボット１００からの視点で観測された画像上の人物の各々の各時刻の位置及び大きさの時系列データと、ロボット１００の地面上の動き、及び人物の各々の地面上の動きの時系列データとの組み合わせが、教師データとして複数記憶されている。 The teacher data storage unit 122 stores, as teacher data, a plurality of combinations of time-series data of the position and size of each person on the image at each time, observed from the viewpoint of the robot 100 in a dynamic environment, and time-series data of the movement of the robot 100 on the ground and the movement of each person on the ground.

取得部１２４は、教師データ記憶部１２２から、複数の教師データを取得する。 The acquisition unit 124 acquires a plurality of pieces of teacher data from the teacher data storage unit 122.

学習部１２６は、複数の教師データに基づいて、教師データの画像上の人物の各々の各時刻の位置及び大きさの時系列データを入力としたときに、学習済みモデル７０と同様の構成を有するモデルが、教師データのロボット１００の地面上の動き、及び人物の各々の地面上の動きの時系列データを出力するように、当該モデルのパラメータを学習する。 Based on the plurality of pieces of teacher data, the learning unit 126 learns the parameters of a model having the same configuration as the trained model 70 so that, when the time-series data of the position and size of each person on the image at each time in the teacher data is input, the model outputs the time-series data of the movement of the robot 100 on the ground and the movement of each person on the ground in the teacher data.

モデル記憶部１２８には、学習部１２６による学習結果が、学習済みモデルとして記憶される。 The model storage unit 128 stores the learning results obtained by the learning unit 126 as a learned model.

次に、学習装置１２０のハードウェア構成について説明する。 Next, the hardware configuration of the learning device 120 will be explained.

上記図６に示すように、学習装置１２０は、俯瞰データ生成装置２０と同様に、ＣＰＵ６１、ＲＯＭ６２、ＲＡＭ６３、ストレージ６４、及び通信インタフェース６５を有する。各構成は、バス６６を介して相互に通信可能に接続されている。本実施形態では、ストレージ６４には、学習プログラムが格納されている。 As shown in FIG. 6 above, the learning device 120 includes a CPU 61, a ROM 62, a RAM 63, a storage 64, and a communication interface 65, similarly to the bird's-eye view data generation device 20. Each component is communicably connected to each other via a bus 66. In this embodiment, the storage 64 stores a learning program.

次に、学習装置１２０の作用について説明する。 Next, the operation of the learning device 120 will be explained.

まず、学習装置１２０に、動的な環境においてロボット１００からの視点で観測された画像上の人物の各々の各時刻の位置及び大きさの時系列データと、ロボット１００の地面上の動き、及び人物の各々の地面上の動きの時系列データとの組み合わせが、教師データとして複数入力され、教師データ記憶部１２２に記憶される。 First, a plurality of combinations of time-series data of the position and size of each person on the image at each time, observed from the viewpoint of the robot 100 in a dynamic environment, and time-series data of the movement of the robot 100 on the ground and the movement of each person on the ground are input to the learning device 120 as teacher data and stored in the teacher data storage unit 122.

図８は、学習装置１２０による学習処理の流れを示すフローチャートである。ＣＰＵ６１がストレージ６４から学習プログラムを読み出して、ＲＡＭ６３に展開し実行することにより、学習処理が行なわれる。 FIG. 8 is a flowchart showing the flow of learning processing by the learning device 120. The learning process is performed by the CPU 61 reading the learning program from the storage 64, loading it onto the RAM 63, and executing it.

ステップＳ１００では、ＣＰＵ６１が、取得部１２４として、教師データ記憶部１２２から、複数の教師データを取得する。 In step S100, the CPU 61, as the acquisition unit 124, acquires a plurality of pieces of teacher data from the teacher data storage unit 122.

ステップＳ１０２では、ＣＰＵ６１が、学習部１２６として、複数の教師データに基づいて、教師データの画像上の人物の各々の各時刻の位置及び大きさの時系列データを入力としたときに、学習済みモデル７０と同様の構成を有するモデルが、教師データのロボット１００の地面上の動き、及び人物の各々の地面上の動きの時系列データを出力するように、当該モデルのパラメータを学習する。 In step S102, the CPU 61, as the learning unit 126, learns, based on the plurality of pieces of teacher data, the parameters of a model having the same configuration as the trained model 70 so that, when the time-series data of the position and size of each person on the image at each time in the teacher data is input, the model outputs the time-series data of the movement of the robot 100 on the ground and the movement of each person on the ground in the teacher data.

そして、学習部１２６による学習結果が、学習済みモデルとしてモデル記憶部１２８に記憶される。 The learning result by the learning unit 126 is then stored in the model storage unit 128 as a trained model.
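
The learning step above pairs an image-space time series with a ground-motion time series. The sketch below shows one possible shape for such a teacher sample and a loss that compares predictions with it. The text does not name a loss function, so the mean squared error used here is an assumption, and `TeacherSample`, `mse`, and `sequence_loss` are hypothetical names.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class TeacherSample:
    """One teacher-data pair (layout is an assumption for illustration)."""
    boxes: List[List[List[float]]]          # [time][person] -> position and size on the image
    robot_motion: List[List[float]]         # [time] -> robot motion vector on the ground
    people_motion: List[List[List[float]]]  # [time][person] -> person motion vector on the ground


def mse(pred: List[float], target: List[float]) -> float:
    """Mean squared error between two equal-length vectors."""
    return sum((p - t) ** 2 for p, t in zip(pred, target)) / len(target)


def sequence_loss(sample: TeacherSample,
                  robot_pred: List[List[float]],
                  people_pred: List[List[List[float]]]) -> float:
    """Average the per-vector errors over all time steps, robot and people alike."""
    terms = []
    for t in range(len(sample.boxes)):
        terms.append(mse(robot_pred[t], sample.robot_motion[t]))
        for hat, target in zip(people_pred[t], sample.people_motion[t]):
            terms.append(mse(hat, target))
    return sum(terms) / len(terms)
```

A training loop would adjust the model parameters to reduce this value over many samples. Note that the input side of the sample holds only positions and sizes, not pixel data.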

次に、ロボット１００の作用について説明する。 Next, the operation of the robot 100 will be explained.

まず、学習装置１２０によって学習された学習済みモデルが、俯瞰データ生成装置２０のモデル記憶部２７に記憶される。 First, the learned model learned by the learning device 120 is stored in the model storage unit 27 of the bird's-eye view data generation device 20.

そして、ロボット１００が、自律走行部６０により目的地まで移動する際に、カメラ１０は、ロボット１００の周囲を予め定めた間隔で撮影し、俯瞰データ生成装置２０は、定期的に、図９に示す俯瞰データ生成処理により俯瞰データを生成し、俯瞰データに基づいて、ロボット１００が目的地に移動するように自律走行部６０を制御する。 Then, while the robot 100 is moved to the destination by the autonomous traveling unit 60, the camera 10 photographs the surroundings of the robot 100 at predetermined intervals, and the bird's-eye view data generation device 20 periodically generates bird's-eye view data by the bird's-eye view data generation process shown in FIG. 9 and, based on the bird's-eye view data, controls the autonomous traveling unit 60 so that the robot 100 moves to the destination.

図９は、俯瞰データ生成装置２０による俯瞰データ生成処理の流れを示すフローチャートである。ＣＰＵ６１がストレージ６４から俯瞰データ生成プログラムを読み出して、ＲＡＭ６３に展開し実行することにより、俯瞰データ生成処理が行なわれる。 FIG. 9 is a flowchart showing the process of generating bird's-eye view data by the bird's-eye view data generating device 20. The CPU 61 reads the bird's-eye view data generation program from the storage 64, expands it to the RAM 63, and executes it, thereby performing the bird's-eye view data generation process.

ステップＳ１１０では、ＣＰＵ６１が、取得部２２として、カメラ１０によって撮影された画像の時系列データを取得する。 In step S110, the CPU 61, as the acquisition unit 22, acquires time-series data of images photographed by the camera 10.

ステップＳ１１２では、ＣＰＵ６１が、追跡部２４として、取得した画像の時系列データから、人物の各々を追跡し、画像上の人物の各々の各時刻の位置及び大きさを取得する。 In step S112, the CPU 61, as the tracking unit 24, tracks each person from the time series data of the acquired image, and acquires the position and size of each person on the image at each time.

ステップＳ１１４では、ＣＰＵ６１が、生成部２６として、取得した画像の時系列データの最初の時刻より一時刻前についての、ロボット１００の地面上の動きを表すベクトル、及び人物の各々の地面上の動きを表すベクトルに対し、初期値を設定する。また、画像の時系列データの最初の時刻を時刻ｔとする。 In step S114, the CPU 61, as the generation unit 26, sets initial values for the vector representing the movement of the robot 100 on the ground and the vector representing the movement of each person on the ground for the time one step before the first time of the acquired time-series image data. The first time of the time-series image data is set as time t.

ステップＳ１１６では、ＣＰＵ６１が、生成部２６として、画像上の人物の各々の時刻ｔの位置及び大きさを表すベクトル、並びに時刻ｔ－１について得られた、ロボット１００の地面上の動きを表すベクトル、及び人物の各々の地面上の動きを表すベクトルから、学習済みモデル７０を用いて、時刻ｔにおける、ロボット１００の地面上の動きを表すベクトル、及び人物の各々の地面上の動きを表すベクトルを推定する。 In step S116, the CPU 61, as the generation unit 26, uses the trained model 70 to estimate a vector representing the movement of the robot 100 on the ground and a vector representing the movement of each person on the ground at time t from the vector representing the position and size of each person on the image at time t, together with the vector representing the movement of the robot 100 on the ground and the vector representing the movement of each person on the ground obtained for time t-1.

ステップＳ１１８では、ＣＰＵ６１が、生成部２６として、予め定められた反復終了条件を満たしたか否かを判定する。例えば、画像の時系列データの最後の時刻に到達したことを、反復終了条件として用いればよい。反復終了条件を満たした場合には、ＣＰＵ６１は、ステップＳ１２０へ移行する。一方、反復終了条件を満たしていない場合には、ＣＰＵ６１は、ステップＳ１１６へ戻り、次の時刻を時刻ｔとして、処理を繰り返す。 In step S118, the CPU 61, acting as the generation unit 26, determines whether a predetermined repetition end condition is satisfied. For example, reaching the final time of the time-series data of an image may be used as the repetition termination condition. If the repetition end condition is satisfied, the CPU 61 moves to step S120. On the other hand, if the repetition end condition is not satisfied, the CPU 61 returns to step S116, sets the next time to time t, and repeats the process.
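
Steps S114 to S118 form a loop in which each estimate is fed back as the previous-time input of the next iteration. A minimal sketch of that control flow follows. `rollout` is a hypothetical name, and `toy_model` is an arbitrary stand-in with no relation to the trained model 70; it exists only so that the loop can be run.

```python
def rollout(observations, model, initial_robot_motion, initial_people_motion):
    """Apply `model` once per time step, feeding back the previous estimate."""
    # S114: initial values for the time one step before the first observation.
    robot_prev, people_prev = initial_robot_motion, initial_people_motion
    history = []
    # S116 is repeated until the last time step is reached (S118).
    for obs_t in observations:
        robot_t, people_t = model(obs_t, robot_prev, people_prev)
        history.append((robot_t, people_t))
        robot_prev, people_prev = robot_t, people_t
    return history


def toy_model(obs_t, robot_prev, people_prev):
    # Hypothetical stand-in for the trained model: not the patent's network.
    robot_t = [0.5 * v + 1.0 for v in robot_prev]
    people_t = [[0.5 * v + box[0] for v in prev]
                for box, prev in zip(obs_t, people_prev)]
    return robot_t, people_t
```

The returned `history` holds one (robot, people) pair per time step, which is what step S120 consumes.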

ステップＳ１２０では、ＣＰＵ６１が、生成部２６として、各時刻について得られた、ロボット１００の地面上の動きを表すベクトル、及び人物の各々の地面上の動きを表すベクトルから、各時刻についての、ロボット１００の地面上の位置、カメラ１０の観測方向、及び人物の各々の地面上の位置を表す俯瞰データを生成し、制御部２８に出力し、俯瞰データ生成処理を終了する。 In step S120, the CPU 61, as the generation unit 26, generates, from the vector representing the movement of the robot 100 on the ground and the vector representing the movement of each person on the ground obtained for each time, bird's-eye view data representing the position of the robot 100 on the ground, the observation direction of the camera 10, and the position of each person on the ground for each time, outputs the data to the control unit 28, and ends the bird's-eye view data generation process.
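
If the motion vector at each time is taken to be a position relative to the previous time, a ground trajectory can be recovered by accumulating those displacements. The sketch below illustrates this under two assumptions: displacements are expressed in a fixed ground frame, and the starting position is known or chosen arbitrarily. The observation direction of the camera is left out because the text does not say how it is encoded. `accumulate_positions` is a hypothetical name.

```python
def accumulate_positions(start, displacements):
    """Turn per-step (dx, dy) displacements into absolute (x, y) positions."""
    x, y = start
    trajectory = []
    for dx, dy in displacements:
        x, y = x + dx, y + dy
        trajectory.append((x, y))
    return trajectory
```

For example, `accumulate_positions((0.0, 0.0), [(1.0, 0.0), (0.5, 0.5)])` yields the two points `(1.0, 0.0)` and `(1.5, 0.5)`. Applying the same function to the robot and to each person gives the set of trajectories that make up the bird's-eye view data.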

制御部２８は、生成された俯瞰データを用いて、ロボット１００が目的地に移動するように、ロボット１００の移動方向及び速度を指定し、指定された移動方向及び速度で移動するように自律走行部６０を制御する。また、制御部２８は、俯瞰データを用いて、介入行動が必要と判断した場合には、「道を空けてください」等のメッセージを音声出力したり、警告音を鳴らすよう報知部５０を制御する。 Using the generated bird's-eye view data, the control unit 28 specifies the moving direction and speed of the robot 100 so that the robot 100 moves to the destination, and controls the autonomous traveling unit 60 so that the robot moves in the specified direction at the specified speed. In addition, when the control unit 28 determines from the bird's-eye view data that an intervention action is necessary, it controls the notification unit 50 to output a voice message such as "Please make way" or to sound a warning.

このように、本実施形態では、ロボット１００の地面上の動き、及び人物の各々の地面上の動きを推定する学習済みモデルを用いて、画像の時系列データから、ロボット１００を俯瞰した位置から観測した場合に得られる、ロボット１００の地面上の移動軌跡、及び人物の各々の地面上の移動軌跡を表す俯瞰データを生成する。これにより、静的なランドマークが検出されない状況であっても、動的な環境においてカメラ１０を搭載したロボット１００からの視点で観測された画像から、ロボット１００の地面上の移動軌跡、及び人物の各々の地面上の移動軌跡を表す俯瞰データを生成することができる。 In this way, in this embodiment, a trained model that estimates the movement of the robot 100 on the ground and the movement of each person on the ground is used to generate, from the time-series image data, bird's-eye view data representing the movement trajectory of the robot 100 on the ground and the movement trajectory of each person on the ground as would be obtained by observing the robot 100 from a bird's-eye view position. As a result, even in a situation where no static landmark is detected, bird's-eye view data representing the movement trajectory of the robot 100 on the ground and the movement trajectory of each person on the ground can be generated from images observed from the viewpoint of the robot 100 carrying the camera 10 in a dynamic environment.

また、学習済みモデルを用いた計算で実現できるため、計算量が少なくなり、リアルタイムでの俯瞰データの生成が可能となる。 In addition, since it can be realized by calculation using a trained model, the amount of calculation is reduced and it is possible to generate bird's-eye view data in real time.

また、教師データとして、画像上の人物の各々の各時刻の位置及び大きさの時系列データを用いるため、実画像を用いる必要がない。これにより、教師データを作成する負担が軽減される。 Further, since time-series data of the position and size of each person on the image at each time is used as the training data, there is no need to use an actual image. This reduces the burden of creating teacher data.

［第２実施形態］ [Second embodiment]
次に、第２実施形態に係る俯瞰データ生成装置について説明する。なお、第１実施形態と同様の構成となる部分については、同一符号を付して詳細な説明を省略する。 Next, a bird's-eye view data generation device according to a second embodiment will be described. Parts having the same configuration as in the first embodiment are given the same reference numerals, and detailed description thereof is omitted.

第２実施形態では、ユーザが保持している情報処理端末が、俯瞰データ生成装置を備えている場合を例に説明する。 In the second embodiment, an example will be described in which an information processing terminal held by a user includes an overhead view data generation device.

図１０は、本発明の第２実施形態に係る情報処理端末２００の概略構成を示す図である。図１０に示すように、情報処理端末２００は、カメラ１０、俯瞰データ生成装置２２０、及び出力部２５０を備える。俯瞰データ生成装置２２０は、取得部２２、追跡部２４、生成部２６、及びモデル記憶部２７を備える。なお、ユーザが、観測移動体の一例であり、カメラ１０が、観測装置の一例である。 FIG. 10 is a diagram showing a schematic configuration of an information processing terminal 200 according to the second embodiment of the present invention. As shown in FIG. 10, the information processing terminal 200 includes a camera 10, a bird's-eye view data generation device 220, and an output unit 250. The bird's-eye view data generation device 220 includes an acquisition unit 22, a tracking unit 24, a generation unit 26, and a model storage unit 27. Note that the user is an example of an observation moving body, and the camera 10 is an example of an observation device.

情報処理端末２００は、ユーザにより直接保持されているか、あるいは、ユーザが保持する保持物体（例えば、スーツケース）に搭載されている。 The information processing terminal 200 is held directly by the user, or is mounted on a holding object (for example, a suitcase) held by the user.

カメラ１０は、ユーザの周囲を予め定めた間隔で撮影し、撮影した画像を俯瞰データ生成装置２２０の取得部２２に出力する。 The camera 10 photographs the surroundings of the user at predetermined intervals, and outputs the photographed images to the acquisition unit 22 of the bird's-eye view data generation device 220.

生成部２６は、ユーザの地面上の動き、及び人物の各々の地面上の動きを推定する学習済みモデルを用いて、画像の時系列データから取得した画像上の人物の各々の各時刻の位置及び大きさから、ユーザを俯瞰した位置から観測した場合に得られる、ユーザの地面上の移動軌跡、及び人物の各々の地面上の移動軌跡を表す俯瞰データを生成し、出力部２５０へ出力する。 The generation unit 26 uses a trained model that estimates the movement of the user on the ground and the movement of each person on the ground to generate, from the position and size of each person on the image at each time acquired from the time-series image data, bird's-eye view data representing the movement trajectory of the user on the ground and the movement trajectory of each person on the ground as would be obtained by observing the user from a bird's-eye view position, and outputs the data to the output unit 250.

モデル記憶部２７には、上記第１実施形態と同様に学習装置１２０によって学習された、ユーザの地面上の動き、及び人物の各々の地面上の動きを推定する学習済みモデルが、記憶されている。 The model storage unit 27 stores learned models for estimating the movement of the user on the ground and the movements of each person on the ground, which are learned by the learning device 120 in the same manner as in the first embodiment. There is.

出力部２５０は、生成された俯瞰データをユーザに提示したり、インターネットを介してサーバ（図示省略）へ俯瞰データを送信する。 The output unit 250 presents the generated bird's-eye view data to the user and transmits the bird's-eye view data to a server (not shown) via the Internet.

また、俯瞰データ生成装置２２０は、図６に示すように、上記第１実施形態の俯瞰データ生成装置２０と同様のハードウェア構成を有する。 Further, as shown in FIG. 6, the bird's-eye view data generation device 220 has the same hardware configuration as the bird's-eye view data generation device 20 of the first embodiment.

なお、俯瞰データ生成装置２２０の他の構成及び作用については、第１実施形態と同様であるため、説明を省略する。 Note that the other configurations and operations of the bird's-eye view data generation device 220 are the same as those in the first embodiment, and therefore description thereof will be omitted.

また、上記図７に示すように、第２実施形態に係る学習装置１２０は、教師データ記憶部１２２、取得部１２４、学習部１２６、及びモデル記憶部１２８を備える。 Further, as shown in FIG. 7 above, the learning device 120 according to the second embodiment includes a teacher data storage section 122, an acquisition section 124, a learning section 126, and a model storage section 128.

教師データ記憶部１２２には、動的な環境においてユーザからの視点で観測された画像上の人物の各々の各時刻の位置及び大きさの時系列データと、ユーザの地面上の動き、及び人物の各々の地面上の動きの時系列データとの組み合わせが、教師データとして複数記憶されている。 The teacher data storage unit 122 stores, as teacher data, a plurality of combinations of time-series data of the position and size of each person on the image at each time, observed from the user's viewpoint in a dynamic environment, and time-series data of the movement of the user on the ground and the movement of each person on the ground.

学習部１２６は、複数の教師データに基づいて、教師データの画像上の人物の各々の各時刻の位置及び大きさの時系列データを入力としたときに、学習済みモデル７０と同様の構成を有するモデルが、教師データのユーザの地面上の動き、及び人物の各々の地面上の動きの時系列データを出力するように、当該モデルのパラメータを学習する。 Based on the plurality of pieces of teacher data, the learning unit 126 learns the parameters of a model having the same configuration as the trained model 70 so that, when the time-series data of the position and size of each person on the image at each time in the teacher data is input, the model outputs the time-series data of the movement of the user on the ground and the movement of each person on the ground in the teacher data.

なお、学習装置１２０の他の構成及び作用については、第１実施形態と同様であるため、説明を省略する。 Note that the other configurations and functions of the learning device 120 are the same as those in the first embodiment, so their explanations will be omitted.

このように、本実施形態では、情報処理端末２００を保持したユーザの地面上の動き、及び人物の各々の地面上の動きを推定する学習済みモデルを用いて、画像の時系列データから、ユーザを俯瞰した位置から観測した場合に得られる、ユーザの地面上の移動軌跡、及び人物の各々の地面上の移動軌跡を表す俯瞰データを生成する。これにより、静的なランドマークが検出されない状況であっても、動的な環境においてカメラ１０を有する情報処理端末２００を保持したユーザからの視点で観測された画像から、ユーザの地面上の移動軌跡、及び人物の各々の地面上の移動軌跡を表す俯瞰データを生成することができる。 In this way, in this embodiment, a trained model that estimates the movement on the ground of the user holding the information processing terminal 200 and the movement of each person on the ground is used to generate, from the time-series image data, bird's-eye view data representing the movement trajectory of the user on the ground and the movement trajectory of each person on the ground as would be obtained by observing the user from a bird's-eye view position. As a result, even in a situation where no static landmark is detected, bird's-eye view data representing the movement trajectory of the user on the ground and the movement trajectory of each person on the ground can be generated from images observed from the viewpoint of the user holding the information processing terminal 200 having the camera 10 in a dynamic environment.

本発明は、自動運転車両にも応用することができる。この場合、観測移動体は自動運転車両であり、観測装置は、カメラ、レーザーレーダー、ミリ波レーダーであり、移動体は他の車両、オートバイ、歩行者等である。 The present invention can also be applied to autonomous vehicles. In this case, the observation moving object is a self-driving vehicle, the observation device is a camera, a laser radar, a millimeter wave radar, and the moving object is another vehicle, a motorcycle, a pedestrian, etc.

［第３実施形態］ [Third embodiment]
次に、第３実施形態に係る俯瞰データ生成装置について説明する。なお、第３実施形態に係る俯瞰データ生成装置は、第１実施形態と同様の構成であるため、同一符号を付して詳細な説明を省略する。 Next, a bird's-eye view data generation device according to a third embodiment will be described. Since the bird's-eye view data generation device according to the third embodiment has the same configuration as in the first embodiment, the same reference numerals are used and detailed description thereof is omitted.

第３実施形態では、ユーザの地面上の動きの分布、及び人物の各々の地面上の動きの分布を予測する点が、第１実施形態と異なっている。 The third embodiment differs from the first embodiment in that the distribution of the user's movement on the ground and the distribution of the movement of each person on the ground are predicted.

第３実施形態に係る俯瞰データ生成装置２０の生成部２６は、ロボット１００の地面上の動き、及び人物の各々の地面上の動きの分布を予測する学習済みモデルを用いて、画像の時系列データから取得した画像上の人物の各々の各時刻の位置及び大きさから、ロボット１００を俯瞰した位置から観測した場合に得られる、ロボット１００の地面上の移動軌跡、及び人物の各々の地面上の位置分布を表す移動軌跡を表す俯瞰データの予測結果を生成する。 The generation unit 26 of the bird's-eye view data generation device 20 according to the third embodiment uses a trained model that predicts the movement of the robot 100 on the ground and the distribution of the movement of each person on the ground to generate, from the position and size of each person on the image at each time acquired from the time-series image data, a prediction result of bird's-eye view data representing the movement trajectory of the robot 100 on the ground and a movement trajectory representing the position distribution of each person on the ground, as would be obtained by observing the robot 100 from a bird's-eye view position.

具体的には、生成部２６は、画像上の人物の各々の各時刻の位置及び大きさを入力として、一時刻先のロボット１００の地面上の動き、及び人物の各々の地面上の動きの分布を予測する学習済みモデルを用いて、俯瞰データの予測結果を生成する。 Specifically, the generation unit 26 generates the prediction result of the bird's-eye view data using a trained model that takes as input the position and size of each person on the image at each time and predicts the movement of the robot 100 on the ground and the distribution of the movement of each person on the ground one time step ahead.

ここで、学習済みモデルは、人物の各々の対象時刻の位置及び大きさを入力とし、ベクトルを出力する第１エンコーダと、対象時刻について得られた、ロボット１００の地面上の動き、及び人物の各々の地面上の動きの分布を入力とし、ベクトルを出力する第２エンコーダと、第１エンコーダによって出力されたベクトル、及び第２エンコーダによって出力されたベクトルを入力とし、対象時刻より一時刻先についてのロボット１００の地面上の動き、及び人物の各々の地面上の動きの分布を出力するデコーダとを含む。 Here, the trained model includes: a first encoder that takes as input the position and size of each person at a target time and outputs a vector; a second encoder that takes as input the movement of the robot 100 on the ground and the distribution of the movement of each person on the ground obtained for the target time and outputs a vector; and a decoder that takes as input the vector output by the first encoder and the vector output by the second encoder and outputs the movement of the robot 100 on the ground and the distribution of the movement of each person on the ground for the time one step after the target time.

より具体的には、学習済みモデル７０の第１エンコーダ７２は、ロボット１００が一人称視点で観測した各人物の位置及び大きさを入力とし、人物間のセルフアテンションをとり、得られたベクトルを出力する。 More specifically, the first encoder 72 of the trained model 70 inputs the position and size of each person observed by the robot 100 from a first-person viewpoint, takes self-attention between the people, and outputs the obtained vector. do.

第２エンコーダ７４は、対象時刻について得られた、ロボット１００の地面上の動き、及び人物の各々の地面上の動きの分布を入力とし、ロボット１００の位置に対する各人物の相対位置の分布及び速度の分布をエンコーディングし、得られたベクトルを出力する。 The second encoder 74 takes as input the movement of the robot 100 on the ground and the distribution of the movement of each person on the ground obtained for the target time, encodes the distribution of the relative position and the distribution of the velocity of each person with respect to the position of the robot 100, and outputs the resulting vector.

具体的には、時刻ｔについて得られた、ロボット１００の地面上の動き、及び人物の各々の地面上の動きの分布から、ロボット１００の位置に対する人物の各々の地面上の動きの分布を表すベクトルを求め、このベクトルを、多層パーセプトロン７４０に入力して得られたベクトルを、第２エンコーダ７４の入力ベクトルとする。 Specifically, from the distribution of the movement of the robot 100 on the ground and the movement of each person on the ground obtained at time t, the distribution of the movement of each person on the ground with respect to the position of the robot 100 is expressed. A vector is determined, and this vector is input to the multilayer perceptron 740, and the obtained vector is used as the input vector to the second encoder 74.

デコーダ７６は、第１エンコーダ７２の出力ベクトルと第２エンコーダ７４の出力ベクトルとの間で、クロスアテンションをとり、クロスアテンションの結果から得られたベクトルを出力する。このベクトルは、ロボット１００の地面上の動き、及び人物の各々の地面上の動きの分布をマルチヘッドで予測した結果を表している。 The decoder 76 performs cross-attention between the output vector of the first encoder 72 and the output vector of the second encoder 74, and outputs a vector obtained as a result of the cross-attention. This vector represents the result of multi-head prediction of the movement of the robot 100 on the ground and the distribution of the movement of each person on the ground.

ここで、動きの分布を表すベクトルは、例えば、対象時刻に対する相対位置のガウス分布（平均及び分散）、並びに相対速度のガウス分布（平均及び分散）を表すベクトルである。なお、動きの分布を表すベクトルは、対象時刻に対する相対位置のガウス分布（平均及び分散）を表すベクトル、又は対象時刻に対する相対速度のガウス分布（平均及び分散）を表すベクトルであってもよい。 Here, the vector representing the distribution of motion is, for example, a vector representing a Gaussian distribution (average and variance) of relative position with respect to the target time and a Gaussian distribution (average and variance) of relative velocity. Note that the vector representing the distribution of motion may be a vector representing a Gaussian distribution (average and variance) of relative positions to the target time, or a vector representing a Gaussian distribution (average and variance) of relative velocities to the target time.

本実施形態では、生成部２６は、画像上の人物の各々の時刻ｔの位置及び大きさを表すベクトル、並びに時刻ｔについて得られた、ロボット１００の地面上の動きを表すベクトル、及び人物の各々の地面上の動きの分布を表すベクトルから、学習済みモデル７０を用いて、時刻ｔ＋１における、ロボット１００の地面上の動きを表すベクトル、及び人物の各々の地面上の動きの分布を表すベクトルを求めることを、各時刻ｔについて繰り返すことにより、俯瞰データの予測結果を生成する。 In this embodiment, the generation unit 26 generates the prediction result of the bird's-eye view data by repeating the following for each time t: using the trained model 70, it obtains a vector representing the movement of the robot 100 on the ground and a vector representing the distribution of the movement of each person on the ground at time t+1 from the vector representing the position and size of each person on the image at time t, together with the vector representing the movement of the robot 100 on the ground and the vector representing the distribution of the movement of each person on the ground obtained for time t.

生成部２６は、例えば、図１１Ａに示すような俯瞰データの予測結果を生成する。図１１Ａは、相対位置から求まる位置を示す黒丸をつないだ線でロボット１００の地面上の移動軌跡を示している。また、図１１Ａは、相対位置の平均から求まる平均位置を示す×印をつないだ線で人物の各々の地面上の移動軌跡を示し、×印の周りの楕円で、相対位置の分布から求まる位置の分布を示す例を示している。分布を示す楕円は、円であってもよいし、等高線、高さの分布を示す色分けをして表示してもよい。また、ロボット１００の位置は、ロボット１００の制御や位置を特定するセンサの誤差を含むため、その不確定性の分布を含み計算し、分布と共に表示してもよい。 The generation unit 26 generates, for example, a prediction result of bird's-eye view data as shown in FIG. 11A. In FIG. 11A, the movement trajectory of the robot 100 on the ground is shown by a line connecting black circles, each indicating a position obtained from the relative positions. The movement trajectory of each person on the ground is shown by a line connecting × marks, each indicating the mean position obtained from the mean of the relative position, and the ellipse around each × mark shows the position distribution obtained from the distribution of the relative position. The distribution may be drawn as a circle instead of an ellipse, or displayed as contour lines or with color coding that indicates the height of the distribution. Because the position of the robot 100 is subject to errors in the control of the robot 100 and in the sensors that determine its position, the position may be calculated together with its uncertainty distribution and displayed along with that distribution.

また、図１１Ｂに示すような、次の時刻の人物の各々の地面上の位置の分布を表す俯瞰データを生成してもよい。図１１Ｂでは、縦軸、横軸は距離を表し、ロボット（逆三角）、人の位置の分布を含む俯瞰図の例を示している。等高線の楕円は、不確かさの分布を伴う人の位置を示し、点線はロボット１００のカメラの視界を示している。図１１Ｂの例は、ロボット１００内の情報を表した図のため、ロボットの位置は固定（不確かさの分布はない）され、人のみが不確かさの分布を持つ。 Alternatively, bird's-eye view data representing the distribution of the position of each person on the ground at the next time, as shown in FIG. 11B, may be generated. In FIG. 11B, the vertical and horizontal axes represent distance, and an example of a bird's-eye view including the robot (inverted triangle) and the distributions of people's positions is shown. The contour ellipses show the positions of people together with their uncertainty distributions, and the dotted lines show the field of view of the camera of the robot 100. Because the example of FIG. 11B depicts information held inside the robot 100, the position of the robot is fixed (it has no uncertainty distribution), and only the people have uncertainty distributions.

The control unit 28 uses the bird's-eye view data to control the autonomous traveling unit 60 so that the robot 100 moves to the destination without colliding with a person. For example, the control unit 28 specifies a moving direction and speed for the robot 100 and controls the autonomous traveling unit 60 so that the robot moves in the specified direction at the specified speed. At this time, by specifying the moving direction and speed of the robot 100 so as to avoid the regions of the ellipses in the bird's-eye view data of FIG. 11, collisions between the robot 100 and people can be avoided more reliably.
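One simple way to picture the selection of a moving direction that avoids the ellipses is sketched below. This is only an assumption-laden illustration, not the control method of the embodiment: the sampling of candidate headings, the one-step look-ahead, and the names `inside_ellipse` and `choose_velocity` are all hypothetical. Each ellipse is given as a tuple `(cx, cy, a, b, angle)`.

```python
import math


def inside_ellipse(px, py, cx, cy, a, b, angle):
    """True if point (px, py) lies inside the ellipse centred at (cx, cy)."""
    dx, dy = px - cx, py - cy
    # Rotate the offset into the ellipse's own axes.
    u = dx * math.cos(angle) + dy * math.sin(angle)
    v = -dx * math.sin(angle) + dy * math.cos(angle)
    return (u / a) ** 2 + (v / b) ** 2 <= 1.0


def choose_velocity(robot_xy, goal_xy, ellipses, speed=1.0, dt=1.0, n_headings=16):
    """Pick a velocity (vx, vy) whose one-step end point avoids every ellipse.

    Among the sampled headings, the one ending closest to the goal is chosen.
    Returns (0.0, 0.0) (stop) if every candidate is blocked.
    Only the end point of the step is checked, not the path in between.
    """
    best, best_dist = (0.0, 0.0), float("inf")
    for k in range(n_headings):
        theta = 2.0 * math.pi * k / n_headings
        vx, vy = speed * math.cos(theta), speed * math.sin(theta)
        nx, ny = robot_xy[0] + vx * dt, robot_xy[1] + vy * dt
        if any(inside_ellipse(nx, ny, *e) for e in ellipses):
            continue  # this heading would end inside a person's ellipse
        dist = math.hypot(goal_xy[0] - nx, goal_xy[1] - ny)
        if dist < best_dist:
            best, best_dist = (vx, vy), dist
    return best
```

With no people present the sketch heads straight for the goal; when an ellipse blocks the direct heading, it falls back to the nearest unblocked heading.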

The teacher data storage unit 122 of the learning device 120 according to the third embodiment stores, as teacher data, a plurality of combinations of time-series data of the position and size at each time of each person on the image observed from the viewpoint of the robot 100 in a dynamic environment, and time-series data of the movement of the robot 100 on the ground and the movement of each person on the ground. Here, in the teacher data, the position and size at a given time of each person on the image observed from the viewpoint of the user in a dynamic environment are associated with the movement of the user on the ground at the next time and the movement of each person on the ground at the next time.
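The association between an observation at one time and the movements at the next time can be sketched as follows. The data layout (lists indexed by time, a dictionary for the target) and the name `make_teacher_pairs` are assumptions made for illustration only; the source does not specify how the teacher data is stored.

```python
def make_teacher_pairs(observations, robot_motions, person_motions):
    """Pair the observation at time t with the ground motions at time t + 1.

    observations[t]   : per-person (x, y, size) on the image at time t
    robot_motions[t]  : motion of the observer on the ground at time t
    person_motions[t] : per-person motions on the ground at time t
    (All three layouts are hypothetical.)
    """
    if not (len(observations) == len(robot_motions) == len(person_motions)):
        raise ValueError("all sequences must have the same length")
    pairs = []
    for t in range(len(observations) - 1):
        target = {
            "robot": robot_motions[t + 1],
            "persons": person_motions[t + 1],
        }
        pairs.append((observations[t], target))
    return pairs
```

A sequence of T time steps yields T - 1 such pairs, since the last observation has no next-time target.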

Based on the plurality of pieces of teacher data, the learning unit 126 learns the parameters of a model having the same configuration as the learned model 70 so that, when the time-series data of the position and size at each time of each person on the image in the teacher data is input, the model outputs time-series data of movement corresponding to the movement of the robot 100 on the ground in the teacher data, and time-series data of the distribution of movement corresponding to the movement of each person on the ground in the teacher data.

Note that the other configurations and operations of the bird's-eye view data generation device 20 and the learning device 120 according to the third embodiment are the same as those in the first embodiment, and their description is therefore omitted.

As described above, according to the present embodiment, a learned model that predicts the movement of the robot 100 on the ground at the next time and the movement of each person on the ground at the next time is used to generate, from the time-series data of images, a prediction result of bird's-eye view data representing the movement trajectory of the robot 100 on the ground and the movement trajectory of each person on the ground, as would be obtained if the robot 100 were observed from a bird's-eye view position. As a result, even in a situation where no static landmark is detected, a prediction result of bird's-eye view data representing the movement trajectory of the robot 100 on the ground and the movement trajectory of each person on the ground can be generated from images observed from the viewpoint of the robot 100 equipped with the camera 10 in a dynamic environment.
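How per-time movements could be turned into trajectories on the ground can be illustrated with the following sketch. It assumes, hypothetically, that the movement of the robot at each step is a planar displacement and rotation `(dx, dy, dtheta)` expressed in the robot's frame at the previous step, and that each person is described by a position relative to the robot; neither parameterization is confirmed by this section.

```python
import math


def integrate_robot_motion(motions, start=(0.0, 0.0, 0.0)):
    """Accumulate per-step motions into ground-frame poses (x, y, theta).

    motions: iterable of (dx, dy, dtheta), each expressed in the robot frame
    of the previous step (an assumed parameterization).
    The returned list includes the start pose.
    """
    x, y, theta = start
    poses = [(x, y, theta)]
    for dx, dy, dtheta in motions:
        # Rotate the local displacement into the ground frame, then add it.
        x += dx * math.cos(theta) - dy * math.sin(theta)
        y += dx * math.sin(theta) + dy * math.cos(theta)
        theta += dtheta
        poses.append((x, y, theta))
    return poses


def person_ground_position(robot_pose, relative_xy):
    """Convert a person's position relative to the robot into the ground frame."""
    x, y, theta = robot_pose
    rx, ry = relative_xy
    return (x + rx * math.cos(theta) - ry * math.sin(theta),
            y + rx * math.sin(theta) + ry * math.cos(theta))
```

Connecting the successive robot poses gives the robot's trajectory, and connecting each person's converted positions over time gives that person's trajectory.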

[Example]
An example in which bird's-eye view data was generated from time-series data of images by the bird's-eye view data generation device 20 of the first embodiment will now be described.

As a comparative example, a method was used that generates bird's-eye view data so as to maximize, for each time, a posterior distribution expressed using the relative position of each person from the robot and a motion model, namely the posterior distribution of the positions of the robot and of each person on the ground, given the positions of the robot and of each person on the ground one time step earlier and the position and size of each person on the image at the current time.

The amount of computation was measured on databases of different scenes named "Hotel", "ETH", and "Students". The amount of computation was measured using a CPU for the comparative example, and using a CPU and using a GPU for the example. Table 1 shows the measurement results.

As shown in Table 1, the example (ViewBirdiformer) required less computation than the comparative example (GeoVB). It was also found that using a GPU as the device reduces the amount of computation further.

[Modified example]
In the above embodiments, the case where the robot 100 or the information processing terminal 200 includes the bird's-eye view data generation device 20 or 220 has been described; however, the functions of the bird's-eye view data generation devices 20 and 220 may be provided in an external server. In this case, the robot 100 or the information processing terminal 200 transmits the time-series data of images captured by the camera 10 to the external server. The external server generates bird's-eye view data from the transmitted time-series data of images and transmits it to the robot 100 or the information processing terminal 200.

Further, under conditions where a static landmark is detected in the images captured by the camera 10, the generation unit 26 may generate the bird's-eye view data using the static landmark shown in the images. For example, the technique described in Non-Patent Document 2 mentioned above may be used. In this case, the bird's-eye view data may be generated using the static landmark shown in the images under conditions where a static landmark is detected in the images captured by the camera 10, and may be generated by the method described in the above embodiments under conditions where no static landmark is detected in the images captured by the camera 10 (for example, in a crowded environment). In addition, the bird's-eye view data generated using the static landmark shown in the images and the bird's-eye view data generated by the method described in the above embodiments may be integrated.
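The switching described in this modification can be pictured as a simple dispatch. The sketch below is illustrative only: the callables `detect_landmarks`, `landmark_method`, and `learned_method`, and the `min_landmarks` threshold, are hypothetical placeholders for whatever landmark detector and generation methods are actually used.

```python
def generate_birds_eye_view(frames, detect_landmarks, landmark_method,
                            learned_method, min_landmarks=1):
    """Use the landmark-based method when static landmarks are detected,
    otherwise fall back to the learned-model method.

    All callables are supplied by the caller (hypothetical interfaces):
      detect_landmarks(frames) -> list of detected static landmarks
      landmark_method(frames, landmarks) -> bird's-eye view data
      learned_method(frames) -> bird's-eye view data
    """
    landmarks = detect_landmarks(frames)
    if len(landmarks) >= min_landmarks:
        return landmark_method(frames, landmarks)
    # No usable static landmark, e.g. a crowded scene.
    return learned_method(frames)
```

Integrating the two results, which the modification also permits, would replace the either-or branch with a combination step; how that combination is performed is not specified here.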

Further, the above description used an example in which the tracking unit 24 detects and tracks, for each person on the image, a bounding box representing that person, and acquires the center position (the center position of the bounding box) and height (the height of the bounding box) of the person on the image at each time; however, the present invention is not limited to this. For example, the tracking unit 24 may detect and track, for each person on the image, a human skeleton representing that person, and acquire the center position (the center position of the human skeleton) and height (the height of the human skeleton) of the person on the image at each time. Alternatively, as shown in FIG. 12, the tracking unit 24 may detect and track, for each person on the image, a line indicating the height of that person, and acquire the center position (the center position of the line) and height (the height of the line) of the person on the image at each time.
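For the bounding-box case, the per-time quantities acquired by the tracking unit can be computed as in the following sketch. The box format `(x_min, y_min, x_max, y_max)` in pixels and the function names are assumptions for illustration; the detector and tracker themselves are outside the sketch.

```python
def box_center_and_height(box):
    """Return ((cx, cy), height) for a box given as (x_min, y_min, x_max, y_max)."""
    x_min, y_min, x_max, y_max = box
    if x_max < x_min or y_max < y_min:
        raise ValueError("box corners are not ordered")
    center = ((x_min + x_max) / 2.0, (y_min + y_max) / 2.0)
    return center, y_max - y_min


def track_features(tracks):
    """Map {person_id: [box at t0, box at t1, ...]} to per-time (center, height)."""
    return {pid: [box_center_and_height(b) for b in boxes]
            for pid, boxes in tracks.items()}
```

The same center-and-height pair could be derived from a skeleton or from a height line by taking the extent of the detected points instead of the box corners.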

Further, although the case where the two-dimensional observation information is an image has been described as an example, the present invention is not limited to this. For example, if the observation device is an event camera, data having, for each pixel, a pixel value corresponding to motion may be used as the two-dimensional observation information.

Further, although the case where the moving object represented by the bird's-eye view data is a person has been described as an example, the present invention is not limited to this. For example, the moving object represented by the bird's-eye view data may be personal mobility such as a bicycle or a vehicle.

Further, in the first embodiment, as in the third embodiment, a learned model that estimates the movement of the robot on the ground and the distribution of the movement of each person on the ground may be used to generate, from the time-series data of the two-dimensional observation information, bird's-eye view data representing the movement trajectory of the robot on the ground and a movement trajectory representing the position distribution of each person on the ground, as would be obtained if the robot were observed from a bird's-eye view position.

Further, the bird's-eye view data generation processing and the learning processing that the CPU executes by reading software (a program) in each of the above embodiments may be executed by various processors other than the CPU. Examples of such processors include a PLD (Programmable Logic Device) whose circuit configuration can be changed after manufacture, such as an FPGA (Field-Programmable Gate Array), and a dedicated electric circuit, which is a processor having a circuit configuration designed specifically to execute particular processing, such as an ASIC (Application Specific Integrated Circuit). The bird's-eye view data generation processing and the learning processing may be executed by one of these various processors, or by a combination of two or more processors of the same or different types (for example, a plurality of FPGAs, or a combination of a CPU and an FPGA). More specifically, the hardware structure of these various processors is an electric circuit combining circuit elements such as semiconductor elements.

Further, in each of the above embodiments, a mode in which the bird's-eye view data generation program and the learning program are stored in advance in the storage 64 has been described, but the present invention is not limited to this. The programs may be provided in a form recorded on a recording medium such as a CD-ROM (Compact Disc Read Only Memory), a DVD-ROM (Digital Versatile Disc Read Only Memory), or a USB (Universal Serial Bus) memory. The programs may also be downloaded from an external device via a network.

Regarding the above embodiments, the following supplementary notes are further disclosed.

[Supplementary note 1]
A bird's-eye view data generation device including:
an acquisition unit that acquires time-series data of two-dimensional observation information representing at least one moving object observed from the viewpoint of an observation moving object equipped with an observation device in a dynamic environment; and
a generation unit that, using a learned model that estimates the movement of the observation moving object on the ground and the movement of each of the moving objects on the ground, generates, from the time-series data of the two-dimensional observation information, bird's-eye view data representing the movement trajectory of the observation moving object on the ground and the movement trajectory of each of the moving objects on the ground, as would be obtained if the observation moving object were observed from a bird's-eye view position.

[Supplementary note 2]
The bird's-eye view data generation device according to Supplementary note 1, wherein the generation unit, using a learned model that estimates the movement of the observation moving object on the ground and the distribution of the movement of each of the moving objects on the ground, generates, from the time-series data of the two-dimensional observation information, bird's-eye view data representing the movement trajectory of the observation moving object on the ground and a movement trajectory representing the position distribution of each of the moving objects on the ground, as would be obtained if the observation moving object were observed from a bird's-eye view position.

[Supplementary note 3]
The bird's-eye view data generation device according to Supplementary note 1 or 2, further including a tracking unit that tracks each of the moving objects from the time-series data of the two-dimensional observation information and acquires the position and size of each of the moving objects at each time on the two-dimensional observation information,
wherein the generation unit generates the bird's-eye view data using the learned model, which takes as input the position and size of each of the moving objects at each time on the two-dimensional observation information and estimates the movement of the observation moving object on the ground and the movement of each of the moving objects on the ground.

[Supplementary note 4]
The bird's-eye view data generation device according to Supplementary note 3, wherein the learned model includes:
a first encoder that takes as input the position and size of each of the moving objects at a target time and outputs a vector;
a second encoder that takes as input the movement of the observation moving object on the ground and the movement of each of the moving objects on the ground obtained for one time step earlier, and outputs a vector; and
a decoder that takes as input the vector output by the first encoder and the vector output by the second encoder, and outputs the movement of the observation moving object on the ground and the movement of each of the moving objects on the ground for the target time.

[Supplementary note 5]
A learning device including:
an acquisition unit that acquires, as teacher data, a combination of time-series data of the position and size at each time of each moving object on two-dimensional observation information representing at least one moving object observed from the viewpoint of an observation moving object equipped with an observation device in a dynamic environment, and time-series data of the movement of the observation moving object on the ground and the movement of each of the moving objects on the ground; and
a learning unit that, based on the teacher data, learns a model that takes as input the position and size of each of the moving objects at each time on the two-dimensional observation information and estimates the movement of the observation moving object on the ground and the movement of each of the moving objects on the ground.

[Supplementary note 6]
The learning device according to Supplementary note 5, wherein the model includes:
a first encoder that takes as input the position and size of each of the moving objects at a target time and outputs a vector;
a second encoder that takes as input the movement of the observation moving object on the ground and the movement of each of the moving objects on the ground obtained for one time step earlier, and outputs a vector; and
a decoder that takes as input the vector output by the first encoder and the vector output by the second encoder, and outputs the movement of the observation moving object on the ground and the movement of each of the moving objects on the ground for the target time.

[Supplementary note 7]
A bird's-eye view data generation device including:
an acquisition unit that acquires time-series data of two-dimensional observation information representing at least one moving object observed from the viewpoint of an observation moving object equipped with an observation device in a dynamic environment; and
a generation unit that, using a learned model that predicts the movement of the observation moving object on the ground and the movement of each of the moving objects on the ground, generates, from the time-series data of the two-dimensional observation information, a prediction result of bird's-eye view data representing the movement trajectory of the observation moving object on the ground and the movement trajectory of each of the moving objects on the ground, as would be obtained if the observation moving object were observed from a bird's-eye view position.

[Supplementary note 8]
The bird's-eye view data generation device according to Supplementary note 7, wherein the generation unit, using a learned model that predicts the movement of the observation moving object on the ground and the distribution of the movement of each of the moving objects on the ground, generates, from the time-series data of the two-dimensional observation information, a prediction result of bird's-eye view data representing the movement trajectory of the observation moving object on the ground and a movement trajectory representing the position distribution of each of the moving objects on the ground, as would be obtained if the observation moving object were observed from a bird's-eye view position.

[Supplementary note 9]
A bird's-eye view data generation program for causing a computer to execute processing including:
an acquisition step of acquiring time-series data of two-dimensional observation information representing at least one moving object observed from the viewpoint of an observation moving object equipped with an observation device in a dynamic environment; and
a generation step of, using a learned model that estimates the movement of the observation moving object on the ground and the movement of each of the moving objects on the ground, generating, from the time-series data of the two-dimensional observation information, bird's-eye view data representing the movement trajectory of the observation moving object on the ground and the movement trajectory of each of the moving objects on the ground, as would be obtained if the observation moving object were observed from a bird's-eye view position.

[Supplementary note 10]
A bird's-eye view data generation method in which a computer executes processing including:
an acquisition step of acquiring time-series data of two-dimensional observation information representing at least one moving object observed from the viewpoint of an observation moving object equipped with an observation device in a dynamic environment; and
a generation step of, using a learned model that estimates the movement of the observation moving object on the ground and the movement of each of the moving objects on the ground, generating, from the time-series data of the two-dimensional observation information, bird's-eye view data representing the movement trajectory of the observation moving object on the ground and the movement trajectory of each of the moving objects on the ground, as would be obtained if the observation moving object were observed from a bird's-eye view position.

[Supplementary note 11]
A robot including:
an acquisition unit that acquires time-series data of two-dimensional observation information representing at least one moving object observed from the viewpoint of the robot equipped with an observation device in a dynamic environment;
a generation unit that, using a learned model that estimates the movement of the robot on the ground and the movement of each of the moving objects on the ground, generates, from the time-series data of the two-dimensional observation information, bird's-eye view data representing the movement trajectory of the robot on the ground and the movement trajectory of each of the moving objects on the ground, as would be obtained if the robot were observed from a bird's-eye view position;
an autonomous traveling unit that causes the robot to travel autonomously; and
a control unit that uses the bird's-eye view data to control the autonomous traveling unit so that the robot moves to a destination.

[Supplementary note 12]
A bird's-eye view data generation program for causing a computer to execute processing including:
an acquisition step of acquiring time-series data of two-dimensional observation information representing at least one moving object observed from the viewpoint of an observation moving object equipped with an observation device in a dynamic environment; and
a generation step of, using a learned model that predicts the movement of the observation moving object on the ground and the movement of each of the moving objects on the ground, generating, from the time-series data of the two-dimensional observation information, a prediction result of bird's-eye view data representing the movement trajectory of the observation moving object on the ground and the movement trajectory of each of the moving objects on the ground, as would be obtained if the observation moving object were observed from a bird's-eye view position.

[Supplementary note 13]
A bird's-eye view data generation method in which a computer executes processing including:
an acquisition step of acquiring time-series data of two-dimensional observation information representing at least one moving object observed from the viewpoint of an observation moving object equipped with an observation device in a dynamic environment; and
a generation step of, using a learned model that predicts the movement of the observation moving object on the ground and the movement of each of the moving objects on the ground, generating, from the time-series data of the two-dimensional observation information, a prediction result of bird's-eye view data representing the movement trajectory of the observation moving object on the ground and the movement trajectory of each of the moving objects on the ground, as would be obtained if the observation moving object were observed from a bird's-eye view position.

[Supplementary note 14]
A robot including:
an acquisition unit that acquires time-series data of two-dimensional observation information representing at least one moving object observed from the viewpoint of the robot equipped with an observation device in a dynamic environment;
a generation unit that, using a learned model that predicts the movement of the robot on the ground and the movement of each of the moving objects on the ground, generates, from the time-series data of the two-dimensional observation information, a prediction result of bird's-eye view data representing the movement trajectory of the robot on the ground and the movement trajectory of each of the moving objects on the ground, as would be obtained if the robot were observed from a bird's-eye view position;
an autonomous traveling unit that causes the robot to travel autonomously; and
a control unit that uses the prediction result of the bird's-eye view data to control the autonomous traveling unit so that the robot moves to a destination.

10 Camera
20 Bird's-eye view data generation device
22 Acquisition unit
24 Tracking unit
26 Generation unit
28 Control unit
50 Notification unit
60 Autonomous traveling unit
70 Learned model
72 First encoder
74 Second encoder
76 Decoder
100 Robot
120 Learning device
122 Teacher data storage unit
124 Acquisition unit
126 Learning unit
200 Information processing terminal
220 Bird's-eye view data generation device

Claims

A bird's-eye view data generation device including:
an acquisition unit that acquires time-series data of two-dimensional observation information representing at least one moving object observed from the viewpoint of an observation moving object equipped with an observation device in a dynamic environment; and
a generation unit that, using a learned model that estimates the movement of the observation moving object on the ground and the movement of each of the moving objects on the ground, generates, from the time-series data of the two-dimensional observation information, bird's-eye view data representing the movement trajectory of the observation moving object on the ground and the movement trajectory of each of the moving objects on the ground, as would be obtained if the observation moving object were observed from a bird's-eye view position.

The bird's-eye view data generation device according to claim 1, wherein the generation unit, using a learned model that estimates the movement of the observation moving object on the ground and the distribution of the movement of each of the moving objects on the ground, generates, from the time-series data of the two-dimensional observation information, bird's-eye view data representing the movement trajectory of the observation moving object on the ground and a movement trajectory representing the position distribution of each of the moving objects on the ground, as would be obtained if the observation moving object were observed from a bird's-eye view position.

The bird's-eye view data generation device according to claim 1, further comprising a tracking unit that tracks each of the moving objects from the time-series data of the two-dimensional observation information and acquires the position and size of each of the moving objects at each time on the two-dimensional observation information,
wherein the generation unit generates the bird's-eye view data using the learned model, which takes as input the position and size of each of the moving objects at each time on the two-dimensional observation information and estimates the movement of the observation moving object on the ground and the movement of each of the moving objects on the ground.

The bird's-eye view data generation device according to claim 3, wherein the learned model includes:
a first encoder that takes as input the position and size of each of the moving objects at a target time and outputs a vector;
a second encoder that takes as input the movement of the observation moving object on the ground and the movement of each of the moving objects on the ground obtained for one time step earlier, and outputs a vector; and
a decoder that takes as input the vector output by the first encoder and the vector output by the second encoder, and outputs the movement of the observation moving object on the ground and the movement of each of the moving objects on the ground for the target time.

A learning device including:
an acquisition unit that acquires, as training data, a combination of time-series data of the position and size of each moving object at each time on two-dimensional observation information representing at least one moving object observed from the viewpoint of an observation moving object equipped with an observation device in a dynamic environment, and time-series data of the movement of the observation moving object on the ground and the movement of each of the moving objects on the ground; and
a learning unit that, based on the training data, trains a model that receives as input the position and size of each of the moving objects on the two-dimensional observation information at each time and estimates the movement of the observation moving object on the ground and the movement of each of the moving objects on the ground.
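The supervised setup described for the learning unit can be illustrated with a deliberately simplified stand-in. The sketch below swaps the claimed model for a plain linear map trained by stochastic gradient descent on a squared-error loss, using synthetic pairs; the loss, the optimizer, the learning rate, and the data are all assumptions chosen for illustration and are not taken from the source.

```python
def predict(weights, x):
    """Linear stand-in for the model: movement = weights @ input."""
    return [sum(w * xi for w, xi in zip(row, x)) for row in weights]


def mse(weights, data):
    """Mean squared error between estimated and training movements."""
    total, count = 0.0, 0
    for x, y in data:
        for p, t in zip(predict(weights, x), y):
            total += (p - t) ** 2
            count += 1
    return total / count


def train(data, n_in, n_out, lr=0.05, epochs=200):
    """Fit the linear stand-in by stochastic gradient descent (assumed optimizer)."""
    weights = [[0.0] * n_in for _ in range(n_out)]
    for _ in range(epochs):
        for x, y in data:
            pred = predict(weights, x)
            for i in range(n_out):
                err = pred[i] - y[i]
                for j in range(n_in):
                    # Gradient of (pred - y)^2 with respect to weights[i][j].
                    weights[i][j] -= lr * 2.0 * err * x[j]
    return weights


# Synthetic training pairs (input features, target movement); values are made up.
data = [([1.0, 0.0], [0.5]), ([0.0, 1.0], [-0.25]), ([1.0, 1.0], [0.25])]
initial_loss = mse([[0.0, 0.0]], data)
trained = train(data, n_in=2, n_out=1)
final_loss = mse(trained, data)
```

In this toy case the pairs are exactly consistent with a linear map, so the loss falls from its initial value toward zero as training proceeds.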

The learning device according to claim 5, wherein the model includes:
a first encoder that receives as input the position and size of each of the moving objects at a target time and outputs a vector;
a second encoder that receives as input the movement of the observation moving object on the ground and the movement of each of the moving objects on the ground obtained for the time step immediately preceding the target time, and outputs a vector; and
a decoder that receives as input the vector output by the first encoder and the vector output by the second encoder, and outputs the movement of the observation moving object on the ground and the movement of each of the moving objects on the ground at the target time.

A bird's-eye view data generation device including:
an acquisition unit that acquires time-series data of two-dimensional observation information representing at least one moving object observed from a viewpoint of an observation moving object equipped with an observation device in a dynamic environment; and
a generation unit that, using a trained model that predicts the movement of the observation moving object on the ground and the movement of each of the moving objects on the ground, generates, from the time-series data of the two-dimensional observation information, a prediction result of bird's-eye view data representing the movement trajectory of the observation moving object on the ground and the movement trajectory of each of the moving objects on the ground, as obtained when the observation moving object is observed from a bird's-eye view position.

The bird's-eye view data generation device according to claim 7, wherein the generation unit,
using a trained model that predicts the movement of the observation moving object on the ground and the distribution of the movement of each of the moving objects on the ground,
generates, from the time-series data of the two-dimensional observation information, a prediction result of bird's-eye view data representing the movement trajectory of the observation moving object on the ground and a position distribution representing the movement trajectory of each of the moving objects on the ground, as obtained when the observation moving object is observed from a bird's-eye view position.
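One way to picture how a distribution of per-step movements yields a distribution of positions is sketched below. It assumes, purely for illustration, that each step's movement is an independent Gaussian with a diagonal covariance, so that means and variances simply add along the trajectory. The claim does not state how the distribution is parameterized, and the function name `accumulate_position_distribution` is hypothetical.

```python
def accumulate_position_distribution(start_xy, movement_distributions):
    """Accumulate per-step movement distributions into position distributions.

    start_xy: known starting position (x, y) on the ground.
    movement_distributions: list of ((mean_dx, mean_dy), (var_dx, var_dy)),
        one entry per time step (assumed independent Gaussians).
    Returns a list of ((mean_x, mean_y), (var_x, var_y)), one per time step.
    """
    mean_x, mean_y = start_xy
    var_x, var_y = 0.0, 0.0
    positions = []
    for (mean_dx, mean_dy), (var_dx, var_dy) in movement_distributions:
        # For independent Gaussian steps, means add and variances add.
        mean_x += mean_dx
        mean_y += mean_dy
        var_x += var_dx
        var_y += var_dy
        positions.append(((mean_x, mean_y), (var_x, var_y)))
    return positions


steps = [((1.0, 0.0), (0.01, 0.01)), ((1.0, 0.5), (0.04, 0.01))]
position_distribution = accumulate_position_distribution((0.0, 0.0), steps)
```

Under this assumption the positional uncertainty grows with each step, which is why later points on a trajectory carry wider distributions than earlier ones.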

A bird's-eye view data generation program for causing a computer to execute processing including:
an acquisition step of acquiring time-series data of two-dimensional observation information representing at least one moving object observed from a viewpoint of an observation moving object equipped with an observation device in a dynamic environment; and
a generation step of generating, using a trained model that estimates the movement of the observation moving object on the ground and the movement of each of the moving objects on the ground, from the time-series data of the two-dimensional observation information, bird's-eye view data representing the movement trajectory of the observation moving object on the ground and the movement trajectory of each of the moving objects on the ground, as obtained when the observation moving object is observed from a bird's-eye view position.

A bird's-eye view data generation method in which a computer executes processing including:
an acquisition step of acquiring time-series data of two-dimensional observation information representing at least one moving object observed from a viewpoint of an observation moving object equipped with an observation device in a dynamic environment; and
a generation step of generating, using a trained model that estimates the movement of the observation moving object on the ground and the movement of each of the moving objects on the ground, from the time-series data of the two-dimensional observation information, bird's-eye view data representing the movement trajectory of the observation moving object on the ground and the movement trajectory of each of the moving objects on the ground, as obtained when the observation moving object is observed from a bird's-eye view position.
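The relation between per-step movements and a movement trajectory on the ground can be sketched as a simple integration. The example assumes each movement of the observation moving object is given as `(forward, lateral, dtheta)` in its own frame at that step; this parameterization and the function name `integrate_trajectory` are assumptions for illustration, not the method defined in the claim.

```python
import math


def integrate_trajectory(movements, start=(0.0, 0.0, 0.0)):
    """Integrate per-step movements into a ground-plane trajectory.

    movements: list of (forward, lateral, dtheta) per time step, expressed in
               the moving object's own frame (assumed parameterization).
    start: initial pose (x, y, theta) in the bird's-eye (ground) frame.
    Returns the list of poses, including the start pose.
    """
    x, y, theta = start
    trajectory = [(x, y, theta)]
    for forward, lateral, dtheta in movements:
        # Rotate the body-frame displacement into the ground frame.
        x += forward * math.cos(theta) - lateral * math.sin(theta)
        y += forward * math.sin(theta) + lateral * math.cos(theta)
        theta += dtheta
        trajectory.append((x, y, theta))
    return trajectory


# Move 1 unit forward while turning 90 degrees left, then 1 unit forward again.
trajectory = integrate_trajectory([(1.0, 0.0, math.pi / 2), (1.0, 0.0, 0.0)])
```

The same accumulation, applied to each moving object's estimated movement, would give that object's trajectory in the same ground frame.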

A robot including:
an acquisition unit that acquires time-series data of two-dimensional observation information representing at least one moving object observed from the viewpoint of the robot, which is equipped with an observation device, in a dynamic environment;
a generation unit that, using a trained model that estimates the movement of the robot on the ground and the movement of each of the moving objects on the ground, generates, from the time-series data of the two-dimensional observation information, bird's-eye view data representing the movement trajectory of the robot on the ground and the movement trajectory of each of the moving objects on the ground, as obtained when the robot is observed from a bird's-eye view position;
an autonomous traveling unit that causes the robot to travel autonomously; and
a control unit that uses the bird's-eye view data to control the autonomous traveling unit so that the robot moves to a destination.
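How a control unit might use bird's-eye view data is not detailed in the claim. The following is a minimal sketch of one possible rule, assuming the latest ground positions of the robot and of the moving objects are available: stop when any moving object is within a safety radius, otherwise step straight toward the destination. The function `choose_command`, the safety radius, and the speed value are hypothetical.

```python
import math


def choose_command(robot_xy, destination_xy, object_positions,
                   safety_radius=0.5, speed=0.3):
    """Return a ground-plane displacement (dx, dy) for the next control step.

    robot_xy, destination_xy: (x, y) in the bird's-eye frame.
    object_positions: iterable of (x, y) for the moving objects.
    safety_radius, speed: assumed tuning values, not taken from the source.
    """
    for ox, oy in object_positions:
        if math.hypot(ox - robot_xy[0], oy - robot_xy[1]) < safety_radius:
            return (0.0, 0.0)  # a moving object is too close: stop
    dx = destination_xy[0] - robot_xy[0]
    dy = destination_xy[1] - robot_xy[1]
    distance = math.hypot(dx, dy)
    if distance < 1e-9:
        return (0.0, 0.0)  # already at the destination
    step = min(speed, distance)  # do not overshoot the destination
    return (step * dx / distance, step * dy / distance)


free_command = choose_command((0.0, 0.0), (3.0, 4.0), [])
blocked_command = choose_command((0.0, 0.0), (3.0, 4.0), [(0.1, 0.1)])
```

A practical controller would plan around the moving objects rather than simply stopping; the rule above only shows where the bird's-eye view data enters the decision.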

A bird's-eye view data generation program for causing a computer to execute processing including:
an acquisition step of acquiring time-series data of two-dimensional observation information representing at least one moving object observed from a viewpoint of an observation moving object equipped with an observation device in a dynamic environment; and
a generation step of generating, using a trained model that predicts the movement of the observation moving object on the ground and the movement of each of the moving objects on the ground, from the time-series data of the two-dimensional observation information, a prediction result of bird's-eye view data representing the movement trajectory of the observation moving object on the ground and the movement trajectory of each of the moving objects on the ground, as obtained when the observation moving object is observed from a bird's-eye view position.

A bird's-eye view data generation method in which a computer executes processing including:
an acquisition step of acquiring time-series data of two-dimensional observation information representing at least one moving object observed from a viewpoint of an observation moving object equipped with an observation device in a dynamic environment; and
a generation step of generating, using a trained model that predicts the movement of the observation moving object on the ground and the movement of each of the moving objects on the ground, from the time-series data of the two-dimensional observation information, a prediction result of bird's-eye view data representing the movement trajectory of the observation moving object on the ground and the movement trajectory of each of the moving objects on the ground, as obtained when the observation moving object is observed from a bird's-eye view position.

A robot including:
an acquisition unit that acquires time-series data of two-dimensional observation information representing at least one moving object observed from the viewpoint of the robot, which is equipped with an observation device, in a dynamic environment;
a generation unit that, using a trained model that predicts the movement of the robot on the ground and the movement of each of the moving objects on the ground, generates, from the time-series data of the two-dimensional observation information, a prediction result of bird's-eye view data representing the movement trajectory of the robot on the ground and the movement trajectory of each of the moving objects on the ground, as obtained when the robot is observed from a bird's-eye view position;
an autonomous traveling unit that causes the robot to travel autonomously; and
a control unit that uses the prediction result of the bird's-eye view data to control the autonomous traveling unit so that the robot moves to a destination.
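A prediction result of bird's-eye view data could, for example, be checked against a planned robot path before the path is executed. The sketch below assumes the planned path and the predicted trajectories are sampled at the same time steps and reports the first step at which any moving object comes closer than a minimum separation. The function `first_conflict_time` and the separation threshold are hypothetical and are not part of the claim.

```python
import math


def first_conflict_time(robot_path, predicted_tracks, min_separation=0.5):
    """Find the first time step at which the plan conflicts with a prediction.

    robot_path: list of planned robot positions (x, y), one per future step.
    predicted_tracks: {object_id: [(x, y), ...]} predicted positions per step,
                      sampled at the same steps as robot_path (assumption).
    Returns the index of the first conflicting step, or None if there is none.
    """
    for t, (rx, ry) in enumerate(robot_path):
        for track in predicted_tracks.values():
            if t < len(track):  # ignore steps beyond this object's prediction
                ox, oy = track[t]
                if math.hypot(ox - rx, oy - ry) < min_separation:
                    return t
    return None


robot_path = [(0.0, 0.0), (1.0, 0.0), (2.0, 0.0)]
crossing = {"a": [(2.0, 2.0), (2.0, 1.0), (2.0, 0.2)]}
distant = {"b": [(5.0, 5.0), (5.0, 5.0), (5.0, 5.0)]}
conflict_step = first_conflict_time(robot_path, crossing)
no_conflict = first_conflict_time(robot_path, distant)
```

A control unit could use such a check to slow down or replan before the conflicting step is reached.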