JP7259454B2

JP7259454B2 - Mobile position estimation system and mobile position estimation method

Info

Publication number: JP7259454B2
Application number: JP2019055699A
Authority: JP
Inventors: 麻子北浦
Original assignee: Fujitsu Ltd
Current assignee: Fujitsu Ltd
Priority date: 2019-03-22
Filing date: 2019-03-22
Publication date: 2023-04-18
Anticipated expiration: 2039-03-22
Also published as: JP2020153956A

Description

The present invention relates to a mobile body position estimation system and a mobile body position estimation method.

Conventionally, there is a technique called SLAM (Simultaneous Localization and Mapping) that takes as input data on the surroundings acquired while a mobile body is moving and simultaneously creates the travel route of the mobile body and a map of the surrounding environment. Among SLAM techniques, one that takes video captured by the mobile body as input and estimates the camera position and orientation while the mobile body is traveling is called Visual-SLAM (hereinafter, "V-SLAM").

As related prior art, there is a technique that extracts and tracks feature points in captured images that change over time and uses these feature points to estimate the camera position and orientation while the mobile body is traveling.

Japanese Laid-open Patent Publication No. 2017-53795; Japanese Laid-open Patent Publication No. 2007-322403

However, the conventional techniques have a problem in that the generated environment map (the three-dimensional positions of the feature point group) is inaccurate. As a result, the accuracy of shooting position and orientation estimation that uses the environment map also deteriorates, or the discrepancy between the environment map and the actually captured video is too large for the optimization correction to work, so that the shooting position and orientation estimation itself fails and sections that cannot be estimated arise.

In one aspect, an object of the present invention is to perform highly accurate position and orientation estimation.

In one embodiment, there is provided a mobile body position estimation system that performs, for an arbitrary image among captured time-series images, information processing of estimating the shooting position and orientation of the arbitrary image and generating an environment map from features of the arbitrary image, the system including an information processing device that acquires retrograde images obtained by reversing the time-series images with respect to time and performs the information processing using the retrograde images.

In another embodiment, there is provided a mobile body position estimation system that performs, for an arbitrary image among captured time-series images, information processing of estimating the shooting position and orientation of the arbitrary image and generating an environment map from features of the arbitrary image, wherein the information processing is performed using, among a group of images showing image feature points having mutually distinguishable feature amounts, images captured near the latest time.

According to one aspect of the present invention, highly accurate position and orientation estimation can be performed.

FIG. 1A is an explanatory diagram schematically showing an example of a pose graph and optimization in the mobile body position estimation method according to the embodiment.
FIG. 1B is an explanatory diagram schematically showing an example of a pose graph and optimization in a mobile body position estimation method according to the conventional technique.
FIG. 2A is a flowchart (part 1) showing an example of the processing procedure of the mobile body position estimation method according to the embodiment.
FIG. 2B is a flowchart (part 2) showing an example of the processing procedure of the mobile body position estimation method according to the embodiment.
FIG. 3A is an explanatory diagram showing an example of a V-SLAM environment map obtained in reverse time in the mobile body position estimation method according to the embodiment.
FIG. 3B is an explanatory diagram showing an example of a V-SLAM environment map obtained in forward time in a mobile body position estimation method according to the conventional technique.
FIG. 4A is an explanatory diagram showing an example of a surrounding environment map in the mobile body position estimation method according to the embodiment.
FIG. 4B is an explanatory diagram showing an example of a surrounding environment map in a mobile body position estimation method according to the conventional technique.
FIG. 5 is an explanatory diagram showing an example of the system configuration of the mobile body position estimation system according to the embodiment.
FIG. 6 is a block diagram showing an example of the hardware configuration of the mobile body position estimation device (server).
FIG. 7 is a block diagram showing an example of the hardware configuration of the on-board device.
FIG. 8 is an explanatory diagram showing an example of the data structure of the real coordinate environment map.
FIG. 9 is an explanatory diagram showing an example of the data structure of the all-image position and orientation data.
FIG. 10 is an explanatory diagram showing an example of the contents of the mobile body position estimation system and the mobile body position estimation method according to the embodiment.
FIG. 11A is an explanatory diagram (part 1) showing an example of transformation matrix calculation in the initial orientation/coordinate system setting unit.
FIG. 11B is an explanatory diagram (part 2) showing an example of transformation matrix calculation in the initial orientation/coordinate system setting unit.
FIG. 11C is an explanatory diagram (part 3) showing an example of transformation matrix calculation in the initial orientation/coordinate system setting unit.
FIG. 11D is an explanatory diagram showing an example of calculation of the scale transformation matrix M1.
FIG. 11E is an explanatory diagram showing an example of calculation of the rotation transformation matrix M2.
FIG. 12 is a flowchart showing an example of the processing procedure of the KF (key frame) update unit.

Embodiments of a mobile body position estimation system and a mobile body position estimation method according to the present invention are described in detail below with reference to the drawings.

(Embodiment)
Data (for example, video data) from on-board devices of ordinary vehicles on the move is collected (probed) in large quantities. The position and orientation during travel are estimated from this video data and used for analysis of the on-board data. The GPS (Global Positioning System) devices commonly installed in vehicles can measure the vehicle position only with a large error, so they cannot be applied to services that require a precise vehicle position.

If, for example, the position and orientation at the time of shooting during travel are estimated with high accuracy and attached to the on-board images of such ordinary-vehicle data, the data can be applied to new service fields, such as extracting features around the road from the images to create and update maps (for example, maps for autonomous driving) or analyzing the surroundings at the time of shooting for autonomous driving. A technique for accurately estimating the position and orientation of the camera that captured the on-board images (the shooting position and orientation of the video) is therefore needed as a prerequisite for new services that use video from ordinary vehicles.

SLAM is a general term for techniques that take as input on-board data on the surroundings acquired while moving, such as LIDAR (Laser Imaging Detection and Ranging) data, and simultaneously create the travel route of the own vehicle (its position and orientation) and a map of the surrounding environment (for example, a three-dimensional position map of surrounding objects).

Among these, V-SLAM takes video captured by an on-board camera as input and uses changes in the subjects appearing in the video to estimate and create the travel route of the own vehicle (its position and orientation) and a map of the surrounding environment (a three-dimensional position map of the image feature points of surrounding subjects; hereinafter, the environment map). It can estimate the position and orientation of the own vehicle from monocular video of an ordinary vehicle.

In V-SLAM, the initial three-dimensional position of an image feature point newly observed in a plurality of images is estimated by triangulation and added to the environment map. Then, taking into account as needed the difference between where the image feature points of the environment map are expected to appear in a captured image and where they actually appear, the shooting positions and orientations of the images and the three-dimensional positions of the image feature points of the environment map are optimized and corrected, yielding the final shooting positions and orientations of the captured images.

The mobile body position estimation system 500 according to the embodiment, shown in FIG. 5 described later, has nine V-SLAM-based processing units: a retrograde determination unit 511, a retrograde image acquisition unit 512, an initial orientation/coordinate system setting unit 513, a frame orientation estimation unit 521, a key frame (KF) update unit 522, a 3D map feature point update unit 531, a graph constraint generation unit 532, a KF orientation/feature point map optimization unit 533, and a loop detection/closing unit 541. It also holds two kinds of internal data: a real coordinate environment map 550 (KF group information 551 and feature point group information 552) and all-image position and orientation data 560.

A KF (key frame) is a particularly important image among all the images included in the video. V-SLAM often estimates the shooting position and orientation of the KF images with a detailed analysis technique that uses the environment map, so that they are consistent both globally and locally, and then estimates the remaining images more simply from their relationship relative to the KFs. The mobile body position estimation method according to the embodiment adopts the same approach, although all images may instead be treated as KFs.

In addition to the three-dimensional position of each image feature point, the environment map (real coordinate environment map 550) holds information on which image feature points are used in which images. This information consists of the KF group information 551 and the feature point group information 552 shown in FIG. 5 described later. The KF group information 551 comprises two kinds of information: information on the main images (KFs) in the video, and information on the two-dimensional position at which each image feature appears on those KF images. It is essential for using the environment map to estimate the position and orientation of an arbitrary image. The contents of the real coordinate environment map 550 are described in detail later with reference to FIG. 8.
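The relationship described here, 3D feature points plus a per-key-frame record of where each point appears in 2D, could be organized roughly as in the following Python sketch. The class and field names (`KeyFrame`, `MapPoint`, `EnvironmentMap`, `observations`) and the pose layout are illustrative assumptions; the actual data structure is the one the specification describes with FIG. 8.

```python
from dataclasses import dataclass, field


@dataclass
class KeyFrame:
    """One key frame (KF): its pose and where each feature point appears in it."""
    frame_id: int
    pose: tuple  # hypothetical layout: (x, y, z, roll, pitch, yaw)
    # feature point id -> (u, v) position of that point on this KF image
    observations: dict = field(default_factory=dict)


@dataclass
class MapPoint:
    """One image feature point with its 3D position."""
    point_id: int
    position: tuple  # (X, Y, Z)


@dataclass
class EnvironmentMap:
    """Key frame information plus feature point information."""
    keyframes: dict = field(default_factory=dict)  # frame_id -> KeyFrame
    points: dict = field(default_factory=dict)     # point_id -> MapPoint

    def add_observation(self, frame_id, point_id, uv):
        # Record that the feature point appears at pixel position uv on the KF.
        self.keyframes[frame_id].observations[point_id] = uv

    def keyframes_observing(self, point_id):
        # Which key frames use this feature point, in ascending frame order.
        return sorted(
            fid for fid, kf in self.keyframes.items()
            if point_id in kf.observations
        )
```

Keeping the 2D observations on the key frame side makes it cheap to answer "which images use this feature point", which is the lookup the paragraph above calls essential for pose estimation of an arbitrary image.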

In general, the environment map of feature-point-based V-SLAM contains the image features and three-dimensional positions of the image feature points, information on the KF images in which those feature points are observed, and the image features within each KF image that make it possible to search for images similar to that KF image (these are often more numerous than the feature points that hold three-dimensional positions).

As shown in the all-image position and orientation data 560 in FIG. 5 described later, the image position and orientation are the shooting position and orientation of every image frame of the video. The KF shooting position and orientation data included in the KF group information 551 of the real coordinate environment map 550 is a subset of the all-image position and orientation data and therefore duplicates it, but for ease of understanding it is described together with the other KF information as the KF position and orientation. The contents of the all-image position and orientation data 560 are described in detail later with reference to FIG. 9.

The video 1001 shown in FIG. 10 described later is video from a camera mounted on a mobile body (vehicle). It is obtained by any method, for example through a vehicle communication means such as an on-board device or manually via a recording medium, and is input to the shooting position estimation system of the present invention. Although not described specifically, the internal parameters of the camera that captured the video are assumed to be known, since they are used for correcting distortion of the video and the like, and distortion correction may be performed as appropriate.

(Overview of the mobile body position estimation method)
First, an overview of the mobile body position estimation system and the mobile body position estimation method according to the present embodiment is given with reference to FIGS. 1A and 1B.

The mobile body position estimation method according to the present embodiment executes two processes in performing V-SLAM. The first is a process of acquiring the video in reverse so that it is processed in the direction opposite to the passage of time (retrograde image acquisition process). This process is performed, for example, by the retrograde image acquisition unit 512 described later.

In the retrograde image acquisition process, for example, the images are extracted from the input video file in order of decreasing shooting time and passed to the subsequent V-SLAM processing units. When the input video is a live feed, the images for an arbitrary length of time can be buffered, and the buffered video can then be passed to the V-SLAM processing units, as needed, in the reverse of the order in which it was acquired.
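The two acquisition modes described above (a recorded file and a buffered live feed) can be sketched in Python as follows. This is a minimal illustration under assumptions: frames are represented as `(timestamp, image)` pairs or any other values, and the function names and the `buffer_size` parameter are hypothetical, not part of the specification.

```python
def reversed_frames(frames):
    """Recorded video: return (timestamp, image) pairs from newest to oldest."""
    return sorted(frames, key=lambda frame: frame[0], reverse=True)


def reversed_chunks(live_frames, buffer_size):
    """Live input: buffer buffer_size frames, then emit each chunk in reverse.

    The chunks themselves still arrive in capture order; only the frames
    inside each buffered chunk are handed on newest-first.
    """
    buffer = []
    for frame in live_frames:
        buffer.append(frame)
        if len(buffer) == buffer_size:
            yield from reversed(buffer)
            buffer = []
    # Flush whatever remains when the stream ends.
    yield from reversed(buffer)
```

For example, `list(reversed_chunks(range(1, 8), 3))` yields the frames in the order 3, 2, 1, 6, 5, 4, 7. How the chunk length should be chosen for a live feed is left open by the text ("an arbitrary length of time").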

The second is a process of determining whether to time-reverse the input video (retrograde determination process), based on, for example, the state of the video to be processed by V-SLAM or the result of V-SLAM tried without reversal. Whether to execute the retrograde image acquisition process is decided from the result of this determination. The retrograde determination process is performed, for example, by the retrograde determination unit 511 described later.

Specifically, the retrograde determination process determines, for example, whether the target video comes from a camera installed facing the front of the mobile body. If the video shows the view ahead of the mobile body, it determines that the retrograde image (retrograde video) acquisition process is to be performed; otherwise (for example, video of the view behind the mobile body), it determines that the retrograde image acquisition process is not to be performed.

Whether the camera is a forward-facing camera may be determined by any known method. For example, it may be obtained from information on where the camera is installed on the mobile body, or optical flow may be used to estimate, from how subjects change in the captured video, whether the surrounding background is approaching or receding, and the determination may be based on that estimate.
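One possible way to turn optical flow into an approaching/receding decision is sketched below in Python. It is an assumption-laden illustration, not the method of the specification: the flow vectors are taken as given (from any optical flow estimator), the focus of expansion is assumed to lie at a known point such as the image center, and the function name is hypothetical. When the background approaches, flow vectors point away from that point, so their summed radial component is positive.

```python
def is_forward_facing(flow, center):
    """Guess the camera direction from sparse optical flow.

    flow   : list of ((x, y), (dx, dy)), a pixel position and its flow vector
    center : (cx, cy), assumed focus of expansion (hypothetical: image center)

    Returns True when the flow expands away from the center on balance,
    i.e. the background is approaching, as for a forward-facing camera
    on a mobile body that is moving forward.
    """
    cx, cy = center
    radial_sum = 0.0
    for (x, y), (dx, dy) in flow:
        rx, ry = x - cx, y - cy
        norm = (rx * rx + ry * ry) ** 0.5
        if norm == 0.0:
            continue  # a point exactly at the center has no radial direction
        # Component of the flow vector along the outward radial direction.
        radial_sum += (rx * dx + ry * dy) / norm
    return radial_sum > 0.0
```

The sketch presumes the mobile body is moving forward; frames captured while stationary or reversing would need to be excluded before applying it.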

As described later, the conventional method uses the two images in which an image feature point of a subject is first detected within the group of images processed by V-SLAM for the calculation of the image feature, the association of image features, and the triangulation performed in the initial orientation/coordinate system setting unit 513 and the 3D map feature point update unit 531 described later.

Consequently, when processing video from a forward-facing camera, the conventional method in most cases uses in the processing units 513 and 531 the two images captured when the subject first begins to appear in the distance, whereas the mobile body position estimation method according to the present embodiment, because it acquires retrograde images, uses the two images in which the subject appears closest to the camera. Since the subject appears large in these two images, errors in calculating and associating image features are unlikely, and incorrect associations of the kind made conventionally when the subject appears far away are rare.

Moreover, when processing video from a forward-facing camera with the conventional method, image features that should already have been associated remain unassociated because of association errors even when the subject appears close, and they are then associated with wrong features that should never be associated, such as minute irregularities in the road surface, which ends up producing many erroneous map features. This is unlikely to happen with the mobile body position estimation method according to the present embodiment.

In the mobile body position estimation method according to the present embodiment, the triangulation that obtains the initial 3D position of an image feature can also be performed with images whose change in appearance (intersection angle) is far larger than in the conventional method. This improves the accuracy of the triangulation, which estimates the 3D position from the amount of change, and hence the 3D position accuracy of the image feature in the environment map.

Furthermore, in addition to the improved accuracy of the initial position before optimization, on an image in which the feature point appears large, a small change in its 3D position shows up as a large change in its projected position. This tends to improve the accuracy of the optimization that fine-tunes the 3D position of the feature point using the difference (reprojection error) between the position at which the 3D feature point is projected onto the image and the position at which the feature actually appears on the image. As a result, the 3D position accuracy of the image feature in the environment map improves (see FIG. 4A described later).
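The sensitivity argument can be checked numerically with a simple pinhole model. The Python sketch below is an illustration under assumptions: an ideal pinhole camera with the principal point at the origin, a hypothetical focal length of 1000 pixels, and made-up distances of 5 m and 40 m. The same 0.1 m error in the 3D position produces a much larger reprojection error when the point is close to the camera.

```python
import math


def project(point, focal_px):
    """Ideal pinhole projection of a camera-frame point (x, y, z) to pixels."""
    x, y, z = point
    return (focal_px * x / z, focal_px * y / z)


def reprojection_error(point, observed_uv, focal_px):
    """Pixel distance between the projected point and the observed position."""
    u, v = project(point, focal_px)
    return math.hypot(u - observed_uv[0], v - observed_uv[1])


FOCAL_PX = 1000.0  # assumed focal length in pixels

# True point 3 m to the side, seen from 5 m (near) and from 40 m (far).
near_obs = project((3.0, 0.0, 5.0), FOCAL_PX)
far_obs = project((3.0, 0.0, 40.0), FOCAL_PX)

# The map holds the point with a 0.1 m lateral error.
near_err = reprojection_error((3.1, 0.0, 5.0), near_obs, FOCAL_PX)
far_err = reprojection_error((3.1, 0.0, 40.0), far_obs, FOCAL_PX)
```

Under these assumptions the error is about 20 pixels in the near image and about 2.5 pixels in the far image, so the near image constrains the 3D position roughly eight times more tightly in the optimization.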

FIG. 1A is an explanatory diagram schematically showing an example of a pose graph and optimization in the mobile body position estimation method according to the embodiment. FIG. 1B is an explanatory diagram schematically showing an example of a pose graph and optimization in a mobile body position estimation method according to the conventional technique.

FIGS. 1A and 1B show images 101 to 110 captured of a subject (feature point A) 100 at the respective shooting positions of the mobile body. The arrow indicates the direction of travel of the mobile body. Each of the triangles 101 to 110 thus indicates the position at which the corresponding image was captured; image 101 was captured first, and the images were captured in the chronological order 102 → 103 → 104 → ... → 109 → 110.

Feature point A 100 is not captured in the images of triangles 101 and 102; it appears for the first time in the image of triangle 103. It then continues to appear up to the image of triangle 109. The images in which feature point A 100 appears are therefore images 103 to 109.

As shown in FIG. 1B, the conventional method generally uses the images captured when the subject (feature point A) 100 first comes into view in the distance. That is, images 103 and 104 form the image pair used for triangulation. Because triangulation is then performed with a pair of images whose change in appearance (intersection angle) θb is small, the accuracy of the triangulation, which estimates position from the amount of change, is poor.

In contrast, as shown in FIG. 1A, the mobile body position estimation method according to the embodiment uses the images in which the subject (feature point A) 100 was last visible, that is, in which it is closest. Images 108 and 109 form the image pair used for triangulation. Because triangulation is performed with a pair of images whose change in appearance (intersection angle) θa is large, the accuracy of the triangulation, which estimates position from the amount of change, improves.
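The difference between θb and θa can be illustrated with a small geometric calculation. In the Python sketch below, the layout is a made-up assumption rather than data from the figures: the vehicle drives straight along the z axis in 5 m steps, and the feature point sits 3 m to the side and 40 m ahead of the first position. The intersection angle is the angle between the two viewing rays to the point.

```python
import math


def intersection_angle(cam_a, cam_b, point):
    """Angle in degrees between the rays from two camera positions to a point.

    All arguments are 2D (lateral, forward) coordinates in metres.
    """
    ax, az = point[0] - cam_a[0], point[1] - cam_a[1]
    bx, bz = point[0] - cam_b[0], point[1] - cam_b[1]
    cos_t = (ax * bx + az * bz) / (math.hypot(ax, az) * math.hypot(bx, bz))
    # Clamp to guard against rounding slightly outside [-1, 1].
    return math.degrees(math.acos(max(-1.0, min(1.0, cos_t))))


feature = (3.0, 40.0)  # assumed position of the feature point

# Far pair: the two positions where the point first comes into view.
theta_b = intersection_angle((0.0, 0.0), (0.0, 5.0), feature)

# Near pair: the last two positions before the point leaves the view.
theta_a = intersection_angle((0.0, 30.0), (0.0, 35.0), feature)
```

With these assumed numbers, `theta_b` is well under one degree while `theta_a` is more than fourteen degrees, although the baseline between the two images is 5 m in both cases. A wider intersection angle makes the triangulated depth far less sensitive to small errors in the matched pixel positions.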

In the present embodiment, video from a rear-facing camera is captured, to begin with, in order from the image in which the subject appears largest to images in which it appears progressively farther away, so retrograde image acquisition is judged to be unnecessary. For cameras that face neither directly forward nor directly backward, such as a camera pointing slightly sideways, the determination may be made by judging, by any method, whether the camera is closer to a forward-facing or a rear-facing camera.

The retrograde determination process may additionally determine that the video is not to be reversed when the frame rate of the target video is lower than a specified value, and that it is to be reversed when the frame rate is higher than the specified value.

In video with a low frame rate, the subject changes greatly between two consecutive images, and the change is greater when the subject appears near the camera than when it appears far away. In video with a very low frame rate, therefore, an attempt to associate image feature points, or to triangulate with them, using two images in which the subject appears near the camera may fail to recognize them as the same feature point, because the position and appearance of the subject's feature points have changed too much.

As a result, even if retrograde image acquisition starts the processing from the images in which the subject appears near the camera, the feature points cannot be recognized as the same until the subject appears moderately far away. The image features of the subject therefore cannot be reflected in the environment map in the meantime, and the accuracy of the environment map improves little. In addition, V-SLAM can use the feature point only from the time it is reflected in the environment map until the subject recedes and the feature is no longer visible, so the number of usable images decreases, and the images closest to the camera, in which the subject appears best, cannot be used at all.

Consequently, although the initial 3D feature point position accuracy improves slightly, registration of the feature point in the environment map is delayed and the feature point can hardly be used for calculating the camera shooting position and orientation, so retrograde image acquisition offers very little advantage over the conventional method.

That is, in FIG. 1B, using the images captured when the subject (feature point A) 100 first comes into view in the distance gives low accuracy, but even at a low frame rate all seven images 103 to 109, including the image in which the subject is closest to the camera, can be used for V-SLAM. The closest images can therefore be used for optimization based on the reprojection error.

In FIG. 1A, by contrast, when the images in which the subject (feature point A) 100 was last visible, that is, in which it is closest, are used, the appearance changes too drastically among the images captured at close range, feature association fails, and those images cannot be used for triangulation. The initial 3D position (triangulation accuracy) is somewhat better, but only the five images 103 to 107 can be used for V-SLAM; in particular, the closest images, in which the subject appears best, cannot be used even for the optimization calculation, so the accuracy deteriorates.

Thus, at a low frame rate, time reversal delays triangulation and reduces the number of images in which feature point A 100 can be used, so the error grows and reversal brings no advantage. The mobile body position estimation method according to the present embodiment can therefore decide not to acquire retrograde images for low-frame-rate video from a forward-facing camera.

With a rear-facing camera, low-frame-rate video suffers from the same problem as retrograde acquisition with a forward-facing camera: reflection in the environment map is delayed. Contrary to the forward-facing case, it may therefore be decided to perform retrograde image acquisition. Applying retrograde image acquisition to a rear-facing camera means that V-SLAM processes the images in order from the one in which the subject appears farthest away. The position accuracy of the initial 3D feature points obtained by triangulation deteriorates slightly, but registration in the environment map can take place relatively early, which has the advantage that the feature points can be fully exploited in calculating the camera shooting position and orientation.

なお、フレームレートが低い映像であっても、特徴点の対応付けのパラメータ、たとえば、対応付け対象の特徴点を画像内のどの範囲までを探索するかの探索範囲のしきい値などを、フレームレートが高い映像とは異なる適切な値に変更することで、カメラに近い画像同士を使った特徴点対応付けが実施できる場合もあり、その場合はフレームレートによる逆行判定を省略してもよい。 Note that even for low-frame-rate video, changing the feature point matching parameters, for example the threshold on the search range that determines how far within the image to search for a feature point to be matched, to appropriate values different from those used for high-frame-rate video may make it possible to match feature points between images captured close to the camera. In that case, the retrograde determination based on the frame rate may be omitted.

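一般的には、探索範囲が大きい程、対応付け処理にかかる処理負荷が高くなるため、特に低フレームレートの映像に対しては、対応付けパラメータの変更をおこなうのではなく、低フレームレートの逆行判定をおこなうことが望ましい。 In general, the larger the search range, the higher the processing load of the matching process. For low-frame-rate video in particular, it is therefore desirable to perform the low-frame-rate retrograde determination rather than to change the matching parameters.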

逆行判定処理においては、この他にさらに、逆行画像取得なしのＶ－ＳＬＡＭ、すなわち従来技術と同じＶ－ＳＬＡＭをあらかじめ一度実施しておき、その時に推定した撮影位置・姿勢の推定精度、または、推定失敗状況、または作成した環境マップの推定精度のいずれかを調べて、それが特に悪化している場合だけ、時間逆行によるＶ－ＳＬＡＭを実施すると判断してもよい。 In the retrograde determination process, in addition to this, V-SLAM without retrograde image acquisition, that is, the same V-SLAM as the conventional technique is performed once in advance, and the estimation accuracy of the shooting position and posture estimated at that time, or Either the estimation failure situation or the estimation accuracy of the created environment map may be examined, and only if it is particularly degraded may it be decided to implement V-SLAM with time retrogression.

このように判定することで、渋滞シーンなど他の被写体（他の並走車等）で隠れてしまって、道路周辺地物が最もカメラ近くでは映り込まないことが多いシーン映像など、逆行画像を取得することに効果があるのか判断に迷う場合にも、必要な場合だけ、逆行画像の取得をおこなうことができる。 By determining in this way, retrograde images can be acquired only when necessary, even when it is hard to judge whether acquiring retrograde images is effective. One such case is scene video, such as a traffic jam, in which features around the road are often hidden by other subjects (such as vehicles running alongside) and do not appear in the image when they are closest to the camera.

たとえば、撮影位置・姿勢の推定精度は、位置や姿勢変化のばらつきが大きい場合に精度が悪いと判断する。また、推定失敗状況はＶ－ＳＬＡＭで入力した処理対象映像の画像数に対して、撮影位置・姿勢を算出できた画像数の占める割合を求め、割合は小さいときに精度が悪いと判断する。環境マップの推定精度は、たとえば環境マップの特徴点３Ｄ位置群の位置のばらつきの大きさや、明らかに高さや位置が異常値として推定されている特徴点群の含まれる割合などから、ばらつきや割合が大きいときに精度が悪いと判断する。 For example, the estimation accuracy of the shooting position/orientation is judged to be poor when the variation in position or in orientation change is large. The estimation failure status is evaluated by calculating the ratio of the number of images for which the shooting position/orientation could be calculated to the number of images in the video input to V-SLAM for processing; accuracy is judged to be poor when this ratio is small. The estimation accuracy of the environment map is evaluated from, for example, the magnitude of the variation in the 3D positions of the feature points in the environment map, or the proportion of feature points whose height or position is clearly estimated as an abnormal value; accuracy is judged to be poor when the variation or the proportion is large.

逆行判定処理では、これらの例で示した判断の１つ以上を、任意に組み合わせて判定してよい。 In the retrograde determination process, one or more of the determinations shown in these examples may be combined arbitrarily for determination.
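
The quality checks described above can be sketched as follows. This is a minimal illustration, not the embodiment's implementation: the function names, the use of the standard deviation of frame-to-frame travel distance as the "variation in position", and the threshold values `min_ratio` and `max_step_std` are all assumptions introduced here.

```python
from statistics import pstdev


def success_ratio(num_input_images, num_estimated_images):
    """Fraction of input images for which a shooting pose was computed."""
    if num_input_images <= 0:
        raise ValueError("num_input_images must be positive")
    return num_estimated_images / num_input_images


def step_variation(positions):
    """Population standard deviation of the frame-to-frame travel distance."""
    steps = [
        sum((b - a) ** 2 for a, b in zip(p, q)) ** 0.5
        for p, q in zip(positions, positions[1:])
    ]
    return pstdev(steps) if len(steps) >= 2 else 0.0


def prior_run_is_poor(num_input_images, positions,
                      min_ratio=0.8, max_step_std=1.0):
    """Return True if an earlier forward-time V-SLAM run looks degraded.

    positions holds one (x, y, z) tuple per image whose pose was estimated.
    min_ratio and max_step_std are hypothetical thresholds.
    """
    ratio = success_ratio(num_input_images, len(positions))
    return ratio < min_ratio or step_variation(positions) > max_step_std
```

Either criterion alone is enough to flag the run here, which mirrors the statement that the individual judgments may be combined arbitrarily.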

（逆行判定処理の手順） (Procedure for retrograde determination processing)
図２Ａおよび図２Ｂは、実施の形態にかかる移動体位置推定方法の処理の手順の一例を示すフローチャートである。図２Ａは、逆行判定処理の手順の一例を示すフローチャートであり、図２Ｂは、逆行画像取得処理以降の処理の手順の一例を示すフローチャートである。 FIGS. 2A and 2B are flowcharts showing an example of the processing procedure of the mobile body position estimation method according to the embodiment. FIG. 2A is a flowchart showing an example of the procedure of the retrograde determination process, and FIG. 2B is a flowchart showing an example of the procedure from the retrograde image acquisition process onward.

図２Ａのフローチャートにおいて、逆行判定部５１１は、まず、カメラが前向きか否かを判断する（ステップＳ２０１）。ここで、カメラが前向きでない場合（ステップＳ２０１：Ｎｏ）は、ステップＳ２０５へ移行する。一方、カメラが前向きの場合（ステップＳ２０１：Ｙｅｓ）は、カメラの撮影フレームレートがしきい値以上か否かを判断する（ステップＳ２０２）。 In the flowchart of FIG. 2A, the retrograde determination unit 511 first determines whether or not the camera is facing forward (step S201). Here, if the camera is not facing forward (step S201: No), the process proceeds to step S205. On the other hand, if the camera faces forward (step S201: Yes), it is determined whether or not the shooting frame rate of the camera is equal to or higher than the threshold (step S202).

ステップＳ２０２において、カメラの撮影フレームレートがしきい値以上の場合（ステップＳ２０２：Ｙｅｓ）は、「映像を逆行取得する」と判定する（ステップＳ２０３）。一方、ステップＳ２０２において、カメラの撮影フレームレートがしきい値以上でない場合（ステップＳ２０２：Ｎｏ）は、「映像を逆行取得しない」と判定する（ステップＳ２０６）。 In step S202, if the shooting frame rate of the camera is equal to or higher than the threshold (step S202: Yes), the determination is "acquire video in retrograde order" (step S203). On the other hand, if the shooting frame rate of the camera is not equal to or higher than the threshold in step S202 (step S202: No), the determination is "do not acquire video in retrograde order" (step S206).

ステップＳ２０３において、「映像を逆行取得する」と判定した後、従来の時間逆行なしのＶ－ＳＬＡＭ結果があり、その撮影位置・姿勢算出が良好だったか否かを判断する（ステップＳ２０４）。ここで、Ｖ－ＳＬＡＭ結果があり、その撮影位置・姿勢算出が良好だった場合（ステップＳ２０４：Ｙｅｓ）は、逆行画像を取得する必要がないので、Ｖ－ＳＬＡＭの一連の処理を終了する。 After it is determined in step S203 that "retrograde image acquisition is to be performed", it is determined whether or not there is a conventional V-SLAM result without time retrogression and the shooting position/orientation calculation is satisfactory (step S204). Here, if there is a V-SLAM result and the imaging position/orientation calculation is good (step S204: Yes), there is no need to acquire a retrograde image, and the series of V-SLAM processing ends.

一方、ステップＳ２０４において、Ｖ－ＳＬＡＭ結果がない場合、あるいは、Ｖ－ＳＬＡＭ結果があっても、その撮影位置・姿勢算出が良好でない場合（ステップＳ２０４：Ｎｏ）は、逆行画像を取得する必要があるので、図２ＢのステップＳ２１１へ移行する。 On the other hand, in step S204, if there is no V-SLAM result, or if there is a V-SLAM result but its shooting position/orientation calculation was not good (step S204: No), it is necessary to acquire retrograde images, so the process proceeds to step S211 in FIG. 2B.

カメラが前向きでない場合（ステップＳ２０１：Ｎｏ）に移行したステップＳ２０５においても、カメラの撮影フレームレートがしきい値以上か否かを判断する（ステップＳ２０５）。ここでは、カメラの撮影フレームレートがしきい値以上でない場合（ステップＳ２０５：Ｎｏ）は、「映像を逆行取得する」と判定する（ステップＳ２０３）。一方、ステップＳ２０５において、カメラの撮影フレームレートがしきい値以上の場合（ステップＳ２０５：Ｙｅｓ）は、「映像を逆行取得しない」と判定する（ステップＳ２０６）。 In step S205, which is reached when the camera is not facing forward (step S201: No), it is likewise determined whether or not the shooting frame rate of the camera is equal to or higher than the threshold (step S205). Here, if the shooting frame rate of the camera is not equal to or higher than the threshold (step S205: No), the determination is "acquire video in retrograde order" (step S203). On the other hand, if the shooting frame rate of the camera is equal to or higher than the threshold in step S205 (step S205: Yes), the determination is "do not acquire video in retrograde order" (step S206).

ステップＳ２０６において、「映像を逆行取得しない」と判定した後、従来の時間逆行なしのＶ－ＳＬＡＭ結果があるか否かを判断する（ステップＳ２０７）。ここで、Ｖ－ＳＬＡＭ結果がある場合（ステップＳ２０７：Ｙｅｓ）は、Ｖ－ＳＬＡＭの一連の処理を終了する。一方、ステップＳ２０７において、Ｖ－ＳＬＡＭ結果がない場合（ステップＳ２０７：Ｎｏ）は、図２ＢのステップＳ２１１へ移行する。 After the determination "do not acquire video in retrograde order" is made in step S206, it is determined whether or not there is a conventional V-SLAM result obtained without time retrogression (step S207). If there is a V-SLAM result (step S207: Yes), the series of V-SLAM processing ends. On the other hand, if there is no V-SLAM result in step S207 (step S207: No), the process proceeds to step S211 in FIG. 2B.
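
The branching of FIG. 2A described above can be condensed into two small functions. This is a sketch for illustration only: the function and parameter names are hypothetical, and the text does not give a concrete value for the frame rate threshold, so it is passed in as an argument.

```python
def decide_retrograde(camera_faces_forward, frame_rate, frame_rate_threshold):
    """Steps S201 to S203, S205 and S206.

    Returns True for "acquire video in retrograde order" (S203) and
    False for "do not acquire video in retrograde order" (S206).
    """
    high_rate = frame_rate >= frame_rate_threshold
    if camera_faces_forward:      # S201: Yes
        return high_rate          # S202: Yes -> S203, No -> S206
    return not high_rate          # S205: No -> S203, Yes -> S206


def needs_vslam_run(retrograde, has_prior_result, prior_result_good):
    """Steps S204 and S207: True means proceed to step S211 in FIG. 2B."""
    if retrograde:
        # S204: stop only if a forward-time result exists and was good.
        return not (has_prior_result and prior_result_good)
    # S207: stop if any forward-time result exists.
    return not has_prior_result
```

For example, `decide_retrograde(False, 5, 10)` returns `True`: a rear-facing camera with a low frame rate is processed in retrograde order, whereas a front-facing camera with the same frame rate is not.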

図２Ｂのフローチャートにおいて、判定の結果、映像を逆行取得する必要があるか否かを判定する（ステップＳ２１１）。判定の結果は、「映像を逆行取得する」（図２ＡのステップＳ２０３）または「映像を逆行取得しない」（図２ＡのステップＳ２０６）との判定の結果である。 In the flowchart of FIG. 2B, it is determined from the determination result whether or not the video needs to be acquired in retrograde order (step S211). The determination result is either "acquire video in retrograde order" (step S203 in FIG. 2A) or "do not acquire video in retrograde order" (step S206 in FIG. 2A).

ステップＳ２１１において、映像を逆行取得する必要がある場合（ステップＳ２１１：Ｙｅｓ）は、映像を逆時刻順に取得する『逆行取得フラグ』をＯＮにして（ステップＳ２１２）、ステップＳ２１４へ移行する。一方、映像を逆行取得する必要がない場合（ステップＳ２１１：Ｎｏ）は、『逆行取得フラグ』をＯＦＦにして（ステップＳ２１３）、ステップＳ２１４へ移行する。 In step S211, if it is necessary to acquire images in reverse order (step S211: Yes), the "reverse acquisition flag" for acquiring images in reverse chronological order is turned ON (step S212), and the process proceeds to step S214. On the other hand, if it is not necessary to acquire the video retrograde (step S211: No), the "retrograde acquisition flag" is turned off (step S213), and the process proceeds to step S214.

ステップＳ２１４において、すべての映像を処理したか否かを判断する（ステップＳ２１４）。ここで、未だ、すべての映像を処理していない場合（ステップＳ２１４：Ｎｏ）は、映像内の未処理画像群から、『逆行取得フラグ』を用いて、最も後で撮影した画像を１つ取得する（ステップＳ２１５）。これは、『逆行取得フラグ』がＯＮになっている場合であって、『逆行取得フラグ』がＯＦＦになっている場合は、最も前で撮影した画像を１つ取得するようにするとよい。ここまでが、逆行画像取得部５１２によって実行される処理である。 In step S214, it is determined whether or not the entire video has been processed (step S214). If the entire video has not yet been processed (step S214: No), one image, the one captured latest, is acquired from the unprocessed images in the video according to the "retrograde acquisition flag" (step S215). This applies when the "retrograde acquisition flag" is ON; when the flag is OFF, the image captured earliest should be acquired instead. The processing up to this point is executed by the retrograde image acquisition unit 512.
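
The image selection in steps S214 and S215 amounts to consuming the unprocessed images from one end or the other of a time-sorted queue. The sketch below assumes each frame is a `(timestamp, image_id)` tuple; that representation and the function names are illustrative choices, not part of the embodiment.

```python
from collections import deque


def next_image(unprocessed, retrograde_flag):
    """S215: take the latest image if the flag is ON, else the earliest.

    unprocessed is a deque of (timestamp, image_id) in ascending time order.
    """
    return unprocessed.pop() if retrograde_flag else unprocessed.popleft()


def acquisition_order(frames, retrograde_flag):
    """Return the order in which frames would be handed to V-SLAM."""
    unprocessed = deque(sorted(frames))
    order = []
    while unprocessed:  # S214: loop until the entire video is processed
        order.append(next_image(unprocessed, retrograde_flag))
    return order
```

With the flag ON, `acquisition_order([(2, "b"), (1, "a"), (3, "c")], True)` yields the frames in the order c, b, a.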

そして、取得した画像内で特徴点群を取得し（ステップＳ２１６）、画像の撮影位置姿勢を、最新ＫＦの撮影位置姿勢・特徴点群マップから推定する（ステップＳ２１７）。このように、特徴点群マップの投影位置から撮影位置姿勢を推定するが、マップ精度が向上しているので、推定精度も向上する。ただし、フレームレートが低いと、近くに映っている画像が２枚ないとマップ登録されないので、近くの特徴がマップにない可能性がある。ここまでが、後述するフレーム姿勢推定部５２１によって実行される処理である。 Then, a feature point group is extracted from the acquired image (step S216), and the shooting position and orientation of the image are estimated from the shooting position and orientation of the latest KF and the feature point group map (step S217). The shooting position and orientation are thus estimated from the projected positions of the feature point group map, and because the map accuracy is improved, the estimation accuracy is improved as well. However, a feature is not registered in the map unless there are two images in which it appears close by, so when the frame rate is low, nearby features may be missing from the map. The processing up to this point is executed by the frame orientation estimation unit 521, which will be described later.

つぎに、取得した画像は、新ＫＦ画像であるか否かを判断する（ステップＳ２１８）。新ＫＦ画像であるか否かは、たとえば、最新ＫＦとの相違が大きい、または、指定時間が離れているか否かによって判断することができる。ここで、取得した画像が、新ＫＦ画像でない場合（ステップＳ２１８：Ｎｏ）は、何もせずに、ステップＳ２１４へ戻る。 Next, it is determined whether or not the acquired image is a new KF image (step S218). Whether or not it is a new KF image can be determined, for example, by whether it differs greatly from the latest KF or is separated from it by a specified time. If the acquired image is not a new KF image (step S218: No), the process returns to step S214 without doing anything.

一方、取得した画像が、新ＫＦ画像である場合（ステップＳ２１８：Ｙｅｓ）は、ステップＳ２１９へ移行する。ここまでが、後述するＫＦ（キーフレーム）更新部５２２によって実行される処理である。 On the other hand, if the acquired image is the new KF image (step S218: Yes), the process proceeds to step S219. The above is the processing executed by the KF (key frame) update unit 522, which will be described later.
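
The keyframe test of step S218 can be sketched as below. The text only says that the image should differ greatly from the latest KF or be separated from it by a specified time; the concrete difference measure used here (the fraction of the latest KF's feature points still matched in the current image) and both default thresholds are assumptions made for illustration.

```python
def is_new_keyframe(matched_ratio, t_current, t_latest_kf,
                    min_matched_ratio=0.5, max_interval_s=1.0):
    """S218: decide whether the current image becomes a new KF.

    matched_ratio: fraction of the latest KF's feature points that are
        still matched in the current image (hypothetical measure).
    t_current, t_latest_kf: capture times in seconds. The absolute
        difference is used so the test also works in retrograde order.
    """
    differs_greatly = matched_ratio < min_matched_ratio
    far_in_time = abs(t_current - t_latest_kf) >= max_interval_s
    return differs_greatly or far_in_time
```

Taking the absolute time difference matters because, with the retrograde acquisition flag ON, each new image is earlier than the latest KF rather than later.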

つぎに、既存のＫＦ画像内で特徴点と同じ特徴で、対応付けしていない特徴があるか否かを判断する（ステップＳ２１９）。ここで、対応付けしていない特徴がない場合（ステップＳ２１９：Ｎｏ）は、ステップＳ２１４へ戻る。 Next, it is determined whether or not an existing KF image contains a feature that is the same as one of the feature points but has not yet been matched (step S219). If there is no such unmatched feature (step S219: No), the process returns to step S214.

一方、対応付けしていない特徴がある場合（ステップＳ２１９：Ｙｅｓ）は、見つけた既存ＫＦ画像と現（ＫＦ）画像の、画像内特徴位置と撮影位置姿勢を使って、三角測量を実施する（ステップＳ２２０）。そして、特徴点の特徴と、三角測量による３次元位置を、環境マップに追加更新する（ステップＳ２２１）。ここまでが、後述する３Ｄマップ特徴点更新部５３１によって実行される処理である。 On the other hand, if there are unmatched features (step S219: Yes), triangulation is performed using the in-image feature positions and shooting positions and orientations of the found existing KF image and the current (KF) image ( step S220). Then, the features of the feature points and the three-dimensional positions obtained by triangulation are additionally updated to the environment map (step S221). The above is the processing executed by the 3D map feature point updating unit 531, which will be described later.
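
To make the triangulation of step S220 concrete, the sketch below uses the midpoint method: each image contributes a viewing ray (camera center plus direction toward the feature), and the 3D point is taken as the midpoint of the shortest segment between the two rays. The text does not state which triangulation formula is used, so the midpoint method is a stand-in chosen for simplicity. The rays are assumed to have already been derived from the in-image feature positions and the shooting positions and orientations; all names are illustrative.

```python
def dot(u, v):
    """Dot product of two 3D vectors given as tuples."""
    return sum(a * b for a, b in zip(u, v))


def triangulate_midpoint(c1, d1, c2, d2, eps=1e-9):
    """Midpoint of the closest points on rays c1 + s*d1 and c2 + t*d2.

    c1, c2: camera centers; d1, d2: ray directions (need not be unit).
    Returns None when the rays are (nearly) parallel, i.e. when the
    parallax between the two images is too small to triangulate.
    """
    w = tuple(a - b for a, b in zip(c1, c2))
    a, b, c = dot(d1, d1), dot(d1, d2), dot(d2, d2)
    d, e = dot(d1, w), dot(d2, w)
    denom = a * c - b * b          # zero for parallel rays
    if denom <= eps * a * c:
        return None
    s = (b * e - c * d) / denom
    t = (a * e - b * d) / denom
    p1 = tuple(o + s * k for o, k in zip(c1, d1))
    p2 = tuple(o + t * k for o, k in zip(c2, d2))
    return tuple((x + y) / 2 for x, y in zip(p1, p2))
```

The denominator shrinks as the two rays approach parallel, which is the small-parallax situation of a distant subject; this is one way to see why triangulating from images in which the subject appears closer and larger tends to give a more stable 3D position.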

その後、追加更新した環境マップとＫＦ群を使って、現（ＫＦ）画像の位置姿勢と環境マップを最適化する（ステップＳ２２２）。ここまでが、後述するグラフ制約生成部５３２、ＫＦ姿勢・特徴点マップ最適化部５３３によって実行される処理である。 After that, the additionally updated environment map and KFs are used to optimize the pose and environment map of the current (KF) image (step S222). The processes up to this point are executed by the graph constraint generation unit 532 and the KF posture/feature point map optimization unit 533, which will be described later.

その後、ステップＳ２１４へ戻る。そして、ステップＳ２１４において、すべての映像を処理した場合（ステップＳ２１４：Ｙｅｓ）は、図２Ａに戻って、一連の処理を終了する。 After that, the process returns to step S214. Then, in step S214, if all the images have been processed (step S214: Yes), the process returns to FIG. 2A and the series of processes ends.

このように、本実施の形態にかかる移動体位置推定方法は、カメラ映像の設置位置（撮影方向）や、フレームレート、従来Ｖ－ＳＬＡＭの実施結果などから、映像を時間逆行するかを判定し、適宜、時間逆行させた画像群でＶ－ＳＬＡＭ処理することで、被写体およびその特徴点が画像内でより大きく映った画像を用いて三角測量をおこなうことができ、該特徴点の３Ｄ位置で構成された環境マップの精度を向上させると共に、該マップを使って推定算出するカメラの撮影位置・姿勢の推定精度を向上させるものである。 As described above, the method for estimating the position of a moving object according to the present embodiment determines whether or not the video is time-reversed based on the installation position (shooting direction) of the camera video, the frame rate, the result of conventional V-SLAM, and the like. By appropriately performing V-SLAM processing on a group of images that are reversed in time, triangulation can be performed using an image in which the subject and its feature points appear larger in the image, and the 3D position of the feature point This improves the accuracy of the constructed environment map and improves the accuracy of estimating the photographing position/orientation of the camera that is estimated and calculated using the map.

（Ｖ－ＳＬＡＭの環境マップの比較） (Comparison of V-SLAM environment maps)
図３Ａは、実施の形態にかかる移動体位置推定方法における逆時刻によるＶ－ＳＬＡＭの環境マップの一例を示す説明図である。また、図３Ｂは、従来技術にかかる移動体位置推定方法における順時刻によるＶ－ＳＬＡＭの環境マップの一例を示す説明図である。図３Ａと図３Ｂとは、その比較を示している。 FIG. 3A is an explanatory diagram showing an example of a V-SLAM environment map obtained in reverse time order by the mobile body position estimation method according to the embodiment. FIG. 3B is an explanatory diagram showing an example of a V-SLAM environment map obtained in forward time order by the mobile body position estimation method according to the prior art. FIGS. 3A and 3B show the comparison between the two.

図３Ａ、図３Ｂは、環境マップの画像特徴の３Ｄ位置を、上空俯瞰した画像である。図３Ａ、図３Ｂは、後述する図４Ａ、図４Ｂのように、画像のほぼ中央を上から下へ伸びる道路を走行する移動体の映像を使って、同じ処理区間に対してＶ－ＳＬＡＭで作成された環境マップを示しており、各ドットはマップ特徴点群を示している。 FIGS. 3A and 3B are overhead views of the 3D positions of the image features in the environment map. As in FIGS. 4A and 4B described later, FIGS. 3A and 3B show environment maps created by V-SLAM for the same processing section, using video from a mobile body traveling on a road that runs from top to bottom through approximately the center of the image. Each dot represents a map feature point.

図３Ａにおいて、逆画像を取得した場合に、逆時刻マップ特徴点群は、大きく被写体が映る画像で特徴対応付けと三角測量をおこなうため、本来の特徴のみのマップとなり、正しく道路周辺に３Ｄ位置が集中していることがわかる。一方、図３Ｂにおいて、逆画像を取得していない場合に、環境マップの特徴点が画像中央に縦に存在する道路より大きく離れた場所まで散らばっており、また、誤った特徴対応付けの結果、間違った特徴対応付けも増えてしまい、路面微小凹凸の特徴に対応するマップ特徴点も増えている。 In FIG. 3A, where reverse-order images were acquired, feature matching and triangulation are performed on images in which the subject appears large, so the map contains only genuine features, and the 3D positions of the reverse-time map feature points are correctly concentrated around the road. In FIG. 3B, where reverse-order images were not acquired, the feature points of the environment map are scattered to places far from the road that runs vertically through the center of the image. In addition, as a result of erroneous feature matching, incorrect correspondences have increased, and so have map feature points corresponding to minute irregularities of the road surface.

このように、逆画像を取得することにより、生成する特徴点の３次元位置群である周辺環境マップの位置精度が向上して、位置誤差による特徴点位置の散らばりがなくなっていることがわかる。 In this way, it can be seen that by acquiring the reverse image, the positional accuracy of the surrounding environment map, which is a group of three-dimensional positions of feature points to be generated, is improved, and the scattering of feature point positions due to positional errors is eliminated.

（撮影位置・姿勢の推定結果の比較） (Comparison of shooting position/orientation estimation results)
図４Ａは、実施の形態にかかる移動体位置推定方法における周辺環境マップの一例を示す説明図である。また、図４Ｂは、従来技術にかかる移動体位置推定方法における周辺環境マップの一例を示す説明図である。図４Ａと図４Ｂとは、撮影位置・姿勢の推定結果の比較を示している。 FIG. 4A is an explanatory diagram showing an example of a surrounding environment map in the mobile body position estimation method according to the embodiment. FIG. 4B is an explanatory diagram showing an example of a surrounding environment map in the mobile body position estimation method according to the prior art. FIGS. 4A and 4B show a comparison of the shooting position/orientation estimation results.

図４Ａと図４Ｂは、上空俯瞰した画像で、処理映像の同じ走行区間を示している。図４Ａにおいて、逆画像を取得した場合には、ほぼ走行区間全体で撮影位置・姿勢を推定できている。一方、図４Ｂにおいて、逆画像を取得していない場合には、環境マップ精度が低いために、撮影位置・姿勢の推定ができた画像が非常に少ない。 4A and 4B are aerial overhead images showing the same travel segment of the processed video. In FIG. 4A, when the reverse image is acquired, the shooting position/orientation can be estimated in almost the entire traveling section. On the other hand, in FIG. 4B, when the reverse image is not acquired, the environment map accuracy is low, so the number of images for which the shooting position/orientation can be estimated is very small.

このように、初期の特徴点３Ｄ位置算出をおこなうタイミングが、従来方式おける「最も遠方に被写体が映った時（遠方に見え始めた時）」だったものが、本発明の実施にかかる移動体位置推定方法により、「最も近傍に被写体が映った時（最後に見えた時）」に変更することが可能になるため、入力映像状況に応じて、必要なら、被写体に対してより大きな映像内変化が得られる画像を使って実施することができる。このため、映像内変化から推定する特徴点の３次元位置推定の精度が向上し、高精度な周辺環境マップ（周辺特徴点群の３次元位置群）が得られる。 As described above, in the conventional method the initial 3D positions of feature points are calculated "when the subject appears farthest away (when it first comes into view in the distance)". The mobile body position estimation method according to the embodiment of the present invention makes it possible to change this timing to "when the subject appears closest (when it was last seen)". Depending on the input video, the calculation can therefore be performed, when necessary, using images in which the subject shows a larger change within the video. This improves the accuracy of the 3D positions of feature points estimated from changes within the video, and yields a highly accurate surrounding environment map (the group of 3D positions of surrounding feature points).

この結果、課題だった生成する周辺環境マップ（画像特徴点の３次元位置群）の位置精度が向上し、環境マップから推定した画像内容と実際の映像の映りの齟齬が拡大することなくＶ－ＳＬＡＭ処理を実施できるので、撮影位置姿勢推定処理の失敗（以後推定処理できず、撮影位置姿勢が推定できない区間が生じる）が起こりにくくなり、撮影位置姿勢の推定区間が伸びると共に、その精度も向上することができる。 As a result, the positional accuracy of the generated surrounding environment map (the group of 3D positions of image feature points), which had been the problem, is improved, and V-SLAM processing can be performed without a growing discrepancy between the image content predicted from the environment map and what actually appears in the video. Failures of the shooting position/orientation estimation process (after which estimation can no longer proceed, leaving sections in which the shooting position/orientation cannot be estimated) therefore become less likely, the section over which the shooting position/orientation is estimated becomes longer, and its accuracy also improves.

（システム構成例） (System configuration example)
つぎに、実施の形態にかかる移動体位置推定システム５００のシステム構成について説明する。図５は、実施の形態にかかる移動体位置推定システムのシステム構成の一例を示す説明図である。 Next, the system configuration of the mobile body position estimation system 500 according to the embodiment will be described. FIG. 5 is an explanatory diagram showing an example of the system configuration of the mobile body position estimation system according to the embodiment.

図５において、移動体位置推定システム５００は、移動体位置推定装置の一例であるサーバ５０１と、移動体５０３に搭載された、映像を収集する情報収集装置の一例である車載機５０２とを備える。そして、サーバ５０１と車載機５０２とが、ネットワーク５０４によって接続されることにより、移動体位置推定システム５００を構成する。また、移動体位置推定システム５００は、図示は省略するが、クラウドコンピューティングシステムによって、その機能を実現するようにしてもよい。また、車載機５０２は、衛星５０５からのＧＮＳＳ情報を収集するようにしてもよい。 In FIG. 5, a mobile body position estimation system 500 includes a server 501 that is an example of a mobile body position estimation device, and an on-vehicle device 502 that is an example of an information collection device that collects images and is mounted on a mobile body 503. . The server 501 and the vehicle-mounted device 502 are connected via a network 504 to form a mobile body position estimation system 500 . Further, although not shown, the mobile body position estimation system 500 may realize its functions by a cloud computing system. In-vehicle device 502 may also collect GNSS information from satellite 505 .

サーバ５０１は、逆行判定部５１１と、逆行画像取得部５１２と、初期姿勢・座標系設定部５１３と、フレーム姿勢推定部５２１と、ＫＦ（キーフレーム）更新部５２２と、３Ｄマップ特徴点更新部５３１と、グラフ制約生成部５３２と、ＫＦ姿勢・特徴点マップ最適化部５３３と、ループ検出・クロージング部５４１の各機能部を有する。各構成部５１１～５１３、５２１、５２２、５３１～５３３、５４１によって、サーバ５０１の制御部を構成することができる。これらの構成部の詳細については、後述する。 The server 501 has the following functional units: a retrograde determination unit 511, a retrograde image acquisition unit 512, an initial orientation/coordinate system setting unit 513, a frame orientation estimation unit 521, a KF (key frame) update unit 522, a 3D map feature point update unit 531, a graph constraint generation unit 532, a KF orientation/feature point map optimization unit 533, and a loop detection/closing unit 541. The components 511 to 513, 521, 522, 531 to 533, and 541 can constitute the control unit of the server 501. Details of these components will be described later.

また、サーバ５０１は、ＫＦ群情報５５１および特徴点群情報５５２などを記憶する実座標環境マップ５５０を備えている。あるいは、サーバ５０１は、実座標環境マップ５５０とアクセス可能に接続されている。 The server 501 also has a real coordinate environment map 550 that stores KF group information 551, feature point group information 552, and the like. Alternatively, server 501 is operably connected to real coordinate environment map 550 .

すなわち、実座標環境マップ５５０は、サーバ５０１内に設けられて（記憶されて）いてもよく、また、実座標環境マップ５５０は、図示を省略する別のサーバ内に設けられ、ネットワーク５０４などのネットワークによってサーバ５０１と接続されていてもよい。実座標環境マップ５５０の詳細については、後述する。 That is, the real coordinate environment map 550 may be provided (stored) in the server 501, or the real coordinate environment map 550 may be provided in another server (not shown) and stored in the network 504 or the like. It may be connected to the server 501 via a network. Details of the real coordinate environment map 550 will be described later.

そして、上記構成部は、大きく分けて４つの機能部に分けることができる。逆行判定部５１１と、逆行画像取得部５１２と、初期姿勢・座標系設定部５１３と、によって、システムの初期化処理機能５１０を実現することができる。また、フレーム姿勢推定部５２１と、ＫＦ更新部５２２と、によって、位置姿勢推定（トラッキング）処理機能５２０を実現することができる。 The above configuration can be roughly divided into four functional units. The system initialization processing function 510 can be realized by the retrograde determination unit 511 , the retrograde image acquisition unit 512 , and the initial posture/coordinate system setting unit 513 . A position/orientation estimation (tracking) processing function 520 can be implemented by the frame orientation estimation unit 521 and the KF update unit 522 .

また、３Ｄマップ特徴点更新部５３１と、グラフ制約生成部５３２と、ＫＦ姿勢・特徴点マップ最適化部５３３と、によって、環境マップ作成（ローカルマッピング）処理機能５３０を実現することができる。また、ループ検出・クロージング部５４１によって、ループクローズ処理機能５４０を実現することができる。 An environment map creation (local mapping) processing function 530 can be realized by the 3D map feature point update unit 531, the graph constraint generation unit 532, and the KF posture/feature point map optimization unit 533. A loop close processing function 540 can be implemented by the loop detection/closing unit 541 .

移動体５０３は、具体的には、たとえば、情報を収集するコネクテッドカーであるが、これには限定されない。一般乗用車やタクシーなどの営業車、二輪車（自動二輪や自転車）、大型車（バスやトラック）などであってもよい。また、移動体５０３には、水上を移動する船舶や上空を移動する航空機、無人航空機（ドローン）、自動走行ロボットなどであってもよい。 The mobile body 503 is, specifically, for example, a connected car that collects information, but is not limited to this. It may be an ordinary passenger car, a commercial vehicle such as a taxi, a two-wheeled vehicle (a motorcycle or bicycle), a large vehicle (a bus or truck), or the like. The mobile body 503 may also be a ship that moves on water, an aircraft that moves through the sky, an unmanned aerial vehicle (drone), a self-driving robot, or the like.

車載機５０２は、撮影映像に関する情報を収集する。また、車載機５０２は、ＧＮＳＳ情報を含む移動体５０３の情報を収集するようにしてもよい。移動体５０３の情報には、移動体５０３から収集する、移動体５０３の姿勢情報なども含まれる。 The in-vehicle device 502 collects information about captured images. Also, the vehicle-mounted device 502 may collect information on the mobile object 503 including GNSS information. The information on the mobile object 503 includes posture information of the mobile object 503 collected from the mobile object 503 .

移動体５０３には、車載機５０２が備えられている。車載機５０２は、移動体５０３に搭載された専用の装置であってもよく、取り外し可能な機器であってもよい。また、スマートフォンやタブレットなどの通信機能を備えた携帯端末装置を移動体５０３において利用するものであってもよい。また、車載機５０２の機能を、移動体５０３が備えている機能を用いて実現するようにしてもよい。 A mobile unit 503 is provided with an on-vehicle device 502 . The in-vehicle device 502 may be a dedicated device mounted on the moving body 503, or may be a removable device. Also, a mobile terminal device having a communication function such as a smart phone or a tablet may be used in the moving object 503 . Also, the functions of the vehicle-mounted device 502 may be realized using the functions of the moving object 503 .

したがって、車載機５０２の『車載』という表現は、移動体に搭載された専用装置という意味には限定されない。車載機５０２は、移動体５０３における情報を収集し、収集した情報をサーバ５０１に対して送信できる機能を持った装置であれば、どのような形態の装置であってもよい。 Therefore, the expression "vehicle-mounted" of the vehicle-mounted device 502 is not limited to the meaning of a dedicated device mounted on a mobile object. The in-vehicle device 502 may be any type of device as long as it has a function of collecting information on the mobile body 503 and transmitting the collected information to the server 501 .

車載機５０２は、撮影映像に関する情報およびＧＮＳＳ情報を含む移動体５０３の情報（車載データ）を取得し、取得した車載データを保存するようにしてもよい。そして、保存した車載データを、無線通信によって、ネットワーク５０４を介して、サーバ５０１へ送信する。また、サーバ５０１から配信されたプログラムを含む各種データを、ネットワーク５０４を介して、無線通信により受信する。 The vehicle-mounted device 502 may acquire information (vehicle-mounted data) of the moving object 503 including information on the captured image and GNSS information, and store the acquired vehicle-mounted data. Then, the stored in-vehicle data is transmitted to the server 501 via the network 504 by wireless communication. Also, various data including programs distributed from the server 501 are received by wireless communication via the network 504 .

また、車載機５０２は、近距離通信機能により、近くを走行中の別の移動体５０３の情報を取得し、サーバ５０１へ送信するようにしてもよい。また、車載機５０２どうしが、近距離通信機能により、通信をおこない、他の車載機５０２を介して、サーバ５０１との通信をおこなうようにしてもよい。 Also, the vehicle-mounted device 502 may acquire information on another moving object 503 running nearby by using the short-range communication function, and transmit the information to the server 501 . Also, the vehicle-mounted devices 502 may communicate with each other by the short-range communication function, and communicate with the server 501 via other vehicle-mounted devices 502 .

このようにして、移動体位置推定システム５００において、サーバ５０１は、移動体５０３に搭載された車載機５０２から車載データを取得するとともに、各車載機５０２へ各種データを配信することができる。 In this manner, in the mobile body position estimation system 500 , the server 501 can acquire vehicle data from the vehicle-mounted device 502 mounted on the mobile body 503 and distribute various data to each vehicle-mounted device 502 .

また、車載機５０２は、通信手段を備えていなくてよい。すなわち、車載機５０２は、サーバ５０１とは、ネットワーク５０４を介して接続されていなくてもよい。その場合は、車載機５０２に蓄積されたデータは、オフラインで（たとえば、記録メディアを介して人手などにより）、サーバ５０１に入力することができる。 Also, the vehicle-mounted device 502 does not have to have communication means. That is, the in-vehicle device 502 does not have to be connected to the server 501 via the network 504 . In that case, the data accumulated in the vehicle-mounted device 502 can be input to the server 501 off-line (for example, manually via recording media).

図５において、サーバ５０１が、逆行判定部５１１と、逆行画像取得部５１２と、初期姿勢・座標系設定部５１３と、フレーム姿勢推定部５２１と、ＫＦ更新部５２２と、３Ｄマップ特徴点更新部５３１と、グラフ制約生成部５３２と、ＫＦ姿勢・特徴点マップ最適化部５３３と、ループ検出・クロージング部５４１の各機能部を有する構成とした。図示は省略するが、これらの各機能部の少なくとも一つを、サーバ５０１に加えて、あるいは、サーバに代えて、車載機５０２が有するようにしてもよい。 In FIG. 5, the server 501 is configured to have the following functional units: the retrograde determination unit 511, the retrograde image acquisition unit 512, the initial orientation/coordinate system setting unit 513, the frame orientation estimation unit 521, the KF update unit 522, the 3D map feature point update unit 531, the graph constraint generation unit 532, the KF orientation/feature point map optimization unit 533, and the loop detection/closing unit 541. Although not illustrated, at least one of these functional units may be provided in the vehicle-mounted device 502 in addition to, or instead of, the server 501.

車載機５０２が、各機能部５１１～５１３、５２１、５２２、５３１、５３２、５３３、５４１の少なくとも一つを有する場合は、サーバ５０１が実施する処理の内容と同じであってもよい。ただし、３Ｄ地図マップ情報は、任意の媒体（ＤＶＤ／ＢＬディスク、ＨＤＤなど）に保持していて利用する以外にも、適宜、図示を省略する外部サーバから無線ネットなどを経由して取得するようにしてもよい。 When the vehicle-mounted device 502 has at least one of the functional units 511 to 513, 521, 522, 531, 532, 533, and 541, the processing it performs may be the same as that performed by the server 501. However, besides being held on and used from an arbitrary medium (DVD/BL disc, HDD, etc.), the 3D map information may be acquired as appropriate from an external server (not shown) via a wireless network or the like.

このように、移動体位置推定システム５００は、逆行判定部５１１、逆行画像取得部５１２、初期姿勢・座標系設定部５１３、フレーム姿勢推定部５２１、キーフレーム（ＫＦ）更新部５２２、３Ｄマップ特徴点更新部５３１、グラフ制約生成部５３２、ＫＦ姿勢・特徴点マップ最適化部５３３、ループ検出・クロージング部５４１、の９つのＶ－ＳＬＡＭをベースとする処理部を持ち、かつ、実座標環境マップ５５０（ＫＦ群情報５５１、特徴点群情報５５２）、全画像位置姿勢データ５６０、の２つの内部保持データを持っている。 As described above, the moving body position estimation system 500 includes a retrograde determination unit 511, a retrograde image acquisition unit 512, an initial orientation/coordinate system setting unit 513, a frame orientation estimation unit 521, a key frame (KF) update unit 522, a 3D map feature It has nine V-SLAM-based processing units: a point update unit 531, a graph constraint generation unit 532, a KF posture/feature point map optimization unit 533, and a loop detection/closing unit 541, and a real coordinate environment map 550 (KF group information 551, feature point group information 552) and all image position/orientation data 560 are held internally.

（移動体位置推定装置のハードウェア構成例） (Hardware configuration example of the mobile body position estimation device)
図６は、移動体位置推定装置のハードウェア構成の一例を示すブロック図である。移動体位置推定装置の一例であるサーバ５０１は、ＣＰＵ（ＣｅｎｔｒａｌＰｒｏｃｅｓｓｉｎｇＵｎｉｔ）６０１と、メモリ６０２と、ネットワークＩ／Ｆ（Ｉｎｔｅｒｆａｃｅ）６０３と、記録媒体Ｉ／Ｆ６０４と、記録媒体６０５と、を有する。また、各構成部は、バス６００によってそれぞれ接続される。 FIG. 6 is a block diagram showing an example of the hardware configuration of the mobile body position estimation device. The server 501, which is an example of the mobile body position estimation device, has a CPU (Central Processing Unit) 601, a memory 602, a network I/F (Interface) 603, a recording medium I/F 604, and a recording medium 605. These components are connected to one another by a bus 600.

ここで、ＣＰＵ６０１は、サーバ（移動体位置推定装置）５０１の全体の制御を司る。メモリ６０２は、たとえば、ＲＯＭ（ＲｅａｄＯｎｌｙＭｅｍｏｒｙ）、ＲＡＭ（ＲａｎｄｏｍＡｃｃｅｓｓＭｅｍｏｒｙ）およびフラッシュＲＯＭなどを有する。具体的には、たとえば、フラッシュＲＯＭやＲＯＭが各種プログラムを記憶し、ＲＡＭがＣＰＵ６０１のワークエリアとして使用される。メモリ６０２に記憶されるプログラムは、ＣＰＵ６０１にロードされることで、コーディングされている処理をＣＰＵ６０１に実行させる。 Here, the CPU 601 controls the entire server (mobile body position estimation device) 501 . The memory 602 has, for example, ROM (Read Only Memory), RAM (Random Access Memory), flash ROM, and the like. Specifically, for example, a flash ROM or ROM stores various programs, and a RAM is used as a work area for the CPU 601 . A program stored in the memory 602 is loaded into the CPU 601 to cause the CPU 601 to execute coded processing.

ネットワークＩ／Ｆ６０３は、通信回線を通じてネットワーク５０４に接続され、ネットワーク５０４を介して他の装置（たとえば、車載機５０２、実座標環境マップ５５０や全画像位置姿勢データ５６０が格納される装置、あるいは、他のサーバやシステム）に接続される。そして、ネットワークＩ／Ｆ６０３は、ネットワーク５０４と自装置内部とのインターフェースを司り、他の装置からのデータの入出力を制御する。ネットワークＩ／Ｆ６０３には、たとえば、モデムやＬＡＮアダプタなどを採用することができる。 The network I/F 603 is connected to the network 504 via a communication line, and via the network 504, another device (for example, the vehicle-mounted device 502, the device in which the real coordinate environment map 550 and the total image position and orientation data 560 are stored, or other servers or systems). A network I/F 603 serves as an interface between the network 504 and the inside of the device itself, and controls input/output of data from other devices. For network I/F 603, for example, a modem, LAN adapter, or the like can be adopted.

記録媒体Ｉ／Ｆ６０４は、ＣＰＵ６０１の制御にしたがって記録媒体６０５に対するデータのリード／ライトを制御する。記録媒体６０５は、記録媒体Ｉ／Ｆ６０４の制御で書き込まれたデータを記憶する。記録媒体６０５としては、たとえば、磁気ディスク、光ディスクなどが挙げられる。 A recording medium I/F 604 controls reading/writing of data from/to the recording medium 605 under the control of the CPU 601 . The recording medium 605 stores data written under control of the recording medium I/F 604 . Examples of recording medium 605 include a magnetic disk and an optical disk.

なお、サーバ５０１は、上述した構成部のほかに、たとえば、ＳＳＤ（ＳｏｌｉｄＳｔａｔｅＤｒｉｖｅ）、キーボード、ポインティングデバイス、ディスプレイなどを有していてもよい。 Note that the server 501 may have, for example, an SSD (Solid State Drive), a keyboard, a pointing device, a display, etc., in addition to the components described above.

（車載機のハードウェア構成例）
図７は、車載機のハードウェア構成の一例を示すブロック図である。情報収集装置の一例である車載機５０２は、ＣＰＵ７０１と、メモリ７０２と、無線通信装置７０３と、移動体Ｉ／Ｆ７０４と、撮像装置７０５と、受信装置７０６を有する。また、各構成部は、バス７００によってそれぞれ接続される。 (Hardware configuration example of in-vehicle device)
FIG. 7 is a block diagram showing an example of the hardware configuration of the vehicle-mounted device. A vehicle-mounted device 502, which is an example of an information collection device, has a CPU 701, a memory 702, a wireless communication device 703, a mobile body I/F 704, an imaging device 705, and a receiving device 706. These components are connected to one another by a bus 700.

ＣＰＵ７０１は、車載機５０２の全体の制御を司る。メモリ７０２は、たとえば、ＲＯＭ、ＲＡＭおよびフラッシュＲＯＭなどを有する。具体的には、たとえば、フラッシュＲＯＭやＲＯＭが各種プログラムを記憶し、ＲＡＭがＣＰＵ７０１のワークエリアとして使用される。メモリ７０２に記憶されるプログラムは、ＣＰＵ７０１にロードされることで、コーディングされている処理をＣＰＵ７０１に実行させる。 The CPU 701 controls the entire vehicle-mounted device 502 . The memory 702 has, for example, ROM, RAM and flash ROM. Specifically, for example, a flash ROM or ROM stores various programs, and a RAM is used as a work area for the CPU 701 . A program stored in the memory 702 is loaded into the CPU 701 to cause the CPU 701 to execute coded processing.

無線通信装置７０３は、発信された電波を受信したり、電波を発信したりする。アンテナと受信装置とを含む構成であり、各種通信規格による移動通信（具体的には、たとえば、３Ｇ、４Ｇ、５Ｇ、ＰＨＳ通信など）、Ｗｉ－Ｆｉ（登録商標）などの通信を送受信する機能を備えている。 The wireless communication device 703 receives transmitted radio waves and transmits radio waves. It includes an antenna and a receiving device, and has a function of transmitting and receiving communications such as mobile communication according to various communication standards (specifically, for example, 3G, 4G, 5G, and PHS communication) and Wi-Fi (registered trademark).

移動体Ｉ／Ｆ７０４は、移動体５０３と車載機５０２の自装置内部とのインターフェースを司り、移動体５０３からのデータの入出力を制御する、したがって、車載機５０２は、移動体Ｉ／Ｆ７０４を介して移動体５０３が備えるＥＣＵ（各種センサなどを含む）７０７から情報を収集する。移動体Ｉ／Ｆ７０４は、具体的には、たとえば、有線により接続する際に用いるコネクタや近距離無線通信（具体的には、たとえば、Ｂｌｕｅｔｏｏｔｈ（登録商標））装置などであってもよい。 The mobile body I/F 704 serves as an interface between the mobile body 503 and the inside of the vehicle-mounted device 502, and controls input and output of data from the mobile body 503. The vehicle-mounted device 502 therefore collects information, via the mobile body I/F 704, from an ECU (including various sensors and the like) 707 provided in the mobile body 503. Specifically, the mobile body I/F 704 may be, for example, a connector used for wired connection or a short-range wireless communication (specifically, for example, Bluetooth (registered trademark)) device.

受信装置（たとえばＧＰＳ（ＧｌｏｂａｌＰｏｓｉｔｉｏｎｉｎｇＳｙｓｔｅｍ）受信装置などのＧＮＳＳ受信装置）７０６は、複数の衛星５０５からの電波を受信し、受信した電波に含まれる情報から、地球上の現在位置を算出する。 A receiver (for example, a GNSS receiver such as a GPS (Global Positioning System) receiver) 706 receives radio waves from a plurality of satellites 505 and calculates the current position on the earth from information included in the received radio waves.

撮像装置（たとえばカメラなど）７０５は、静止画や動画を撮像する機器である。具体的には、たとえば、レンズと撮像素子とを備える構成である。撮像装置７０５による撮像画像は、メモリ７０２に保存される。また、カメラなどの撮像装置７０５は、画像認識機能や、バーコードやＱＲコード（登録商標）を読み取る機能や、ＯＭＲ（ＯｐｔｉｃａｌＭａｒｋＲｅａｄｅｒ）、ＯＣＲ（ＯｐｔｉｃａｌＣｈａｒａｃｔｅｒＲｅａｄｅｒ）機能などを備えていてもよい。 An imaging device (for example, a camera) 705 is a device that captures still images and moving images. Specifically, for example, it is a configuration including a lens and an imaging device. An image captured by the imaging device 705 is stored in the memory 702 . In addition, the imaging device 705 such as a camera may have an image recognition function, a barcode and QR code (registered trademark) reading function, an OMR (Optical Mark Reader) function, an OCR (Optical Character Reader) function, and the like. .

図７に示したように、撮像装置（カメラなど）７０５および受信装置（ＧＮＳＳ受信装置など）７０６は、車載機５０２が備えていてもよく、また、移動体５０３が備えていたり、別途、外付けされたものを用いるようにしてもよい。その際、撮像装置７０５あるいは受信装置７０６、車載機５０２とのデータのやりとりは、有線または無線通信によりおこなうようにしてもよい。 As shown in FIG. 7, the imaging device (such as a camera) 705 and the receiving device (such as a GNSS receiving device) 706 may be included in the vehicle-mounted device 502, may be included in the mobile body 503, or may be separate, externally attached devices. In that case, data may be exchanged between the imaging device 705 or the receiving device 706 and the vehicle-mounted device 502 by wired or wireless communication.

撮像装置７０５やＧＮＳＳ受信装置７０６を、車載機５０２が備えていない場合は、移動体Ｉ／Ｆ７０４などを介して、それらの情報を取得するようにしてもよい。また、車載機５０２は、図示は省略するが、各種入力装置、ディスプレイ、メモリカードなどの記録媒体の読み書き用のインターフェース、各種入力端子などを備えていてもよい。 If the vehicle-mounted device 502 does not have the imaging device 705 or the GNSS receiver 706, the information thereof may be acquired via the mobile I/F 704 or the like. Although not shown, the in-vehicle device 502 may include various input devices, a display, an interface for reading and writing recording media such as memory cards, and various input terminals.

（実座標環境マップの内容）
図８は、実座標環境マップのデータ構成の一例を示す説明図である。図８において、実座標環境マップ５５０は、ＫＦ群情報５５１と、特徴点群情報（３次元位置情報）５５２ａと、特徴点群情報（ＫＦ画像内位置情報）５５２ｂと、を有する。 (Contents of real coordinate environment map)
FIG. 8 is an explanatory diagram showing an example of the data configuration of the real coordinate environment map. In FIG. 8, the real coordinate environment map 550 has KF group information 551, feature point group information (three-dimensional position information) 552a, and feature point group information (KF intra-image position information) 552b.

ここで、既存技術の環境地図に相当するものが、実座標環境マップ５５０であり、各画像特徴点の３次元位置（特徴点群情報（３次元位置情報）５５２ａ）の他に、どの画像特徴点はどの画像で閲覧されたか（用いるか）に関する情報を持っている。これを、特徴点群情報（ＫＦ画像内位置情報）５５２ｂと、ＫＦ群情報５５１の２つで表している。主要な映像内の画像（ＫＦ）群の情報であるＫＦ群情報５５１と、該ＫＦ画像上に各画像特徴が映っている２次元位置に関する情報である特徴点群情報（ＫＦ画像内位置情報）５５２ｂは、環境地図を任意の画像の位置・姿勢推定に用いるために必須な情報である。 Here, the real coordinate environment map 550 corresponds to the environment map of the existing technology. In addition to the three-dimensional position of each image feature point (feature point group information (three-dimensional position information) 552a), it holds information on which image feature point was observed in (is used with) which image. This is represented by two items: the feature point group information (KF intra-image position information) 552b and the KF group information 551. The KF group information 551, which is information on the group of main images (KFs) in the video, and the feature point group information (KF intra-image position information) 552b, which is information on the two-dimensional position at which each image feature appears on the KF image, are essential for using the environment map to estimate the position and orientation of an arbitrary image.

図８に示すように、ＫＦ群情報５５１は、「ＩＤ」、「親ＫＦＩＤ」、「子ＫＦＩＤ」、「ループＫＦＩＤ」、「姿勢情報」、「位置情報」、「特徴量」、「ＧＮＳＳ位置」および「映像フレーム番号」を含む各種情報を持っている。なお、「ＧＮＳＳ位置」に関する情報を持っていなくてもよい。 As shown in FIG. 8, the KF group information 551 includes "ID", "parent KF ID", "child KF ID", "loop KF ID", "attitude information", "position information", "feature amount", It has various information including "GNSS position" and "video frame number". In addition, it is not necessary to have information about the "GNSS position".

ここで、「ＩＤ」は、当該ＫＦの情報を識別する一意の識別情報であり、「親ＫＦＩＤ」および「子ＫＦＩＤ」は、ＫＦどうしをつなぐ情報であり、「ループＫＦＩＤ」は、ループクローズ処理などで使う情報である。 Here, "ID" is unique identification information that identifies the information of the KF, "parent KF ID" and "child KF ID" are information that connects KFs, and "loop KF ID" is This information is used for loop closing processing and the like.

また、「姿勢情報」・「位置情報」は、ＫＦの推定撮影位置・姿勢情報であり、「特徴量」は、任意の画像に対して似た画像か否かを判断するのに用いる画像全体としての特徴量であり、「ＧＮＳＳ位置」は、新たに入力ＧＮＳＳ情報に相当する該ＫＦの撮影時のＧＮＳＳ位置であり、「映像フレーム番号」は、対応する映像のフレーム番号である。 "Posture information" and "position information" are the estimated shooting posture and position of the KF. "Feature amount" is a feature amount of the image as a whole, used to judge whether an arbitrary image is similar to the KF image. "GNSS position" is the GNSS position at the time the KF was shot, corresponding to the newly input GNSS information, and "video frame number" is the frame number of the corresponding video.

図８に示すように、特徴点群情報（３次元位置情報）５５２ａは、「ＩＤ」、「位置座標」、「特徴量」および「観測ＫＦのＩＤ群」を含む各種情報を持っている。 As shown in FIG. 8, the feature point group information (three-dimensional position information) 552a has various information including "ID", "position coordinates", "feature amount" and "ID group of observed KF".

ここで、「ＩＤ」は、当該特徴点情報を識別する一意の識別情報であり、「位置座標」は、推定した特徴点の実座標位置座標であり、「特徴量」は、画像特徴であり、「観測ＫＦのＩＤ群」は、当該特徴点が映っているＫＦの情報であり、ＫＦ群情報５５１の中の該当するＫＦ情報の「ＩＤ」が関連付けされる。なお、実座標位置座標は、初期姿勢・座標系設定部で作成した任意の実座標変換を利用して実座標化するものとして、ローカル値で保持していてもよい。 Here, the “ID” is unique identification information for identifying the feature point information, the “positional coordinates” are the actual coordinates of the estimated feature point, and the “feature amount” is the image feature. , “ID group of observed KF” is information of the KF in which the feature point is shown, and “ID” of the corresponding KF information in the KF group information 551 is associated. Note that the actual coordinate position coordinates may be stored as local values assuming that they are converted into actual coordinates using arbitrary real coordinate transformation created by the initial attitude/coordinate system setting unit.

特徴点群情報（ＫＦ画像内位置情報）５５２ｂは、ＫＦ画像から抽出された画像特徴点群の情報であり、複数のＫＦ画像から同時閲覧されて選定されて３次元位置を持つ特徴点群と、３次元位置を持たない特徴点群の二種類が存在する。３次元位置を持たないＫＦ特徴点群は、任意の画像が該ＫＦ画像と似ているかを詳細評価するのに使ったり、将来新たなＫＦ画像が得られた時に新しく選定されて３次元位置を持つ特徴点群になるのに備えて、保持しておく。 The feature point group information (KF intra-image position information) 552b is information on the image feature point group extracted from the KF images. There are two types: feature points that have been selected because they are observed simultaneously from a plurality of KF images and therefore have a three-dimensional position, and feature points that have no three-dimensional position. KF feature points without a three-dimensional position are retained so that they can be used for detailed evaluation of whether an arbitrary image resembles the KF image, and so that they can be newly selected as feature points with a three-dimensional position when a new KF image is obtained in the future.

図８に示すように、特徴点群情報（ＫＦ画像内位置情報）５５２ｂは、「ＩＤ」、「ＫＦＩＤ」、「マップ点ＩＤ」、「特徴点位置」、「特徴点角度」および「縮小階層番号」を含む各種情報を持っている。 As shown in FIG. 8, the feature point group information (KF intra-image position information) 552b includes "ID", "KF ID", "map point ID", "feature point position", "feature point angle" and "reduction It has various information including "hierarchy number".

ここで、「ＩＤ」は、当該特徴点情報を識別する一意の識別情報である。「ＫＦＩＤ」は、当該ＫＦ特徴点を抽出したＫＦを特定するための情報であり、ＫＦ群情報５５１の中の該当するＫＦ情報の「ＩＤ」が関連付けされる。「マップ点ＩＤ」は、特徴点群情報（３次元位置情報）５５２ａへの参照情報であり、特徴点群情報（３次元位置情報）５５２ａの中の該当する特徴点情報の「ＩＤ」が関連付けされる。この「マップ点ＩＤ」は、複数のＫＦ画像から同時閲覧されて選定されて３次元位置を持つ特徴点群だけが持っており、３次元位置を持たない特徴点群は持っていない。 Here, "ID" is unique identification information for identifying the feature point information. The “KF ID” is information for specifying the KF that extracted the KF feature point, and is associated with the “ID” of the corresponding KF information in the KF group information 551 . "Map point ID" is reference information to the feature point group information (three-dimensional position information) 552a, and "ID" of the corresponding feature point information in the feature point group information (three-dimensional position information) 552a is associated be done. This "map point ID" is possessed only by feature point groups that are simultaneously browsed and selected from a plurality of KF images and have three-dimensional positions, and not possessed by feature point groups that do not have three-dimensional positions.

また、「特徴点位置」・「特徴点角度」は、たとえば、ＯＲＢ（ＯｒｉｅｎｔｅｄＦＡＳＴａｎｄＲｏｔａｔｅｄＢＲＩＥＦ）特徴の重心位置および方向ベクトルに関する情報である。また、「縮小階層番号」は、当該ＫＦ画像内での抽出状況に関する情報である。この「縮小階層番号」は、たとえば、画像特徴として縮小率を変えてピラミッド階層的に求めた縮小画像群を用いて算出したＯＲＢ特徴点を想定している場合に、縮小画像群のどれで抽出したのかに関する情報である。これら「特徴点位置」「特徴点角度」「縮小階層番号」などは、他の画像特徴を使う場合は、その特徴に合わせた情報にしてよい。 The "feature point position" and "feature point angle" are, for example, information on the centroid position and direction vector of an ORB (Oriented FAST and Rotated BRIEF) feature. The "reduced hierarchy number" is information on the extraction status within the KF image. For example, when the image features are assumed to be ORB feature points calculated using a group of reduced images obtained as a pyramid hierarchy with different reduction ratios, the "reduced hierarchy number" indicates from which of the reduced images the feature point was extracted. When other image features are used, the "feature point position", "feature point angle", "reduced hierarchy number", and so on may be replaced with information suited to those features.

このようにして、実座標環境マップ５５０が形成され、ＫＦ群情報と特徴点群情報とが関連付けされて、記憶される。なお、一般的に特徴点ベースのＶ－ＳＬＡＭの環境マップは、画像特徴点群の画像特徴と３次元位置、該特徴点群を閲覧している画像ＫＦの情報、また、画像ＫＦと似た画像を検索できるようにするための画像ＫＦ内の画像特徴群を含むが、実座標環境マップ５５０は、従来のＶ－ＳＬＡＭの環境マップと同じデータであってもよい。また、実座標環境マップ５５０は、ＫＦ群情報５５１に「ＧＮＳＳ位置」情報を保持するようにしてもよい。 Thus, the real coordinate environment map 550 is formed, and the KF group information and the feature point group information are associated and stored. In general, the feature point-based V-SLAM environment map includes image features and three-dimensional positions of the image feature point group, information on the image KF viewing the feature point group, and information similar to the image KF. The real coordinate environment map 550 may be the same data as the conventional V-SLAM environment map, although it contains the image features in the image KF to allow the image to be searched. Also, the real coordinate environment map 550 may hold “GNSS position” information in the KF group information 551 .
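The record layout of FIG. 8 described above can be sketched as plain data classes. This is only an illustrative sketch: the class names, field names, and Python types are assumptions chosen to mirror the listed items ("ID", "parent KF ID", "map point ID", and so on), and the helper `features_with_3d` is hypothetical, not part of the described system.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional, Tuple


@dataclass
class KeyFrame:
    """One entry of the KF group information 551 (field names are assumed)."""
    id: int
    video_frame_number: int
    position: Tuple[float, float, float]
    posture: Tuple[float, float, float]  # representation assumed (e.g. roll, pitch, yaw)
    feature: bytes = b""                 # whole-image feature for similarity search
    parent_kf_id: Optional[int] = None
    child_kf_ids: List[int] = field(default_factory=list)
    loop_kf_ids: List[int] = field(default_factory=list)
    gnss_position: Optional[Tuple[float, float, float]] = None  # may be absent


@dataclass
class MapPoint:
    """One entry of the feature point group information (3D position) 552a."""
    id: int
    position: Tuple[float, float, float]
    feature: bytes = b""
    observing_kf_ids: List[int] = field(default_factory=list)


@dataclass
class KFFeature:
    """One entry of the feature point group information (KF intra-image position) 552b."""
    id: int
    kf_id: int
    position: Tuple[float, float]        # 2D position in the KF image
    angle: float
    pyramid_level: int                   # "reduced hierarchy number"
    map_point_id: Optional[int] = None   # set only for points with a 3D position


@dataclass
class RealCoordinateEnvironmentMap:
    keyframes: Dict[int, KeyFrame] = field(default_factory=dict)
    map_points: Dict[int, MapPoint] = field(default_factory=dict)
    kf_features: Dict[int, KFFeature] = field(default_factory=dict)

    def features_with_3d(self, kf_id: int) -> List[KFFeature]:
        """Return the features of one KF that reference a map point."""
        return [f for f in self.kf_features.values()
                if f.kf_id == kf_id and f.map_point_id is not None]
```

Keeping "map point ID" optional reflects the two kinds of KF feature points described above: those with a three-dimensional position and those without.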

（全画像位置姿勢データの内容）
図９は、全画像位置姿勢データのデータ構成の一例を示す説明図である。全画像位置姿勢データ５６０は、主要な画像で構成するＫＦとは異なり、すべての映像中の画像に対し、推定した撮影位置と姿勢を保持する。ここで、既存技術の映像内の全画像の撮影位置・姿勢に相当するものが、全画像位置姿勢データ５６０である。 (Contents of all image position and orientation data)
FIG. 9 is an explanatory diagram showing an example of the data configuration of all image position/orientation data. The all-image position/orientation data 560 holds the estimated photographing positions and orientations for all the images in the video, unlike the KF composed of the main images. Here, the all-image position/orientation data 560 corresponds to the photographing positions/orientations of all the images in the video of the existing technology.

図９に示すように、全画像位置姿勢データ５６０は、「ＩＤ」、「親ＫＦＩＤ」、「姿勢情報」、「位置情報」、「映像フレーム番号」を含む各種情報を持っている。ここで、「ＩＤ」は、当該位置姿勢データを識別する一意の識別情報である。「親ＫＦＩＤ」は、映像的に近く位置・姿勢を参照するＫＦの情報である。「姿勢情報」・「位置情報」は、親ＫＦからの相対位置および姿勢であり、「映像フレーム番号」は、対応する映像のフレーム番号である。 As shown in FIG. 9, the all image position/posture data 560 has various information including "ID", "parent KF ID", "posture information", "position information", and "video frame number". Here, “ID” is unique identification information that identifies the position and orientation data. "Parent KF ID" is information of a KF that refers to a position/orientation that is visually nearby. "Posture information" and "position information" are the relative position and posture from the parent KF, and "video frame number" is the frame number of the corresponding video.

位置姿勢情報は、たとえば、映像的に近いＫＦに対する相対位置・姿勢として保持しておき、最終的にＶ－ＳＬＡＭ結果を出力する際に、ＫＦの位置・姿勢を反映させながら、実座標値にする。このようにすることで、逐次的にＶ－ＳＬＡＭを処理する際に、ＫＦの位置・姿勢が最適化処理の途中で変化することを気にせずに、全画像の位置・姿勢を最終的なＫＦの位置・姿勢に合わせて簡単に算出することができる。また、位置姿勢情報は、ＫＦと同様に、実座標値でもローカル値で保持するようにしてもよい。 The position and orientation information is held, for example, as a position and orientation relative to the KF that is closest in the video, and is converted into real coordinate values, reflecting the position and orientation of the KF, when the V-SLAM result is finally output. In this way, when V-SLAM is processed sequentially, the positions and orientations of all images can be easily calculated to match the final KF positions and orientations, without concern for the KF positions and orientations changing during the optimization processing. As with the KFs, the position and orientation information may be held either as real coordinate values or as local values.
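The conversion from a KF-relative pose to a real coordinate pose at output time can be sketched as a composition of rigid transforms. The text does not specify a pose convention, so this sketch assumes that each KF pose is a camera-to-world rotation matrix plus a position, and that the relative pose of a frame is expressed in the camera frame of its parent KF. The function names are hypothetical.

```python
def mat_mul(a, b):
    """Multiply two 3x3 matrices given as nested lists."""
    return [[sum(a[i][k] * b[k][j] for k in range(3)) for j in range(3)]
            for i in range(3)]


def mat_vec(a, v):
    """Multiply a 3x3 matrix by a 3-vector."""
    return [sum(a[i][k] * v[k] for k in range(3)) for i in range(3)]


def to_real_coordinates(kf_rot, kf_pos, rel_rot, rel_pos):
    """Compose the final KF pose with a frame pose stored relative to that KF.

    Assumed convention: kf_rot maps KF camera coordinates to real coordinates,
    and (rel_rot, rel_pos) is the frame pose expressed in KF camera coordinates.
    """
    rot = mat_mul(kf_rot, rel_rot)
    pos = [p + q for p, q in zip(mat_vec(kf_rot, rel_pos), kf_pos)]
    return rot, pos
```

Because only the relative pose is stored per frame, re-running this composition after the KF pose has been optimized is enough to refresh every dependent frame pose.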

なお、図８および図９からもわかるように、この例では、ＫＦの位置・姿勢に関する情報は、ＫＦの他の情報とともに全画像位置姿勢とは別に保持するものとしている。全画像位置姿勢は映像の全画像フレームの撮影位置・姿勢であり、実座標環境マップ５５０のＫＦ群情報５５１に含まれるＫＦの位置姿勢情報は、映像中の一部画像であるＫＦ画像の撮影位置・姿勢であるため、全画像位置姿勢データ５６０に含めるようにしてもよい。 As can be seen from FIGS. 8 and 9, in this example, information about the position and orientation of the KF is held separately from all image positions and orientations together with other information of the KF. The all-image position/orientation is the shooting position/orientation of all image frames of the video, and the KF position-orientation information included in the KF group information 551 of the real coordinate environment map 550 is the shooting position/orientation of the KF image, which is a partial image in the video. Since it is a position/orientation, it may be included in the total image position/orientation data 560 .

また、全画像位置姿勢データ５６０は、従来のＶ－ＳＬＡＭと同じデータであってもよい。 Also, the total image position and orientation data 560 may be the same data as in conventional V-SLAM.

また、図５の移動体位置推定システム５００は、図示を省略するが、図８、図９で示した実座標環境マップ５５０、全画像位置姿勢データ５６０の各種情報の他に、従来と同様に、実座標環境マップを用いたＶ－ＳＬＡＭ計算を高速化するための様々な情報を追加で保持するようにしてもよい。たとえば、画像ＫＦ群内で３次元位置を持つマップ特徴点群を共有しているＫＦどうし、さらにその中でも最も特徴点群の共有数の多いＫＦ群、などの関係を保持して、各ＫＦどうしで互いに参照できてもよい。 Although not shown, the mobile body position estimation system 500 of FIG. 5 may, as in the conventional art, additionally hold various information for speeding up V-SLAM calculation using the real coordinate environment map, besides the information in the real coordinate environment map 550 and the all-image position/orientation data 560 shown in FIGS. 8 and 9. For example, it may hold relationships such as which KFs in the image KF group share map feature points having three-dimensional positions, and which of those KFs share the largest number of feature points, so that the KFs can refer to one another.

より具体的には、たとえば、Ｃｏｖｉｓｉｖｉｌｉｔｙグラフであり、各ＫＦをノードとしてエッジにマップ特徴点を共有するＫＦ群、エッジの重みを共有するマップ特徴点数とする、グラフ構造のデータとして保持してよい。これらは、後述するローカルマッピング処理などで、ＫＦ位置・姿勢や環境マップの最適化計算対象を求めたり、ループクローズ処理などで現在の画像フレームに似た画像を探索したりするのを高速化するのに利用することができる。 More specifically, this may be, for example, a covisibility graph: graph-structured data in which each KF is a node, an edge connects KFs that share map feature points, and the edge weight is the number of shared map feature points. Such data can be used to speed up finding the targets of the optimization calculation for KF positions and orientations and the environment map in the local mapping processing described later, and searching for images similar to the current image frame in loop closing processing and the like.
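As a rough illustration of such a graph, the sketch below derives edge weights from the "ID group of observed KF" held by each map feature point. The input layout (a dictionary from map point ID to the KF IDs observing it) and both function names are assumptions for illustration only.

```python
from collections import Counter
from itertools import combinations


def build_covisibility(observations):
    """Count shared map points for every pair of KFs.

    observations: dict mapping a map point ID to the KF IDs that observe it
    (an assumed layout). Returns a Counter keyed by (kf_a, kf_b) with kf_a < kf_b.
    """
    weights = Counter()
    for kf_ids in observations.values():
        for a, b in combinations(sorted(set(kf_ids)), 2):
            weights[(a, b)] += 1
    return weights


def covisible_keyframes(weights, kf_id):
    """Return the neighbours of kf_id, ordered by descending shared-point count."""
    neighbours = {}
    for (a, b), w in weights.items():
        if a == kf_id:
            neighbours[b] = w
        elif b == kf_id:
            neighbours[a] = w
    return sorted(neighbours, key=lambda k: (-neighbours[k], k))
```

For example, with three map points observed by KFs `[10, 11]`, `[10, 11, 12]`, and `[11, 12]`, KF 10 shares two points with KF 11 and one with KF 12, so KF 11 is its strongest neighbour.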

（移動体位置推定システムの内容）
図１０は、実施の形態にかかる移動体位置推定システム、移動体位置推定方法の内容の一例を示す説明図である。 (Details of mobile object position estimation system)
FIG. 10 is an explanatory diagram showing an example of contents of the mobile body position estimation system and the mobile body position estimation method according to the embodiment.

図１０において、カメラなどによる映像１００１の入力データと、逆行判定部５１１と、逆行画像取得部５１２と、初期姿勢・座標系設定部５１３と、フレーム姿勢推定部５２１、キーフレーム（ＫＦ）更新部５２２、３Ｄマップ特徴点更新部５３１、グラフ制約生成部５３２、ＫＦ姿勢・特徴点マップ最適化部５３３、ループ検出・クロージング部５４１、の７つのＶ－ＳＬＡＭをベースとする処理部と、実座標環境マップ５５０（ＫＦ群情報５５１、特徴点群情報５５２）、全画像位置姿勢データ５６０、の２つの内部保持データ、さらに、初期環境マップ１０１０のデータを持っていてもよい。また、このうち内部保持データの少なくともどちらかを出力データ（実座標環境マップ５５０’、全画像位置姿勢データ５６０’）として出力することができる。また、映像１００１と同時に取得したＧＮＳＳ情報１００２の入力データを持っていてもよい。 In FIG. 10, the system may have input data of a video 1001 from a camera or the like; a retrograde determination unit 511; a retrograde image acquisition unit 512; seven V-SLAM-based processing units, namely an initial posture/coordinate system setting unit 513, a frame posture estimation unit 521, a key frame (KF) update unit 522, a 3D map feature point update unit 531, a graph constraint generation unit 532, a KF posture/feature point map optimization unit 533, and a loop detection/closing unit 541; two pieces of internally held data, namely the real coordinate environment map 550 (KF group information 551 and feature point group information 552) and the all-image position/orientation data 560; and, in addition, data of an initial environment map 1010. At least one of the two pieces of internally held data can be output as output data (real coordinate environment map 550', all-image position/orientation data 560'). The system may also have input data of GNSS information 1002 acquired at the same time as the video 1001.

なお、本実施の形態にかかる移動体位置推定システム５００は、従来のＶ－ＳＬＡＭ技術をベースとしているため、各処理部の処理の一部で、従来のＶ－ＳＬＡＭの処理と同じ処理をおこなうようにしてもよい。本実施の形態では、従来のＶ－ＳＬＡＭとして特徴点ベースのＶ－ＳＬＡＭ、特にＯＲＢ特徴を用いたＯＲＢ－ＳＬＡＭの基本的な処理例をあげ、従来のＶ－ＳＬＡＭ処理との差を示すようにして、以下に説明する。 Since the mobile position estimation system 500 according to the present embodiment is based on the conventional V-SLAM technology, part of the processing of each processing unit performs the same processing as the conventional V-SLAM processing. You may do so. In this embodiment, a basic processing example of feature point-based V-SLAM, particularly ORB-SLAM using ORB features, will be given as conventional V-SLAM, and the difference from conventional V-SLAM processing will be shown. are described below.

（入力される情報の内容）
移動体位置推定システム５００には、映像１００１、ＧＮＳＳ情報１００２、姿勢情報１００３の各情報が入力される。なお、ＧＮＳＳ情報１００２は入力されなくてもよい。映像１００１は、逆行判定部５１１に入力され、ＧＮＳＳ情報１００２は、初期姿勢・座標系設定部５１３に入力され、姿勢情報１００３は、グラフ制約生成部５３２に入力される。ただし、初期姿勢・座標系設定部５１３に入力されるＧＮＳＳ情報１００２、グラフ制約生成部５３２に入力される姿勢情報１００３については、必須の入力情報でなくてもよい。 (Contents of input information)
Information of image 1001 , GNSS information 1002 , and attitude information 1003 is input to mobile body position estimation system 500 . Note that the GNSS information 1002 may not be input. The image 1001 is input to the retrograde determination unit 511 , the GNSS information 1002 is input to the initial attitude/coordinate system setting unit 513 , and the attitude information 1003 is input to the graph constraint generation unit 532 . However, the GNSS information 1002 input to the initial attitude/coordinate system setting unit 513 and the attitude information 1003 input to the graph constraint generation unit 532 may not be essential input information.

映像１００１は、車両などの移動体５０３に搭載した車載機５０２が有する撮像装置７０５によって撮影された映像である。車載機５０２などの車両の通信手段を用いたり、記録メディアを介して人手を使ったり、任意の方法で入手し、本システム５００の入力とすることができる。また、映像の歪み補正などで用いるため、映像を撮影した撮像装置７０５の内部パラメータは既知とし、適宜歪み補正を実施するものとする。 The video 1001 is a video captured by the imaging device 705 of the vehicle-mounted device 502 mounted on the mobile body 503, such as a vehicle. It can be obtained by any method, such as through communication means of the vehicle (for example, the vehicle-mounted device 502) or manually via a recording medium, and used as an input to the system 500. The internal parameters of the imaging device 705 that captured the video are assumed to be known, since they are used for video distortion correction and the like, and distortion correction is performed as appropriate.

ＧＮＳＳ情報１００２は、映像撮影時の移動体５０３の位置であり、ＧＰＳなどの任意の既存の測位手段によるデータであり、映像と同等の任意の方法で入手して、本システム５００の入力とすることができる。 The GNSS information 1002 is the position of the moving object 503 at the time of video shooting, is data obtained by any existing positioning means such as GPS, is obtained by any method equivalent to video, and is input to the system 500. be able to.

なお、ＧＮＳＳ情報１００２は、映像によるＶ－ＳＬＡＭのスケールドリフトを補正するために新たに利用するものであり、できるだけ映像の全フレームで保持することが望ましいが、必ずしも全フレームで保持していなくてもよい。保持するフレームが多い程、本システムで出力する全画像位置姿勢、および、実座標環境マップの位置および姿勢精度を改善することができる。 Note that the GNSS information 1002 is newly used to correct the scale drift of V-SLAM due to video, and is preferably held in all frames of the video as much as possible, but is not necessarily held in all frames. good too. The more frames that are retained, the better the position and pose accuracy of the overall image pose and real coordinate environment map output by the system.

また、後述するように、本システムの初期化で利用する少なくとも映像解析開始地点付近の２つの画像フレームは、ＧＮＳＳ情報を保持している必要があり、なるべく密にＧＮＳＳ情報を保持しているほど、映像開始から早い段階で初期化処理が終了でき、撮影位置・姿勢推定処理を実施することができる。ただし、ＧＮＳＳ情報１００２は保持していなくてもよい。 In addition, as will be described later, at least two image frames near the video analysis start point used in the initialization of this system must hold GNSS information. , the initialization process can be completed at an early stage from the start of the video, and the shooting position/orientation estimation process can be performed. However, the GNSS information 1002 may not be held.

同様に、ＧＮＳＳ情報１００２は、なるべく正確な位置であることが望ましく、精度が高い程、本システムの出力結果の位置および姿勢精度を改善することができる。また、ＧＮＳＳ情報は、ＧＰＳ受信機などの位置になることが多いが、ＧＰＳ受信機とカメラの相対位置関係を用いて、できるだけカメラの位置情報に変換してあることが望ましい。 Similarly, the GNSS information 1002 should be as accurate as possible, and the higher the accuracy, the better the position and attitude accuracy of the output of the system. Also, GNSS information is often the position of a GPS receiver or the like, but it is desirable to convert it into camera position information as much as possible using the relative positional relationship between the GPS receiver and the camera.
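One simple way to convert a receiver position into a camera position is a planar lever-arm correction, sketched below. The text only states that the relative positional relationship between the GPS receiver and the camera is used; the vehicle heading input, the forward/right offset parameters, the east/north axis naming, and the function name are all assumptions of this sketch, and height is ignored.

```python
import math


def receiver_to_camera(east, north, heading_deg, forward_m, right_m):
    """Shift a GNSS receiver position to the camera position in a plane.

    Assumptions: heading_deg is the vehicle heading measured clockwise from
    north; forward_m and right_m give the camera offset from the receiver in
    the vehicle frame, in metres.
    """
    h = math.radians(heading_deg)
    cam_east = east + forward_m * math.sin(h) + right_m * math.cos(h)
    cam_north = north + forward_m * math.cos(h) - right_m * math.sin(h)
    return cam_east, cam_north
```

With a heading of 0 degrees (facing north), a camera mounted 2 m ahead of the receiver is moved 2 m north. With a heading of 90 degrees (facing east), the same offset moves it 2 m east.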

また、姿勢情報１００３は、任意のＩＭＵ（ｉｎｅｒｔｉａｌｍｅａｓｕｒｅｍｅｎｔｕｎｉｔ）などから取得する、映像を撮影した時のカメラ姿勢情報である。ＩＭＵは、具体的には、加速度センサ、ジャイロセンサなどである。たとえば、カメラを中心とし、自車前方、右方、鉛直上方、などの座標軸に対する回転角、ロール、ピッチ、ヨー角などである。ＧＮＳＳ情報と同様に、映像の画像すべてに対して保持してもよく、任意画像にだけ保持していてもよい。 Also, the posture information 1003 is camera posture information obtained from an arbitrary IMU (inertial measurement unit) or the like when the image was shot. The IMU is specifically an acceleration sensor, a gyro sensor, or the like. For example, it is the rotation angle, roll, pitch, yaw angle, etc. with respect to the coordinate axes such as the front, right, and vertically above the vehicle centered on the camera. As with the GNSS information, it may be held for all images of video, or may be held only for arbitrary images.

なお、ＧＮＳＳ情報１００２および姿勢情報１００３は、上述したように別途センサ群から入手するのではなく、一度、Ｖ－ＳＬＡＭで推定した各カメラ撮影位置・姿勢を、手作業などの任意の手法で補正し、補正した各カメラ撮影位置・姿勢を、再度、同じ映像のＧＮＳＳ情報１００２および姿勢情報１００３として読み込ませるようにしてもよい。 Note that the GNSS information 1002 and the orientation information 1003 are not separately obtained from the sensor group as described above, but are corrected by any method, such as manual work, after each camera shooting position and orientation estimated by V-SLAM. Then, the corrected photographing position/orientation of each camera may be read again as the GNSS information 1002 and the orientation information 1003 of the same image.

また、後述するローカルマッピング機能により、入力されたＧＮＳＳ情報１００２による位置と、映像１００１を解析した結果による位置の双方を適切にマージ反映させた推定ができるようにしてもよい。これにより、手修正した出力結果を入力とする再実施を通して、手修正結果に合わせて特徴点群を含めた実座標環境マップを滑らかに作成することが可能となる。ただし、このような、ＧＮＳＳ情報１００２による位置と、映像１００１を解析した結果による位置の双方を適切にマージ反映させた推定については、用いなくてもよい。 In addition, a local mapping function, which will be described later, may be used to appropriately merge and reflect both the position based on the input GNSS information 1002 and the position based on the analysis result of the image 1001 for estimation. As a result, it is possible to smoothly create a real coordinate environment map including a group of feature points in accordance with the result of manual correction through re-implementation using the output result of manual correction as input. However, such an estimation in which both the position based on the GNSS information 1002 and the position based on the analysis result of the image 1001 are properly merged and reflected may not be used.

なお、最初の実施時は姿勢情報を入力しなくても位置情報とともに姿勢情報も推定出力するので、再実行では、該推定姿勢情報も位置情報とともに入力して使えるが、姿勢情報は使わずに位置情報だけを入力として使ってもよい。たとえば、手修正した位置情報と異なり、まったく姿勢情報を手修正できなかった場合などでは、２つの情報は確からしさが異なっているため、精度の低い推定姿勢情報は使わずに位置情報だけを再実施時に入力利用して、実座標環境マップを作成することができる。 In the first execution, posture information is estimated and output together with position information even if no posture information is input. In a re-execution, the estimated posture information can therefore be input and used together with the position information, but it is also possible to use only the position information as input without the posture information. For example, when the position information has been manually corrected but the posture information could not be manually corrected at all, the two pieces of information differ in reliability, so the real coordinate environment map can be created by inputting only the position information at re-execution, without using the less accurate estimated posture information.

また、一度出力した実座標環境マップ５５０’を再び入力として利用してもよい。たとえば、ある走路の最初の走行映像の撮影位置・姿勢を推定する際には、実座標環境マップが存在しないので、当該実座標環境マップの入力無しで本システムにおける処理を実行し、つぎに同じ走路を走行した二回目以降の走行映像の撮影位置・姿勢を推定する場合には、最初の走行映像の処理結果として出力した実座標環境マップ５５０’を入力して、あたかも当該映像の処理で作成した内部データかのように利用することができる。このとき、一回目と二回目以降の走行映像では、撮影する車両やカメラ、走行レーン内の位置などが異なっていてもよい。なお、実座標環境マップ５５０’を入力する場合には、ＧＮＳＳ情報１００２の入力を省略してもよい。 Also, the real coordinate environment map 550' that has been output once may be used again as an input. For example, when estimating the photographing position/orientation of the first running video on a certain track, there is no real coordinate environment map, so the processing in this system is executed without inputting the real coordinate environment map, and then the same In the case of estimating the photographing position/orientation of the running video for the second time or later after running on the track, the real coordinate environment map 550′ output as the processing result of the first running video is input, and created as if by processing the relevant video. can be used as if it were internal data. At this time, the vehicle, the camera, the position in the driving lane, and the like may be different between the first and second and subsequent driving images. Note that input of the GNSS information 1002 may be omitted when the real coordinate environment map 550' is input.

なお、車載機５０２が、各機能部５１１～５１３、５２１、５２２、５３１、５３２、５３３、５４１の少なくとも一つを有する場合は、映像１００１やＧＮＳＳ情報１００２は、車載機５０２の内部に保持してＶ－ＳＬＡＭを処理するようにしてもよい。 Note that if the onboard device 502 has at least one of the functional units 511 to 513, 521, 522, 531, 532, 533, and 541, the video 1001 and the GNSS information 1002 are held inside the onboard device 502. may be used to process V-SLAM.

以後の本システムの説明では、特に記載がない場合には、実座標環境マップ入力がなく、一から実座標環境マップを作成する場合（ＧＮＳＳ情報１００２の入力を必須とする場合）について説明をおこなうが、必ずしも、ＧＮＳＳ情報１００２の入力を必須としなくてもよい。また、ＧＮＳＳ情報１００２として、平面直角座標系の値を例として説明をおこなう。 In the following explanation of this system, unless otherwise specified, the case where there is no real coordinate environment map input and the real coordinate environment map is created from scratch (when the input of GNSS information 1002 is required) will be explained. However, the input of the GNSS information 1002 may not necessarily be required. Also, as the GNSS information 1002, the values of the planar rectangular coordinate system will be described as an example.

（逆行判定部５１１の内容）
システムの初期化処理機能５１０を担当する逆行判定部５１１は、逆行画像取得部５１２の動作を決定する処理部であり、Ｖ－ＳＬＡＭの処理対象の映像の状態や、逆行なしで試行したＶ－ＳＬＡＭの結果などから、入力した映像を時間反転（時間逆行）するかの判定をおこなう。 (Contents of retrograde determination unit 511)
The retrograde determination unit 511, which is in charge of the system initialization processing function 510, is a processing unit that determines the operation of the retrograde image acquisition unit 512. It determines whether to time-reverse the input video (play it backward in time) based on, for example, the state of the video to be processed by V-SLAM and the result of a V-SLAM run attempted without reversal.

逆行判定部５１１は、具体的には、たとえば、図６に示した、メモリ６０２に記憶されたプログラムをＣＰＵ６０１が実行することによって、その機能を実現することができる。また、具体的には、たとえば、図７に示した、メモリ７０２に記憶されたプログラムをＣＰＵ７０１が実行することによって、その機能を実現するようにしてもよい。 Specifically, the function of the retrograde determination unit 511 can be realized by the CPU 601 executing a program stored in the memory 602 shown in FIG. 6, for example. More specifically, for example, the function may be realized by causing the CPU 701 to execute a program stored in the memory 702 shown in FIG.

（逆行画像取得部５１２の内容）
システムの初期化処理機能５１０を担当する逆行画像取得部５１２は、逆行判定部５１１において、「時間逆行させる」と判断した場合に、以後のＶ－ＳＬＡＭ処理部で映像を時間経過とは逆方向に処理するよう、映像を逆行するように取得する。たとえば、入力した映像ファイルに対し、撮影時刻が遡るように各画像を抽出して、後続のＶ－ＳＬＡＭ処理部に渡す。 (Contents of retrograde image acquisition unit 512)
When the retrograde determination unit 511 determines that the video is to be time-reversed, the retrograde image acquisition unit 512, which is in charge of the system initialization processing function 510, acquires the video in reverse so that the subsequent V-SLAM processing units process it in the direction opposite to the passage of time. For example, it extracts the images from the input video file in order of receding shooting time and passes them to the subsequent V-SLAM processing units.
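The frame ordering described here can be sketched with a small generator. This assumes the video has already been decoded into an indexable sequence of frames; the function name and the `reverse` flag (standing in for the decision of the retrograde determination unit 511) are hypothetical.

```python
def acquire_frames(frames, reverse):
    """Yield (frame_number, frame) pairs in forward or reversed shooting order.

    frames: an indexable sequence of decoded images (assumed input form).
    reverse: True when the video should be processed backward in time.
    """
    if reverse:
        indices = range(len(frames) - 1, -1, -1)
    else:
        indices = range(len(frames))
    for i in indices:
        yield i, frames[i]
```

Yielding the original frame number alongside each image keeps the "video frame number" stored with each KF meaningful even when the processing order is reversed.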

逆行画像取得部５１２は、具体的には、たとえば、図６に示した、メモリ６０２に記憶されたプログラムをＣＰＵ６０１が実行することによって、その機能を実現することができる。また、具体的には、たとえば、図７に示した、メモリ７０２に記憶されたプログラムをＣＰＵ７０１が実行することによって、その機能を実現するようにしてもよい。 Specifically, the retrograde image acquisition unit 512 can realize its function by the CPU 601 executing a program stored in the memory 602 shown in FIG. 6, for example. More specifically, for example, the function may be realized by causing the CPU 701 to execute a program stored in the memory 702 shown in FIG.

(Contents of initial posture/coordinate system setting unit 513)
The initial posture/coordinate system setting unit 513, which is part of the system initialization processing function 510, determines the coordinate system used for calculation and, as initialization processing, creates the internal data required by subsequent processing functions such as tracking. Specifically, it estimates the three-dimensional positions of the feature point group near the location where the video starts, estimates the initial KF positions and orientations, and creates the real coordinate environment map of the vicinity of that location, which is the minimum required for subsequent processing. Except for the determination of the coordinate system used for calculation, the processing of the initial posture/coordinate system setting unit may be the same as the initial processing of conventional V-SLAM. As in conventional V-SLAM, subsequent processing, including frame orientation estimation, is not executed until this initialization processing is complete.

初期姿勢・座標系設定部５１３は、具体的には、たとえば、図６に示した、メモリ６０２に記憶されたプログラムをＣＰＵ６０１が実行することによって、その機能を実現することができる。また、具体的には、たとえば、図７に示した、メモリ７０２に記憶されたプログラムをＣＰＵ７０１が実行することによって、その機能を実現するようにしてもよい。 Specifically, initial attitude/coordinate system setting unit 513 can realize its function by CPU 601 executing a program stored in memory 602 shown in FIG. 6, for example. More specifically, for example, the function may be realized by causing the CPU 701 to execute a program stored in the memory 702 shown in FIG.

初期姿勢・座標系設定部５１３は、まず、歪み補正した映像の各画像に対し、任意の画像特徴群を取得する。つぎに、最初の２フレームで同時に映っている特徴点（各画像の特徴点のペア）を各画像特徴群から探索する。ペアの探索方法は、利用する画像特徴に依存し、既存の２画像の同特徴対の探索方法を利用してもよい。算出したペア数が十分多い場合には、特徴点群の画像変化を用いて、カメラの位置・姿勢変化と２画像に共通で映る各特徴点群の３次元位置を推定する。 The initial posture/coordinate system setting unit 513 first acquires an arbitrary image feature group for each image of the distortion-corrected video. Next, each image feature group is searched for feature points (pairs of feature points in each image) that appear simultaneously in the first two frames. The method of searching for a pair depends on the image feature to be used, and an existing method of searching for the same feature pair of two images may be used. When the calculated number of pairs is sufficiently large, the change in the image of the feature point group is used to estimate the position/posture change of the camera and the three-dimensional position of each feature point group that appears in both images.

That is, from the changes in position and orientation of each feature point pair appearing in the two images, an existing method based on a geometric model, for example a homography, which assumes a planar scene, or a fundamental matrix, which assumes a non-planar scene, is used to estimate the transformation representing the change in camera position and orientation between the two images. Then, from the estimated camera positions and orientations of the two images and the position of each feature pair in each image, the three-dimensional position of each feature is estimated using an existing method such as triangulation.
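As an illustration of the triangulation step, the following minimal sketch uses the midpoint method, which is one existing triangulation technique and is not necessarily the one intended by the specification. It assumes the two camera centers and the viewing ray directions toward the feature are already expressed in a common coordinate system; all function names are hypothetical.

```python
import math
from typing import Tuple

Vec3 = Tuple[float, float, float]


def sub(a: Vec3, b: Vec3) -> Vec3:
    return (a[0] - b[0], a[1] - b[1], a[2] - b[2])


def dot(a: Vec3, b: Vec3) -> float:
    return a[0] * b[0] + a[1] * b[1] + a[2] * b[2]


def unit(v: Vec3) -> Vec3:
    n = math.sqrt(dot(v, v))
    return (v[0] / n, v[1] / n, v[2] / n)


def triangulate_midpoint(c1: Vec3, d1: Vec3, c2: Vec3, d2: Vec3) -> Vec3:
    """Return the midpoint of the shortest segment between two viewing rays.

    c1, c2 are camera centers; d1, d2 are ray directions toward the feature.
    """
    w = sub(c1, c2)
    a, b, c = dot(d1, d1), dot(d1, d2), dot(d2, d2)
    d, e = dot(d1, w), dot(d2, w)
    denom = a * c - b * b
    if abs(denom) < 1e-12:
        raise ValueError("rays are (nearly) parallel; cannot triangulate")
    s = (b * e - c * d) / denom  # parameter of the closest point on ray 1
    t = (a * e - b * d) / denom  # parameter of the closest point on ray 2
    p1 = (c1[0] + s * d1[0], c1[1] + s * d1[1], c1[2] + s * d1[2])
    p2 = (c2[0] + t * d2[0], c2[1] + t * d2[1], c2[2] + t * d2[2])
    return ((p1[0] + p2[0]) / 2, (p1[1] + p2[1]) / 2, (p1[2] + p2[2]) / 2)


# Synthetic check: both rays point exactly at the same 3D point.
true_point = (1.0, 0.5, 10.0)
cam1, cam2 = (0.0, 0.0, 0.0), (1.0, 0.0, 0.0)
estimate = triangulate_midpoint(
    cam1, unit(sub(true_point, cam1)), cam2, unit(sub(true_point, cam2))
)
```

With noisy observations the two rays no longer intersect, and the midpoint serves as a compromise estimate that a later optimization can refine.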

If the number of pairs is insufficient, this processing is performed again after replacing one of the two images (for example, the later image) with another image (for example, an image from a still later time). Also, the first two frames used need not be exactly the images at the start of the video; they may be any two frames that are likely to show the same subject. For example, if the video is known to have been shot while the vehicle was stopped, an image in which the camera shooting position appears to have changed may be selected as the later image.

At this time, rather than calculating the three-dimensional positions of all feature point pairs, arbitrary feature point selection may be performed: for example, omitting feature points whose error is large compared with other feature points, thinning out feature points in image regions where they are concentrated so that a specified number of feature points is obtained evenly over the entire image, or omitting feature points for which the angle formed by the two camera positions and the feature point (the intersection angle) is small.
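The intersection-angle criterion can be sketched as below. This is a minimal example under stated assumptions: the 1-degree threshold and the names `intersection_angle_deg` and `select_points` are hypothetical, since the specification gives no concrete value.

```python
import math
from typing import List, Tuple

Vec3 = Tuple[float, float, float]


def intersection_angle_deg(cam1: Vec3, cam2: Vec3, point: Vec3) -> float:
    """Angle at `point` between the directions to the two camera positions."""
    v1 = tuple(c - p for c, p in zip(cam1, point))
    v2 = tuple(c - p for c, p in zip(cam2, point))
    n1 = math.sqrt(sum(x * x for x in v1))
    n2 = math.sqrt(sum(x * x for x in v2))
    cos_angle = sum(x * y for x, y in zip(v1, v2)) / (n1 * n2)
    # Clamp against rounding before acos.
    return math.degrees(math.acos(max(-1.0, min(1.0, cos_angle))))


def select_points(cam1: Vec3, cam2: Vec3, points: List[Vec3],
                  min_angle_deg: float = 1.0) -> List[Vec3]:
    """Keep only points whose intersection angle reaches the threshold."""
    return [p for p in points
            if intersection_angle_deg(cam1, cam2, p) >= min_angle_deg]


cam1, cam2 = (0.0, 0.0, 0.0), (1.0, 0.0, 0.0)
near, far = (0.5, 0.0, 1.0), (0.5, 0.0, 1000.0)
kept = select_points(cam1, cam2, [near, far])
```

A distant point seen from two nearby camera positions yields almost parallel rays, so its depth is poorly constrained; that is why such points are candidates for omission.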

In addition, as in conventional V-SLAM, the initial posture/coordinate system setting unit 513 may perform a further optimization calculation to update the calculated initial values to more accurate values. Specifically, for each of the two images, the camera position and the three-dimensional positions of the feature point group are known, so where the feature point group appears in each image can be calculated. The difference between the projected position of each feature point in an image and the position of that feature point in the actual camera image (called the reprojection error) is therefore examined, and an optimization correction (BA (Bundle Adjustment)) may be performed that finely adjusts the feature points and the camera positions and orientations so that the reprojection error of the feature point group becomes as small as possible.
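The reprojection error that BA minimizes can be illustrated with a simple pinhole camera. This is a sketch only: the camera model (rotation `rotation`, translation `translation`, focal length `focal`, principal point `cx`, `cy`) is an assumption for illustration, and the optimizer itself is omitted.

```python
import math
from typing import Sequence, Tuple


def project(point: Sequence[float], rotation: Sequence[Sequence[float]],
            translation: Sequence[float], focal: float,
            cx: float, cy: float) -> Tuple[float, float]:
    """Project a 3D point with a pinhole model: x_cam = R * X + t."""
    xc = [sum(rotation[i][k] * point[k] for k in range(3)) + translation[i]
          for i in range(3)]
    if xc[2] <= 0.0:
        raise ValueError("point is behind the camera")
    return (cx + focal * xc[0] / xc[2], cy + focal * xc[1] / xc[2])


def reprojection_error(point, observed, rotation, translation,
                       focal, cx, cy) -> float:
    """Pixel distance between the projected and the observed position."""
    u, v = project(point, rotation, translation, focal, cx, cy)
    return math.hypot(u - observed[0], v - observed[1])


identity = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
err = reprojection_error((1.0, 2.0, 10.0), (63.0, 74.0), identity,
                         (0.0, 0.0, 0.0), focal=100.0, cx=50.0, cy=50.0)
```

BA would adjust the 3D points and the camera parameters so that the sum of such errors (usually squared) over all observations becomes small.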

Subsequently, the initial posture/coordinate system setting unit 513 creates the initial environment map 1010 from the calculation results. That is, the two images used are registered in the initial environment map 1010 as KFs together with their estimated shooting positions and orientations, and the information on the feature point group estimated at the same time (positions in the two images and three-dimensional positions) is also registered in the initial environment map 1010. Because it is produced by initialization processing, this initial environment map 1010 estimates the KF positions and orientations and the three-dimensional positions of the feature point group by a method somewhat different from that used by subsequent functional units such as tracking and local mapping, and its accuracy is slightly lower.

Note that, as in conventional V-SLAM, the initial values of the camera positions and orientations of these two images and of the three-dimensional positions of the feature point group around the vehicle may be calculated in a local system whose origin and reference coordinate system are given by the camera position and orientation (hereinafter, the "initial camera position/orientation") of one of the two images (usually the earlier one; hereinafter, the "initial camera image").

たとえば、一般的に画像処理で用いる画像上の画素位置を示すための画素座標系は、撮影画像の画像横方向をＸ、画像下方向をＹとすることが多い。このため、従来のＶ－ＳＬＡＭもこれと似た基準座標系定義とするために、初期フレームのカメラ位置を原点（０，０，０）とし、自車右手方向Ｘ、自車鉛直下方向Ｙ、自車前方方向Ｚ、という右手系（ＳＬＡＭローカル系）定義とすることが多い。本システムにおいても、このＳＬＡＭローカル系で２画像のカメラ位置・姿勢と、２画像に共通で映る特徴点群の３次元位置を算出する。 For example, in a pixel coordinate system for indicating pixel positions on an image generally used in image processing, it is often the case that X is the horizontal direction of a photographed image and Y is the downward direction of the image. For this reason, in order to define a reference coordinate system similar to this in the conventional V-SLAM, the camera position in the initial frame is defined as the origin (0, 0, 0), the vehicle's right-hand direction X, and the vehicle's vertical downward direction Y , and Z in the forward direction of the vehicle are often defined as right-handed system (SLAM local system) definitions. Also in this system, the SLAM local system calculates the camera position/orientation of the two images and the three-dimensional position of the group of feature points that appear in common in the two images.

このように、初期姿勢・座標系設定部５１３は、従来のＶ－ＳＬＡＭと同様に、初期環境マップ１０１０の作成（ＫＦ位置・姿勢の推定、および特徴点群の３次元位置の推定）処理をおこなう。 In this way, the initial orientation/coordinate system setting unit 513 performs the process of creating the initial environment map 1010 (estimating the KF position/orientation and estimating the three-dimensional position of the feature point group) in the same manner as in the conventional V-SLAM. Do.

つぎに、初期姿勢・座標系設定部５１３は、ＳＬＡＭローカル座標系で算出した環境マップのＫＦ撮影位置・姿勢、および特徴点群の３次元位置を実座標系対応にするために、入力ＧＮＳＳ情報１００２から２画像に対応するＧＮＳＳ位置座標値を得て、ＳＬＡＭローカル（座標）系と実座標系の変換行列を算出する。なお、ＳＬＡＭローカル（座標）系と実座標系の変換行列を算出する際、入力ＧＮＳＳ情報１００２から２画像に対応するＧＮＳＳ位置座標値を得なくてもよい。 Next, the initial orientation/coordinate system setting unit 513 sets the input GNSS information so that the KF imaging position/orientation of the environment map calculated in the SLAM local coordinate system and the three-dimensional position of the feature point group correspond to the real coordinate system. GNSS position coordinate values corresponding to the two images are obtained from 1002, and a conversion matrix between the SLAM local (coordinate) system and the real coordinate system is calculated. Note that when calculating the conversion matrix between the SLAM local (coordinate) system and the real coordinate system, it is not necessary to obtain the GNSS position coordinate values corresponding to the two images from the input GNSS information 1002 .

図１１Ａ～１１Ｃは、初期姿勢・座標系設定部５１３における変換行列算出の一例を示す説明図である。図１１Ａに示すように、本システム５００では、実座標系として平面直角座標系を用いる。具体的には、符号１１０１は、（ａ）ＳＬＡＭローカル系（右手系）を示している。具体的には、原点（初期カメラ）に対して、Ｘ方向が右手方向を示しており、Ｙ方向が下方向を示しており、Ｚ方向が進行方向を示している。 11A to 11C are explanatory diagrams showing an example of conversion matrix calculation in the initial attitude/coordinate system setting unit 513. FIG. As shown in FIG. 11A, the present system 500 uses a planar rectangular coordinate system as the real coordinate system. Specifically, reference numeral 1101 indicates (a) SLAM local system (right-handed system). Specifically, with respect to the origin (initial camera), the X direction indicates the right direction, the Y direction indicates the downward direction, and the Z direction indicates the traveling direction.

これに対して、符号１１０２は、（ｂ）実座標系、すなわち、平面直角座標系（左手系）を示している。具体的には、平面直角座標系原点（０，０，０）に対して、Ｘ方向が「北」、すなわち、平面直角座標系Ｘ値［ｍ］を示しており、Ｙ方向が「東」、すなわち、平面直角座標系Ｙ値［ｍ］を示しており、Ｚ方向が「上」、すなわち、標高値［ｍ］を示している。 On the other hand, reference numeral 1102 indicates (b) a real coordinate system, that is, a planar rectangular coordinate system (left-handed system). Specifically, the X direction is "north" with respect to the planar rectangular coordinate system origin (0, 0, 0), that is, the planar rectangular coordinate system X value [m], and the Y direction is "east". , that is, the plane rectangular coordinate system Y value [m] is indicated, and the Z direction is "up", that is, the altitude value [m] is indicated.

ただし、これは一例であって、従来のＶ－ＳＬＡＭの右手系のＳＬＡＭローカル系とは異なる左手系の平面直角座標系を用いるのではなく、右手系の任意の座標系を用いるようにしてもよい。 However, this is only an example, and instead of using a left-handed planar rectangular coordinate system different from the right-handed SLAM local system of the conventional V-SLAM, any right-handed coordinate system may be used. good.

FIG. 11B shows movement vectors. Reference numeral 1103 denotes the movement vector A in the local system, and reference numeral 1104 denotes the movement vector B in the real coordinate system. A movement vector is a traveling direction vector given by the difference (F2-F1) between the position of the later frame (F2) and the position of the earlier frame (F1) of image frames at two arbitrary times. As shown in FIG. 11B, the same movement vector is expressed in two coordinate systems (movement vector A 1103 and movement vector B 1104). The initial posture/coordinate system setting unit 513 therefore calculates a transformation matrix that converts values from one system to the other (the transformation matrix M from the local system to the real coordinate system).

FIG. 11C shows the contents of the transformation matrix M for converting values from the SLAM local system to the real coordinate system. In FIG. 11C, the transformation matrix M is the product of five matrices: a scale conversion matrix M1 that absorbs the scale difference between the coordinate systems; a rotation matrix M2 that converts values from the system derived from the traveling direction to the system derived from latitude and longitude; M3, which converts values to a different definition of the XYZ coordinate axes; M4, which converts values from a right-handed system to a left-handed system; and M5, which converts values so that the origin moves from the initial camera position to the origin of the planar rectangular coordinate system.

スケール変換行列Ｍ１は、画像変化による任意スケールを実座標のスケールとする変換行列である。スケール変換行列Ｍ１によって、（１）ＳＬＡＭローカル系１１１１を、（２）ｍスケールのＳＬＡＭローカル系１１１２に変換することができる。 The scale conversion matrix M1 is a conversion matrix that uses an arbitrary scale due to image change as the scale of the real coordinates. (1) SLAM local system 1111 can be transformed into (2) m-scale SLAM local system 1112 by scale transformation matrix M1.

図１１Ｄは、画像変化由来の任意スケールを、緯度経度の座標系のスケール［ｍ］にするスケール変換行列Ｍ１の算出の一例を示す説明図である。 FIG. 11D is an explanatory diagram illustrating an example of calculation of a scale conversion matrix M1 that converts an arbitrary scale derived from an image change to a scale [m] of a coordinate system of latitude and longitude.

In FIG. 11D, first, the movement vector A 1103 is calculated as the difference (Q2-Q1) between the camera positions of the two images in the SLAM local system, Q1 (corresponding to the earlier image F1) and Q2 (corresponding to the later image F2). The component definition may be that of the SLAM local system itself (the conventional V-SLAM output values).

Next, from the GNSS positions of the two images, S1 (corresponding to the earlier image F1) and S2 (corresponding to the later image F2), the movement vector B 1104 (=S2-S1) is calculated in a special real coordinate system, which uses the values of the real coordinate system (planar rectangular coordinate system) but with a different component (axis) definition. The special real coordinate system is defined as (X component = longitude coordinate value difference, positive toward east; Y component = -(elevation value difference); Z component = latitude coordinate value difference, positive toward north).

Then, the magnitude lenA of the movement vector A 1103 and the magnitude lenB of the movement vector B 1104 are obtained. From these, the magnitude ratio Rate=(lenB÷lenA) is obtained, and the scale conversion matrix that multiplies by Rate is taken as the scale conversion matrix M1. When Eye(m, n) denotes the identity matrix with m rows and n columns, the scale conversion matrix M1 is
M1=Rate×Eye(3, 3);
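The calculation of Rate and M1 can be sketched as follows. The function name `scale_matrix` and the sample coordinates are hypothetical; Q1, Q2 are the camera positions in the SLAM local system and S1, S2 the corresponding GNSS positions expressed in the special real coordinate system (east, down, north), as defined in the text.

```python
import math
from typing import List, Sequence, Tuple


def norm(v: Sequence[float]) -> float:
    return math.sqrt(sum(x * x for x in v))


def scale_matrix(q1, q2, s1, s2) -> Tuple[float, List[List[float]]]:
    """Return (Rate, M1) with M1 = Rate * Eye(3, 3)."""
    a = [x2 - x1 for x1, x2 in zip(q1, q2)]  # movement vector A (local system)
    b = [x2 - x1 for x1, x2 in zip(s1, s2)]  # movement vector B (real scale)
    len_a, len_b = norm(a), norm(b)
    if len_a == 0.0:
        raise ValueError("camera did not move between the two images")
    rate = len_b / len_a
    m1 = [[rate if i == j else 0.0 for j in range(3)] for i in range(3)]
    return rate, m1


# Hypothetical values: 0.5 local units forward correspond to 5 m on the ground.
rate, m1 = scale_matrix((0.0, 0.0, 0.0), (0.0, 0.0, 0.5),
                        (10.0, -2.0, 100.0), (13.0, -2.0, 104.0))
```

Because Rate is a ratio of lengths, it is unaffected by the axis definitions of the two systems, which is why the scale can be fixed before the rotation is determined.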

図１１Ｃに戻って、回転行列Ｍ２は、進行方向由来の座標系を緯度経度由来に変更する変換行列である。回転行列Ｍ２によって、（２）ｍスケールのＳＬＡＭローカル系１１１２を、（３）特殊実座標系１１１３に変換することができる。 Returning to FIG. 11C, the rotation matrix M2 is a conversion matrix that changes the coordinate system derived from the traveling direction to that derived from latitude and longitude. (2) m-scale SLAM local system 1112 can be transformed into (3) special real coordinate system 1113 by rotation matrix M2.

図１１Ｅは、回転変換行列Ｍ２の算出の一例を示す説明図である。図１１Ｅにおいて、まず、移動ベクトルをそれぞれの長さで割り、正規化したローカル系の移動ベクトルＡ’＝Ａ／ｌｅｎＡと、正規化した実座標系の移動ベクトルＢ’＝Ｂ／ｌｅｎＢを求める。 FIG. 11E is an explanatory diagram showing an example of calculation of the rotation transformation matrix M2. In FIG. 11E, first, the motion vectors are divided by their respective lengths to obtain a normalized local system motion vector A'=A/lenA and a normalized real coordinate system motion vector B'=B/lenB.

Next, as indicated by reference numeral 1105, (a) the angle Θ from vector A' to vector B' is obtained from the inner product.
Θ=acos(inner product(A', B'))

Then, as indicated by reference numeral 1106, (b) an upward vector (VectorUP), given by the cross product of vector A' and vector B' (=A'×B'), is obtained, and an angle Θ' that takes the direction of rotation into account is calculated. If the Y value of the upward vector is positive, the angle Θ'=-Θ; if it is negative, the angle Θ'=Θ.

軸定義の変換と、座標値の変換は逆になるため、Ｙ軸周りの（－Θ’）回転行列を、行列Ｍ２とする。 Since the conversion of the axis definition and the conversion of the coordinate values are reversed, the (-Θ') rotation matrix around the Y-axis is the matrix M2.
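The steps for M2 can be sketched as follows. This is a minimal illustration, assuming that both movement vectors lie approximately in the horizontal (X-Z) plane and that rotation about the Y axis uses the standard right-handed rotation matrix; the function names are hypothetical.

```python
import math
from typing import List, Sequence


def normalize(v: Sequence[float]) -> List[float]:
    n = math.sqrt(sum(x * x for x in v))
    return [x / n for x in v]


def rot_y(phi: float) -> List[List[float]]:
    """Standard right-handed rotation by phi (radians) around the Y axis."""
    c, s = math.cos(phi), math.sin(phi)
    return [[c, 0.0, s], [0.0, 1.0, 0.0], [-s, 0.0, c]]


def rotation_m2(a: Sequence[float], b: Sequence[float]) -> List[List[float]]:
    """Build M2 from movement vectors A (local system) and B (real system)."""
    a_n, b_n = normalize(a), normalize(b)
    cos_theta = max(-1.0, min(1.0, sum(x * y for x, y in zip(a_n, b_n))))
    theta = math.acos(cos_theta)               # (a) angle from the inner product
    up_y = a_n[2] * b_n[0] - a_n[0] * b_n[2]   # (b) Y component of A' x B'
    theta_prime = -theta if up_y > 0.0 else theta
    return rot_y(-theta_prime)                 # M2 is the (-theta') rotation


def apply3(m: List[List[float]], v: Sequence[float]) -> List[float]:
    return [sum(m[i][k] * v[k] for k in range(3)) for i in range(3)]


# The vehicle moves forward (+Z locally) while GNSS shows motion due east (+X).
m2 = rotation_m2((0.0, 0.0, 1.0), (1.0, 0.0, 0.0))
rotated = apply3(m2, (0.0, 0.0, 1.0))
```

Under these conventions, applying M2 to the normalized vector A' reproduces B', which is a convenient sanity check on the sign handling of Θ'.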

Returning to FIG. 11C, the real coordinate system definition transformation matrix M3 is a transformation matrix that rotates by -90 degrees around the X axis. The real coordinate system definition transformation matrix M3 converts (3) the special real coordinate system 1113 into (4) the special real coordinate system 2 (1114).

The real coordinate system definition transformation matrix M4 is a transformation matrix that converts from a right-handed system to a left-handed system by swapping X and Y. The real coordinate system definition transformation matrix M4 converts (4) the special real coordinate system 2 (1114) into (5) a real coordinate system, that is, a planar rectangular coordinate system (left-handed system) 1115. At this point, the origin position still differs from that of the true planar rectangular coordinate system.

位置移動変換行列Ｍ５は、原点位置を移動させる変換行列である。具体的には、カメラの初期実座標位置（初期カメラ位置）を平行移動し、初期カメラ位置＝平面直角座標系Ｘ［ｍ］，平面直角座標系Ｙ［ｍ］，標高［ｍ］とする。位置移動変換行列Ｍ５によって、（５）平面直角座標系（左手系）１１１５を、（６）本来の平面直角座標系（左手系）１１１６に変換することができる。 The position movement transformation matrix M5 is a transformation matrix for moving the origin position. Specifically, the initial actual coordinate position of the camera (initial camera position) is translated, and the initial camera position is set to X [m] in the plane rectangular coordinate system, Y [m] in the plane rectangular coordinate system, and the altitude [m]. The (5) planar rectangular coordinate system (left-handed system) 1115 can be transformed into (6) the original planar rectangular coordinate system (left-handed system) 1116 by the positional movement transformation matrix M5.

このようにして、ＳＬＡＭローカル系１１１１（図１１Ａに示した１１０１）を実座標（平面直角座標系（左手系））１１１６（図１１Ａに示した１１０２）とすることができる。 In this way, the SLAM local system 1111 (1101 shown in FIG. 11A) can be made into real coordinates (planar rectangular coordinate system (left-handed system)) 1116 (1102 shown in FIG. 11A).
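The whole chain from the SLAM local system to the planar rectangular coordinate system can be sketched with 4x4 homogeneous matrices. This is an illustration under stated assumptions: points are column vectors, so the matrices are multiplied as M = M5·M4·M3·M2·M1 (the specification only says M is the product of the five matrices); the function names and sample values are hypothetical.

```python
import math
from typing import List, Sequence, Tuple

Mat4 = List[List[float]]


def matmul(a: Mat4, b: Mat4) -> Mat4:
    return [[sum(a[i][k] * b[k][j] for k in range(4)) for j in range(4)]
            for i in range(4)]


def apply(m: Mat4, p: Sequence[float]) -> Tuple[float, ...]:
    v = (p[0], p[1], p[2], 1.0)
    return tuple(sum(m[i][k] * v[k] for k in range(4)) for i in range(3))


def build_m(rate: float, theta_prime: float,
            init_cam: Sequence[float]) -> Mat4:
    """Compose M from Rate, the angle theta' and the initial camera position
    (planar rectangular X [m] = north, Y [m] = east, elevation [m])."""
    c, s = math.cos(-theta_prime), math.sin(-theta_prime)
    m1 = [[rate, 0, 0, 0], [0, rate, 0, 0], [0, 0, rate, 0], [0, 0, 0, 1]]
    m2 = [[c, 0, s, 0], [0, 1, 0, 0], [-s, 0, c, 0], [0, 0, 0, 1]]  # about Y
    m3 = [[1, 0, 0, 0], [0, 0, 1, 0], [0, -1, 0, 0], [0, 0, 0, 1]]  # -90 deg about X
    m4 = [[0, 1, 0, 0], [1, 0, 0, 0], [0, 0, 1, 0], [0, 0, 0, 1]]   # swap X and Y
    m5 = [[1, 0, 0, init_cam[0]], [0, 1, 0, init_cam[1]],
          [0, 0, 1, init_cam[2]], [0, 0, 0, 1]]                     # move origin
    m = m1
    for step in (m2, m3, m4, m5):
        m = matmul(step, m)  # later steps are multiplied from the left
    return m


# Hypothetical case: Rate = 10, vehicle heading due east (theta' = -90 degrees),
# initial camera at X = 1000 m (north), Y = 2000 m (east), elevation 50 m.
M = build_m(10.0, -math.pi / 2, (1000.0, 2000.0, 50.0))
ahead = apply(M, (0.0, 0.0, 0.5))   # 0.5 local units in front of the camera
above = apply(M, (0.0, -0.2, 0.0))  # 0.2 local units above (local Y points down)
```

In this example the point ahead of the camera lands 5 m east of the initial camera position, and the point above it lands 2 m higher in elevation, as expected from the axis definitions in FIG. 11A.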

By holding this coordinate system transformation matrix M, the initial posture/coordinate system setting unit 513 can convert the initial environment map, calculated in the SLAM local coordinate system as in conventional V-SLAM, into a real coordinate environment map in the planar rectangular coordinate system. As a result, the scale of the SLAM local system, which derives from image changes and whose magnitude has no inherent meaning, can be unified in this system to the real coordinate scale in meters.

なお、初期姿勢・座標系設定部５１３は、必要があれば、変換行列を持つだけでなく、実際に算出済の特徴点群の３次元初期位置や、２画像の撮影位置・姿勢位置を、この変換行列を用いて実座標系の値に変換してもよい。特に、頻繁に参照する特徴点群の位置は、後述するトラッキング処理機能（フレーム姿勢推定）失敗時の再初期化実施前後で統一のとれた値として使うため、実座標系の値として保持することが望ましい。実座標系の値としてあらかじめ保持しておけば、各画像上への投影位置は、該変換抜きで計算することができる。 If necessary, the initial orientation/coordinate system setting unit 513 not only has a transformation matrix, but also sets the three-dimensional initial position of the feature point group that has actually been calculated, the shooting position/orientation position of the two images, This conversion matrix may be used to convert to values in the real coordinate system. In particular, the positions of feature point groups that are frequently referenced should be stored as values in the real coordinate system, as they will be used as unified values before and after reinitialization when the tracking processing function (frame orientation estimation), which will be described later, fails. is desirable. If the values of the real coordinate system are stored in advance, the projection position onto each image can be calculated without the conversion.

On the other hand, values in a real coordinate system such as the planar rectangular coordinate system are often very large numbers. The three-dimensional positions of the feature point group in the environment map may therefore be kept as values in the same local coordinate system as before, with the transformation matrix additionally stored, so that they are converted to real coordinate values with the transformation matrix only when necessary. Alternatively, even real coordinate values may be stored as differences from a suitable initial value.

本システム５００においては、従来と同じローカル座標系で初期姿勢・座標系設定をおこなってから、実座標系に変換するための情報を作成して、以後の３次元座標値はすべて実座標変換をおこなった実座標系の値で保持するものとして、説明する。 In this system 500, the initial attitude and coordinate system are set in the same local coordinate system as in the conventional system, and then information for conversion to the real coordinate system is created. Description will be made assuming that the values in the real coordinate system used are held.

When an existing real coordinate environment map is input, the processing of the initial posture/coordinate system setting unit is skipped, and the subsequent processing is performed in the same way with the input real coordinate environment map used as the initial real coordinate environment map.

本システム５００においては、従来のＶ－ＳＬＡＭと同様に、初期姿勢・座標系設定部５１３の処理を２画像（初期ＫＦ）に対して実施すると、初期化が完了されたとみなして、以後の処理を、まだ処理していない画像に対して順次実施していくことにする。したがって、以後の処理は、初期化に用いた２画像（初期ＫＦ）に対しては実施せず、それ以後の画像に実施するようにする。 In this system 500, as in the conventional V-SLAM, when the processing of the initial attitude/coordinate system setting unit 513 is performed on two images (initial KF), it is assumed that the initialization is completed, and the subsequent processing is performed. are sequentially performed on images that have not yet been processed. Therefore, subsequent processing is not performed on the two images (initial KF) used for initialization, but is performed on subsequent images.

なお、初期姿勢・座標系設定部５１３は、計算する座標系の決定とともに、初期化処理として以後のトラッキングなどの処理機能で必要とする内部データの作成をおこなうにあたり、上記の方法には限定されない、上記の方法以外の方法も用いておこなうようにしてもよい。 It should be noted that the initial attitude/coordinate system setting unit 513 determines the coordinate system to be calculated and creates internal data necessary for subsequent processing functions such as tracking as initialization processing, and is not limited to the above method. , a method other than the above method may also be used.

For simplicity of explanation, the subsequent processing, namely the tracking processing function, the mapping processing function, and the loop closing processing function, is described as being performed sequentially. In practice, it may be performed concurrently using multiple threads. In that case, the processing functions mutually reference the internally held KF positions and orientations and the real coordinate environment map, so simultaneous editing by multiple processes can be prevented by using an existing edit lock mechanism as appropriate. Each processing function processes the images of the video in order until no images remain to be processed.

(Contents of frame orientation estimation unit 521)
In FIG. 10, the frame orientation estimation unit 521, which is part of the position/orientation estimation (tracking) processing function 520, performs the same processing as conventional V-SLAM, except for the handling performed when normal processing fails (the processing upon relocalization failure, described later). That is, the frame orientation estimation unit 521 calculates an image feature group for the new input image (distortion-corrected) taken after the camera has moved and, by comparing image feature amounts, obtains the positions of already calculated 3D feature points (real coordinate environment map 550) that are considered to be the same feature points.

このとき、定速走行とみなし新画像カメラの初期位置・姿勢を推定し、１つ前の画像で用いた３Ｄ特徴点群を、推定した初期位置・姿勢を用いて新画像上に投影する。そして、その付近で対応する特徴点を検索することによって、同じ特徴点と思われる３Ｄ特徴点の候補を絞り込むようにしてもよい。 At this time, assuming that the vehicle is traveling at a constant speed, the initial position/orientation of the new image camera is estimated, and the 3D feature point group used in the previous image is projected onto the new image using the estimated initial position/orientation. Then, by retrieving corresponding feature points in the vicinity thereof, 3D feature point candidates that are considered to be the same feature point may be narrowed down.
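The constant-speed assumption and the search near the projected positions can be sketched as follows. This is a simplified illustration that predicts position only (orientation is omitted); the function names and the 5-pixel search radius are hypothetical.

```python
import math
from typing import List, Sequence, Tuple


def predict_position(prev2: Sequence[float],
                     prev1: Sequence[float]) -> Tuple[float, ...]:
    """Constant-velocity prediction: repeat the last inter-frame displacement."""
    return tuple(2.0 * b - a for a, b in zip(prev2, prev1))


def candidates_near(projected: Tuple[float, float],
                    keypoints: List[Tuple[float, float]],
                    radius_px: float) -> List[Tuple[float, float]]:
    """Keep keypoints of the new image within radius_px of a projected point."""
    return [kp for kp in keypoints
            if math.hypot(kp[0] - projected[0], kp[1] - projected[1]) <= radius_px]


# Camera positions of the two previous frames, then the predicted next position.
predicted = predict_position((0.0, 0.0, 0.0), (0.0, 0.0, 1.5))
# A 3D feature point projected with the predicted pose is assumed to land at (100, 100).
nearby = candidates_near((100.0, 100.0), [(102.0, 101.0), (150.0, 100.0)],
                         radius_px=5.0)
```

Restricting the match search to a small window around each projected position reduces both the matching cost and the risk of pairing visually similar but unrelated features.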

After that, the position and orientation of the new image camera are optimized so that the reprojection error onto the new image becomes small over the entire 3D feature point group found as matching feature points. That is, BA is performed that optimizes only the camera position and orientation without changing the positions of the 3D feature point group. Next, a first KF group that shares 3D feature points with the new image is searched for, then a second KF group that shares 3D feature points with the first KF group is searched for, and the 3D feature point groups of the first and second KF groups are obtained.

At this time, the obtained 3D feature point group may be arbitrarily filtered using, for example, the distance from the camera position of the new image (within a specified distance range) or the difference in viewing direction from the camera (for example, whether the inner product of the viewing direction vector from the camera position of the new image to the 3D feature point and the viewing direction vector from the camera positions of the KF group so far to that feature point is at least a specified value). Using the larger number of 3D feature points obtained from the first and second KF groups, the frame orientation estimation unit 521 projects them onto the new image again and performs position/orientation optimization so that the reprojection error becomes small.
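The distance and viewing-direction criteria can be sketched as follows. The thresholds (50 m and an inner product of 0.5, i.e. viewing directions within 60 degrees) and the function names are hypothetical; the specification only states that specified values are used.

```python
import math
from typing import Sequence, Tuple


def unit_direction(origin: Sequence[float],
                   target: Sequence[float]) -> Tuple[float, ...]:
    v = [t - o for o, t in zip(origin, target)]
    n = math.sqrt(sum(x * x for x in v))
    return tuple(x / n for x in v)


def keep_point(new_cam: Sequence[float], kf_cam: Sequence[float],
               point: Sequence[float], max_dist: float = 50.0,
               min_cos: float = 0.5) -> bool:
    """Keep a 3D feature point if it is close enough to the new camera and is
    seen from a similar direction as from the keyframe camera."""
    dist = math.sqrt(sum((p - c) ** 2 for c, p in zip(new_cam, point)))
    if dist > max_dist:
        return False
    d_new = unit_direction(new_cam, point)
    d_kf = unit_direction(kf_cam, point)
    return sum(a * b for a, b in zip(d_new, d_kf)) >= min_cos


new_cam = (0.0, 0.0, 0.0)
similar_view = keep_point(new_cam, (1.0, 0.0, 0.0), (0.0, 0.0, 10.0))
too_far = keep_point(new_cam, (1.0, 0.0, 0.0), (0.0, 0.0, 100.0))
opposite_view = keep_point(new_cam, (0.0, 0.0, 20.0), (0.0, 0.0, 10.0))
```

Since both direction vectors are normalized, their inner product is the cosine of the angle between the two viewing directions, so a larger threshold means a stricter requirement on viewpoint similarity.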

フレーム姿勢推定部５２１は、具体的には、たとえば、図６に示した、メモリ６０２に記憶されたプログラムをＣＰＵ６０１が実行することによって、その機能を実現することができる。また、具体的には、たとえば、図７に示した、メモリ７０２に記憶されたプログラムをＣＰＵ７０１が実行することによって、その機能を実現するようにしてもよい。 Specifically, the function of frame orientation estimation section 521 can be realized by CPU 601 executing a program stored in memory 602 shown in FIG. 6, for example. More specifically, for example, the function may be realized by causing the CPU 701 to execute a program stored in the memory 702 shown in FIG.

If the frame orientation estimation unit 521 fails to estimate the position and orientation, for example because a sufficient number of 3D feature points cannot be obtained, relocalization processing, which is position/orientation recovery processing, may be performed as in conventional V-SLAM. In the relocalization processing, all KFs are searched for KFs with similar image features; if similar KF candidates are found, the 3D feature point groups of those KFs are matched against the feature point group of the new image, and a KF with a large number of matches may be selected as the final KF.

リローカリゼーション処理は、つぎに、当該ＫＦと新画像のマッチングした特徴点群どうしを使って、より少数の特徴点を用いた既知のＰｎＰ問題を解くことによって、初期位置・姿勢推定をおこなう。そして、得た新たな新画像の初期位置・姿勢から、より多くの特徴点群を使った非線形最小二乗法などの任意の最適化手法を用いて、位置・姿勢最適化を実施し、当該新画像の推定カメラ位置・姿勢とする。 The relocalization process then uses the matched feature points of the KF and the new image to perform an initial position and pose estimation by solving a known PnP problem with a smaller number of feature points. Then, from the initial position/posture of the obtained new image, the position/posture is optimized using an arbitrary optimization method such as the nonlinear least-squares method using a larger number of feature points. This is the estimated camera position/orientation of the image.

ここまで、本システム５００のフレーム姿勢推定部５２１は、従来のＶ－ＳＬＡＭと同じ処理を実施する。一方で、本システム５００のフレーム姿勢推定部５２１は、上述したリローカリゼーション処理も失敗した場合の処理が、従来のＶ－ＳＬＡＭと異なる。リローカリゼーション処理も失敗した場合には、従来のＶ－ＳＬＡＭでは、処理続行が不能であるため、そのまま処理を終了する。しかし、本システム５００の場合は、処理終了をする代わりに初期姿勢・座標系設定部５１３に戻って、既存の実座標環境マップ５５０や全画像位置姿勢データ５６０などの内部算出データを残したまま、初期化処理を再実施することができる。 Up to this point, the frame orientation estimation unit 521 of the present system 500 performs the same processing as in conventional V-SLAM. On the other hand, frame orientation estimation section 521 of system 500 differs from conventional V-SLAM in the processing performed when the relocalization processing described above also fails. If the relocalization process also fails, the conventional V-SLAM cannot continue the process, so the process is terminated. However, in the case of the present system 500, instead of terminating the process, it returns to the initial orientation/coordinate system setting unit 513, leaving the existing internally calculated data such as the actual coordinate environment map 550 and all image position and orientation data 560. , the initialization process can be re-performed.

In conventional V-SLAM, a relocalization failure means that no correspondence at all can be established with the images and KFs tracked so far. As described above, the SLAM local coordinate system computed by conventional V-SLAM is tied to the initial images used for initialization. Once correspondence is lost, running initialization again starts the calculation in a coordinate system tied to a new initial image, which differs from that of the environment map computed so far. The environment map and camera position/orientation values computed up to the loss of correspondence and those computed after re-initialization therefore cannot be related to each other, and the result is fragmented into effectively unrelated pieces.

For this reason, conventional V-SLAM terminates without re-initializing when relocalization fails, because re-initialization would be pointless. In the present system 500, however, the coordinate system and all V-SLAM values after initialization can be expressed in the real coordinate system. Even if correspondence with the images and KFs tracked so far is lost, the computed environment map and camera position/orientation values remain mutually consistent, because they are all values in the real coordinate system.

Consequently, the environment map and camera position/orientation values computed before and after initialization can be mixed and stored together without any problem. Unlike conventional V-SLAM, the present system 500 therefore runs the processing of the initial orientation/coordinate system setting unit 513 again when relocalization fails. As noted above, the 3D positions of the feature points are numerous and frequently referenced. If they are stored as values in the SLAM local coordinate system together with a transformation matrix to the real coordinate system, both may change every time initialization is executed (although the real coordinate value obtained by combining them stays the same), which is cumbersome. It is therefore desirable to store them as real coordinate values wherever possible.

(Contents of the KF updating unit 522)
In FIG. 10, the KF (key frame) updating unit 522, which belongs to the position/orientation estimation (tracking) processing function 520, determines from image features whether to make the new image a KF, as in conventional V-SLAM.

In conventional V-SLAM, a new image is made a KF on the basis of image features when, for example, the elapsed time or the number of frames since the last KF exceeds a specified value, or when, among the KF group 1 obtained by the frame orientation estimation unit, the KF that shares the most 3D feature points with the new image shares no more than a specified number of them. The KF updating unit then adds the new image selected as a KF to the KF group of the real coordinate environment map. As described above, when a separate graph structure (KF group 1) is maintained among KFs that share feature points, that graph structure is also updated as appropriate for the newly added KF (new image).

The KF updating unit 522 of the present system 500 may additionally take into account whether an image holds a GNSS position when selecting a new KF image. That is, when not every image has a GNSS position and images without a GNSS position have continued for a specified number or more, an image with a GNSS position that is input as the new image is adopted as a new KF regardless of the result of the conventional image-feature-based determination.

Specifically, the function of the KF updating unit 522 can be realized, for example, by the CPU 601 executing a program stored in the memory 602 shown in FIG. 6. Alternatively, it may be realized by the CPU 701 executing a program stored in the memory 702 shown in FIG. 7.

FIG. 12 is a flowchart showing an example of the processing procedure of the KF updating unit. The processing procedure of the KF updating unit 522 is not limited to this flowchart. In the flowchart of FIG. 12, the KF updating unit 522 determines whether the current image is at least a specified number of frames away from the KF currently in use (step S1201). If it is not (step S1201: No), the series of processing ends without doing anything.

If the current image is at least the specified number of frames away (step S1201: Yes), the unit next determines whether the number of feature points that the current image has in common with the KF currently in use is equal to or less than a specified number (step S1202). If it is not (step S1202: No), processing proceeds to step S1204. If it is (step S1202: Yes), the unit next determines whether the number of feature points that the current image frame has in common with another KF, namely the KF that shares the most feature points with the KF currently in use, is equal to or less than the specified number (step S1203).

In step S1203, if the number of common feature points is not equal to or less than the specified number (step S1203: No), processing proceeds to step S1204. If it is (step S1203: Yes), processing proceeds to step S1206. In step S1204, the unit may determine whether the current image holds GNSS position information. If it does not (step S1204: No), the series of processing ends. If it does (step S1204: Yes), processing proceeds to step S1205.

In step S1205, the unit may determine whether the current KF is at least a specified number of KFs away from the most recent KF that holds GNSS position information. If it is (step S1205: Yes), processing proceeds to step S1206. If it is not (step S1205: No), the series of processing ends.

In step S1206, the current image is made the new KF. The new KF is then added to the KF group of the real coordinate environment map (step S1207). Further, the new KF is added to the graph structure that represents the feature point sharing relationships among the KFs, and the graph is updated (step S1208). This ends the series of processing.

Alternatively, only the decision to add a KF may be made by one of the processing units of the tracking processing function 520 (for example, the KF updating unit 522), with the actual KF addition made independent and performed by one of the processing units 531 to 533 of the local mapping processing function 530.

In the present embodiment, GNSS position information need not be acquired. In that case, steps S1204 and S1205 may be omitted.
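The decision part of FIG. 12 (steps S1201 to S1206) can be sketched as a single predicate. The threshold values, parameter names, and the `use_gnss` switch are hypothetical. When `use_gnss` is false, this sketch assumes that the branches that would have led to step S1204 simply end without adding a KF, which the text does not state explicitly.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class KFThresholds:
    # All default values are placeholders chosen for illustration only.
    min_frame_gap: int = 20      # S1201: frames since the KF in use
    max_common_points: int = 50  # S1202/S1203: "equal to or less than"
    min_gnss_kf_gap: int = 5     # S1205: KFs since the last KF with GNSS


def should_add_keyframe(
    frames_since_current_kf: int,
    common_with_current_kf: int,
    common_with_best_other_kf: int,
    has_gnss: bool,
    kfs_since_last_gnss_kf: int,
    th: KFThresholds = KFThresholds(),
    use_gnss: bool = True,
) -> bool:
    """Return True when the current image should become a new KF (S1206)."""
    # S1201: too close to the KF currently in use, so do nothing.
    if frames_since_current_kf < th.min_frame_gap:
        return False
    # S1202 and S1203: few points shared with the KF in use and also with
    # the other KF that shares the most points with it.
    if (common_with_current_kf <= th.max_common_points
            and common_with_best_other_kf <= th.max_common_points):
        return True
    # S1204 and S1205 are optional GNSS-based checks.
    if not use_gnss or not has_gnss:
        return False
    return kfs_since_last_gnss_kf >= th.min_gnss_kf_gap
```

Steps S1207 and S1208 (adding the KF to the map and updating the graph) would follow whenever this function returns `True`.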

The present system 500 has been described with the KF addition performed by the KF updating unit 522. However, the tracking processing function 520 can be regarded as processing applied to every image frame, and the local mapping processing function 530 as KF-related processing performed when a KF is added. If priority is given to this view, it is better for only the decision whether to add a KF to be made by processing unit 521 or 522 of the tracking processing function 520, and for the actual KF addition to be performed by one of the processing units 531 to 533 of the local mapping processing function 530. In conventional V-SLAM as well, the KF addition itself is often performed by one of the processing units 531 to 533 in charge of the local mapping processing function 530.

(Contents of the 3D map feature point updating unit 531)
The 3D map feature point updating unit 531, which belongs to the environment map creation (local mapping) processing function 530, uses the added KF to decide whether recently added 3D map points should be removed and also adds new 3D map points, as in conventional V-SLAM.

Specifically, the function of the 3D map feature point updating unit 531 can be realized, for example, by the CPU 601 executing a program stored in the memory 602 shown in FIG. 6. Alternatively, it may be realized by the CPU 701 executing a program stored in the memory 702 shown in FIG. 7.

As the removal determination for 3D map points, the 3D map feature point updating unit 531 determines whether each recently added 3D map point is in use, based on, for example, whether it can be observed from at least a specified number of KFs in the entire KF group including the newly added KF. A 3D map point judged not to be in use is removed.
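A minimal sketch of this removal determination follows, assuming the map keeps, for each KF, the set of 3D map point IDs it observes. The data layout, the function name, and the `min_observing_kfs` value are assumptions made for illustration.

```python
from typing import Dict, Iterable, List, Set


def find_unused_map_points(
    recent_point_ids: Iterable[int],
    observations: Dict[int, Set[int]],  # KF ID -> IDs of map points it observes
    min_observing_kfs: int = 3,         # hypothetical "specified number"
) -> List[int]:
    """Return recently added map points observed by too few KFs."""
    unused = []
    for point_id in recent_point_ids:
        # Count the KFs (including the newly added one) that see this point.
        observers = sum(1 for pts in observations.values() if point_id in pts)
        if observers < min_observing_kfs:
            unused.append(point_id)
    return unused
```

The returned IDs are only candidates. Whether they are deleted immediately or later is a separate decision.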

The 3D map feature point updating unit 531 may make only the removal decision. The actual removal may then be carried out together with processing that separately examines in detail whether 3D map points are used, such as the BA in the subsequent KF orientation/feature point map optimization unit 533.

As the addition of new 3D map points, the 3D map feature point updating unit 531 looks in the added new KF for feature points that are not yet associated with any 3D feature point. Using image feature amounts, it then searches for the same feature points among the likewise unassociated feature points of the first KF group, which shares feature points with the new KF updated by the KF updating unit 522. The candidates may be further narrowed down by any method, such as the epipolar constraint or the reprojection error in those KFs. When the same feature point is found, its 3D position is computed by a known triangulation method from the camera positions of the two KFs and the in-image positions of the feature point in the images of those KFs, and the point is added to the real coordinate environment map as a new 3D feature point.
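The text only refers to "a known triangulation method". As one possible instance, the sketch below uses midpoint triangulation: each KF contributes a viewing ray (camera center plus direction toward the feature point), and the 3D position is taken as the midpoint of the shortest segment between the two rays. Converting an in-image position and a camera orientation into a ray direction is omitted, and all names are hypothetical.

```python
from typing import Optional, Sequence, Tuple

Vec3 = Tuple[float, float, float]


def _dot(a: Sequence[float], b: Sequence[float]) -> float:
    return sum(x * y for x, y in zip(a, b))


def triangulate_midpoint(c1: Vec3, d1: Vec3, c2: Vec3, d2: Vec3,
                         eps: float = 1e-12) -> Optional[Vec3]:
    """Midpoint of the closest points on rays c1 + s*d1 and c2 + t*d2."""
    w = tuple(p - q for p, q in zip(c1, c2))
    a, b, c = _dot(d1, d1), _dot(d1, d2), _dot(d2, d2)
    d, e = _dot(d1, w), _dot(d2, w)
    denom = a * c - b * b
    if abs(denom) < eps:
        return None  # (nearly) parallel rays: no reliable intersection
    s = (b * e - c * d) / denom
    t = (a * e - b * d) / denom
    p1 = tuple(ci + s * di for ci, di in zip(c1, d1))
    p2 = tuple(ci + t * di for ci, di in zip(c2, d2))
    return tuple((x + y) / 2.0 for x, y in zip(p1, p2))
```

If the camera centers are given in the real coordinate system, the returned point is also in the real coordinate system and could be stored in the map without a further transformation.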

(Contents of the graph constraint generation unit 532)
The graph constraint generation unit 532, which belongs to the environment map creation (local mapping) processing function 530, prepares for a newly introduced pre-correction step. Before the subsequent KF orientation/feature point map optimization unit 533 obtains the 3D positions of the current key frame and the surrounding feature points by BA (local BA) as in the conventional approach, the position of the current key frame and the 3D positions of the surrounding feature points are corrected in advance with a pose graph so that they agree with the input GNSS information.

Specifically, the function of the graph constraint generation unit 532 can be realized, for example, by the CPU 601 executing a program stored in the memory 602 shown in FIG. 6. Alternatively, it may be realized by the CPU 701 executing a program stored in the memory 702 shown in FIG. 7.

The concept of a pose graph, and the optimization calculation using the graph structure, may be the same as optimization calculations based on general graph theory (pose graph structure), and an existing optimization library such as g2o (General Graph Optimization) may be used.

Using this general graph structure, the graph constraint generation unit 532 creates two pose graphs with different optimization targets (nodes) and constraints (edges) for a two-stage optimization. The first stage optimizes the KF positions and orientations only (a rough optimization of the KF information group in the real coordinate environment map). The second stage optimizes positions and orientations using both the KFs after that optimization and the surrounding feature points (a detailed optimization of the entire real coordinate environment map).
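The difference between the two graphs can be illustrated with a plain data structure. This passage does not specify which constraints each graph uses, so the edge kinds below (`shared_points`, `gnss_position`, `observation`) are assumptions chosen only to show that the two graphs have different node and edge sets. The sketch does not use g2o and performs no optimization.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Set, Tuple

Node = Tuple[str, int]         # ("kf", id), ("pt", id) or ("gnss", id)
Edge = Tuple[str, Node, Node]  # (kind, from, to); kinds are hypothetical


@dataclass
class PoseGraph:
    nodes: List[Node] = field(default_factory=list)
    edges: List[Edge] = field(default_factory=list)


def build_two_stage_graphs(observations: Dict[int, Set[int]],
                           gnss_kfs: Set[int]) -> Tuple[PoseGraph, PoseGraph]:
    """Build a KF-only graph and a KF-plus-feature-point graph."""
    kf_ids = sorted(observations)
    gnss_edges: List[Edge] = [("gnss_position", ("kf", k), ("gnss", k))
                              for k in kf_ids if k in gnss_kfs]

    # Stage 1: only KF poses are optimization targets.
    coarse = PoseGraph(nodes=[("kf", k) for k in kf_ids])
    for i, a in enumerate(kf_ids):
        for b in kf_ids[i + 1:]:
            if observations[a] & observations[b]:
                coarse.edges.append(("shared_points", ("kf", a), ("kf", b)))
    coarse.edges.extend(gnss_edges)

    # Stage 2: KF poses and the surrounding feature points are both targets.
    point_ids = sorted(set().union(*observations.values()))
    fine = PoseGraph(nodes=coarse.nodes + [("pt", p) for p in point_ids])
    for k in kf_ids:
        for p in sorted(observations[k]):
            fine.edges.append(("observation", ("kf", k), ("pt", p)))
    fine.edges.extend(gnss_edges)
    return coarse, fine
```

In an actual implementation each node would carry a pose or position estimate and each edge a measurement with its weight, which an optimizer would then consume.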

(Contents of the KF orientation/feature point map optimization unit 533)
The KF orientation/feature point map optimization unit 533, which belongs to the environment map creation (local mapping) processing function 530, performs general graph optimization calculations using the two new pose graphs generated by the graph constraint generation unit 532.

Specifically, the function of the KF orientation/feature point map optimization unit 533 can be realized, for example, by the CPU 601 executing a program stored in the memory 602 shown in FIG. 6. Alternatively, it may be realized by the CPU 701 executing a program stored in the memory 702 shown in FIG. 7.

(Contents of the loop detection/closing unit 541)
The loop detection/closing unit 541, which belongs to the loop closing processing function 540, compares whole-image feature amounts between the new KF and the stored KF images to examine their similarity, as in conventional V-SLAM, and checks whether the same place was traveled more than once on the route along which the video was acquired (that is, whether a loop has occurred). If the similarity is high and the same place appears to have been traveled, the KFs from the earlier pass through that place are set in the "loop KF ID" of the KF group information 551 of the relevant real coordinate environment map 550, so that the KFs can reference each other.

When a loop occurs, the loop detection/closing unit 541 also performs, for the new KF, either a local BA using the KFs near the new KF or a global BA using all KFs, in order to adjust the positional relationship of the KFs from the passes through the same place. The KFs near the new KF may be selected from, for example, the sharing state of map feature points, and the sharing state with the KFs from the earlier pass through the same place may also be used.
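The similarity check used for loop detection can be sketched as follows. The text does not name a similarity measure, so cosine similarity between whole-image descriptor vectors is an assumption here, as are the function names, the threshold, and the `exclude` parameter for skipping KFs that are merely temporal neighbors of the new KF.

```python
import math
from typing import Dict, FrozenSet, List, Sequence


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    if norm == 0.0:
        return 0.0  # treat an all-zero descriptor as dissimilar to everything
    return sum(x * y for x, y in zip(a, b)) / norm


def detect_loop_kfs(
    new_descriptor: Sequence[float],
    stored_descriptors: Dict[int, Sequence[float]],  # KF ID -> descriptor
    threshold: float = 0.9,                          # hypothetical value
    exclude: FrozenSet[int] = frozenset(),
) -> List[int]:
    """Return IDs of stored KFs that look like the same place as the new KF."""
    return [
        kf_id
        for kf_id, desc in sorted(stored_descriptors.items())
        if kf_id not in exclude
        and cosine_similarity(new_descriptor, desc) >= threshold
    ]
```

The returned IDs correspond to what would be recorded as the "loop KF ID" of the new KF. The subsequent local or global BA is not shown.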

Specifically, the function of the loop detection/closing unit 541 can be realized, for example, by the CPU 601 executing a program stored in the memory 602 shown in FIG. 6. Alternatively, it may be realized by the CPU 701 executing a program stored in the memory 702 shown in FIG. 7.

In the present system 500, the real coordinate environment map 550 has already been built through the processing described above, using the input GNSS information 1002 so that scale drift does not occur. The processing in the loop detection/closing unit 541, which mainly serves as a countermeasure against scale drift, may therefore be omitted.

As described above, according to the present embodiment, the server (information processing device/mobile body position estimation device) 501 performs, for an arbitrary image among the captured time-series images (video) 1001, information processing (S216 to S222) that estimates the shooting position and orientation of that image and generates an environment map from the features of that image. In doing so, the retrograde image acquisition unit 512 acquires retrograde images, in which the captured images are reversed with respect to time (S212), and the information processing is performed using those retrograde images.

Likewise, when the server (information processing device/mobile body position estimation device) 501 performs, for an arbitrary image among the captured time-series images (1001), the information processing (S216 to S222) that estimates the shooting position and orientation of that image and generates an environment map from the features of that image, the retrograde image acquisition unit 512 performs the information processing using, among the group of images that show image feature points having mutually distinguishable feature amounts, the image captured nearest the latest time (S215).

As a result, the image used changes from the conventional "moment when the image feature point of the subject appears farthest away (when it first becomes visible in the distance)" to the "moment when the image feature point of the subject appears nearest (when it is last visible)". This improves the accuracy of the 3D feature point positions estimated from changes in the video and makes it possible to create a highly accurate surrounding environment map (the 3D positions of the surrounding feature points).

Also according to the present embodiment, the retrograde determination unit 511 of the server (information processing device/mobile body position estimation device) 501 determines whether to acquire retrograde images. Specifically, for example, it makes this determination based on the shooting direction of the time-series images (S201), and determines that retrograde images are to be acquired when the area ahead of the mobile body is being captured. It also makes the determination based on the frame rate of the time-series images (S202, S205), and determines that retrograde images are to be acquired when the frame rate is equal to or higher than a predetermined threshold. It further makes the determination based on the state or accuracy of the shooting position/orientation estimation result obtained without reversal (S204, S207), or based on the accuracy of the environment map (S204, S207). In each case, retrograde images are used only when using them is effective.
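The two criteria whose outcome is stated explicitly here (forward-facing camera and a frame rate at or above a threshold) can be sketched as follows. How the criteria are combined is not given in this passage, so requiring both at once is an assumption, as are the function names and the threshold value. The criteria based on estimation accuracy and environment map accuracy are left out.

```python
from typing import List, Sequence, TypeVar

T = TypeVar("T")


def should_use_retrograde(faces_forward: bool, frame_rate: float,
                          min_frame_rate: float = 10.0) -> bool:
    """Decide whether to process the time-series images in reverse order.

    min_frame_rate is a placeholder for the 'predetermined threshold'.
    """
    return faces_forward and frame_rate >= min_frame_rate


def to_retrograde(frames: Sequence[T]) -> List[T]:
    """Return the frames reversed with respect to time (last frame first)."""
    return list(reversed(frames))
```

For a forward-facing camera, a subject that approaches and then leaves the view in the original order first appears at its nearest position in the reversed order.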

Specifically, the mobile body position estimation system according to the present embodiment can be applied to a wide range of systems and applications, for example, a high-precision map updating system for automated driving, an application that pastes outdoor photographs onto a map, an application that automatically estimates the real coordinate positions of subjects in outdoor images and video, and a CG reproduction system for arbitrary road scenes.

The mobile body position estimation method described in the present embodiment can be realized by executing a program prepared in advance on a computer such as a personal computer or a workstation. The program is recorded on a computer-readable recording medium such as a hard disk, a flexible disk, a CD (Compact Disc)-ROM, an MO (Magneto-Optical Disk), a DVD (Digital Versatile Disk), or a USB (Universal Serial Bus) memory, and is executed by being read from the recording medium by the computer. The mobile body position estimation program may also be distributed via a network such as the Internet.

The following supplementary notes are further disclosed with respect to the embodiment described above.

(Supplementary note 1) A mobile body position estimation system that performs, for an arbitrary image among captured time-series images, information processing for estimating the shooting position and orientation of the arbitrary image and for generating an environment map from features of the arbitrary image,
the system comprising an information processing device that
acquires retrograde images obtained by reversing the time-series images with respect to time, and performs the information processing using the retrograde images.

(Supplementary note 2) The mobile body position estimation system according to supplementary note 1, wherein the information processing device
uses the retrograde images to perform at least one of:
information processing that extracts a plurality of image feature points having mutually distinguishable feature amounts;
information processing that associates, among a plurality of images in the sequence of the time-series images, the image feature points whose feature amounts are similar; and
information processing that estimates the shooting position using the two-dimensional position of each associated feature point on each of the images.

(Supplementary note 3) The mobile body position estimation system according to supplementary note 1 or 2, wherein the information processing device determines whether to acquire the retrograde images.

(Supplementary note 4) The mobile body position estimation system according to supplementary note 3, wherein the information processing device determines whether to acquire the retrograde images based on the shooting direction of the time-series images.

(Supplementary note 5) The mobile body position estimation system according to supplementary note 4, wherein the information processing device determines, based on the shooting direction of the time-series images, that the retrograde images are to be acquired when the area ahead of the mobile body is being captured.

(Supplementary note 6) The mobile body position estimation system according to any one of supplementary notes 3 to 5, wherein the information processing device determines whether to acquire the retrograde images based on the frame rate of the time-series images.

(Supplementary note 7) The mobile body position estimation system according to supplementary note 6, wherein the information processing device determines, based on the frame rate of the time-series images, that the retrograde images are to be acquired when the frame rate is equal to or higher than a predetermined threshold.

(Supplementary note 8) The mobile body position estimation system according to any one of supplementary notes 3 to 7, wherein the information processing device determines whether to acquire the retrograde images based on the state or accuracy of the shooting position/orientation estimation result obtained without reversal.

(Supplementary note 9) The mobile body position estimation system according to any one of supplementary notes 3 to 8, wherein the information processing device determines whether to acquire the retrograde images based on the accuracy of the environment map.

(Supplementary note 10) A mobile body position estimation system that performs, for an arbitrary image among captured time-series images, information processing for estimating the shooting position and orientation of the arbitrary image and for generating an environment map from features of the arbitrary image,
wherein the information processing is performed using, among a group of images showing image feature points having mutually distinguishable feature amounts, the image captured nearest the latest time.

(Supplementary note 11) A mobile body position estimation method in which an information processing device,
when performing, for an arbitrary image among captured time-series images, information processing for estimating the shooting position and orientation of the arbitrary image and for generating an environment map from features of the arbitrary image,
acquires retrograde images obtained by reversing the time-series images with respect to time, and performs the information processing using the retrograde images.

(Supplementary note 12) A mobile body position estimation method in which an information processing device,
when performing, for an arbitrary image among captured time-series images, information processing for estimating the shooting position and orientation of the arbitrary image and for generating an environment map from features of the arbitrary image,
performs the information processing using, among a group of images showing image feature points having mutually distinguishable feature amounts, the image captured nearest the latest time.

100 Subject (feature point A)
101 to 110 Images (shooting positions)
500 Mobile body position estimation system
501 Server (information processing device/mobile body position estimation device)
502 On-vehicle device (information collection device)
503 Mobile body
504 Network
505 Satellite
510 System initialization processing function
511 Retrograde determination unit
512 Retrograde image acquisition unit
513 Initial orientation/coordinate system setting unit
520 Position/orientation estimation (tracking) processing function
521 Frame orientation estimation unit
522 KF (key frame) updating unit
530 Environment map creation (local mapping) processing function
531 3D map feature point updating unit
532 Graph constraint generation unit
533 KF (key frame) orientation/feature point map optimization unit
540 Loop closing processing function
541 Loop detection/closing unit
550 Real coordinate environment map
551 KF (key frame) group information
552 Feature point group information
560 All-image position and orientation data
1001 Video
1002 GNSS information
1003 Orientation information

Claims

1. A mobile body position estimation system that performs information processing for estimating the shooting position and orientation of an arbitrary image among captured time-series images, and for generating an environment map, from the features of the arbitrary image,
the system comprising an information processing device that, in acquiring a retrograde image obtained by reversing the time-series images with respect to time and performing the information processing using the retrograde image, determines whether or not to acquire the retrograde image based on the shooting direction of the time-series images.

2. A mobile body position estimation system that performs information processing for estimating the shooting position and orientation of an arbitrary image among captured time-series images, and for generating an environment map, from the features of the arbitrary image,
the system comprising an information processing device that, in acquiring a retrograde image obtained by reversing the time-series images with respect to time and performing the information processing using the retrograde image, determines whether or not to acquire the retrograde image based on the frame rate of the time-series images.

3. A mobile body position estimation system that performs information processing for estimating the shooting position and orientation of an arbitrary image among captured time-series images, and for generating an environment map, from the features of the arbitrary image,
the system comprising an information processing device that, in acquiring a retrograde image obtained by reversing the time-series images with respect to time and performing the information processing using the retrograde image, determines whether or not to acquire the retrograde image.

4. A mobile body position estimation system that performs information processing for estimating the shooting position and orientation of an arbitrary image among captured time-series images, and for generating an environment map, from the features of the arbitrary image,
the system comprising an information processing device that, in acquiring a retrograde image obtained by reversing the time-series images with respect to time and performing the information processing using the retrograde image, determines whether or not to acquire the retrograde image based on the accuracy of the environment map.

5. The mobile body position estimation system according to any one of claims 1 to 4, wherein the information processing device, using the retrograde image, performs at least one of:
information processing that extracts a plurality of image feature points having mutually distinguishable feature amounts;
information processing that associates the image feature points having similar feature amounts among a plurality of images in the sequence of the time-series images; and
information processing that estimates the shooting position using the two-dimensional position of each associated feature point on each image.

6. The mobile body position estimation system according to any one of claims 1 to 5, wherein the information processing is performed using an image captured near the latest time among a group of images in which image feature points having mutually distinguishable feature amounts appear.

7. A mobile body position estimation method for performing information processing for estimating the shooting position and orientation of an arbitrary image among captured time-series images, and for generating an environment map, from the features of the arbitrary image,
wherein an information processing device, in acquiring a retrograde image obtained by reversing the time-series images with respect to time and performing the information processing using the retrograde image, determines whether or not to acquire the retrograde image based on the shooting direction of the time-series images.

8. A mobile body position estimation method for performing information processing for estimating the shooting position and orientation of an arbitrary image among captured time-series images, and for generating an environment map, from the features of the arbitrary image,
wherein an information processing device, in acquiring a retrograde image obtained by reversing the time-series images with respect to time and performing the information processing using the retrograde image, determines whether or not to acquire the retrograde image based on the frame rate of the time-series images.

9. A mobile body position estimation method for performing information processing for estimating the shooting position and orientation of an arbitrary image among captured time-series images, and for generating an environment map, from the features of the arbitrary image,
wherein an information processing device, in acquiring a retrograde image obtained by reversing the time-series images with respect to time and performing the information processing using the retrograde image, determines whether or not to acquire the retrograde image.

10. A mobile body position estimation method for performing information processing for estimating the shooting position and orientation of an arbitrary image among captured time-series images, and for generating an environment map, from the features of the arbitrary image,
wherein an information processing device, in acquiring a retrograde image obtained by reversing the time-series images with respect to time and performing the information processing using the retrograde image, determines whether or not to acquire the retrograde image based on the accuracy of the environment map.