JP2018081008A

JP2018081008A - Self-position posture locator using reference video map

Info

Publication number: JP2018081008A
Application number: JP2016223563A
Authority: JP
Inventors: Kazuo Iwane (岩根 和郎)
Original assignee: IWANE LABORATORIES Ltd
Current assignee: IWANE LABORATORIES Ltd
Priority date: 2016-11-16
Filing date: 2016-11-16
Publication date: 2018-05-24
Anticipated expiration: 2036-11-16
Also published as: JP6821154B2

Abstract

[Problem] To determine the position and posture of a moving vehicle or the like (self-position/posture determination) simply, at low cost, and with high accuracy. [Solution] The apparatus comprises: a CV video map creation device 20 that performs a CV calculation, based on a reference video captured by a CV video acquisition device 10, to obtain CV (camera vector) values indicating the three-dimensional coordinate values of the camera position and posture of the reference video, and generates a CV video map in which the CV values are added to the reference video; a CV video map database 30 that stores the CV video map; a CV video map/target image comparison device 50 that reads out the CV video map corresponding to a target image captured by a target moving body 40; and a self-position/posture locating device 60 that acquires the CV value of the target image by transplanting the CV values attached to the read-out CV video map onto the corresponding feature points of the target image. The self-position/posture locating device 60 selects, as the CV values to be transplanted onto the corresponding feature points of the target image, CV values attached to predetermined feature quantities contained in the CV video map. [Selected drawing] FIG. 14

Description

The present invention relates to a self-position/posture locating device that enables a moving body, such as a moving vehicle, to determine its own position and posture in real time, for example in the automated operation of moving bodies such as automobiles and other vehicles, aircraft, and ships, or in the autonomous travel of robots and the like.

In general, in the automated operation of moving bodies such as automobiles, aircraft, and ships, a self-position/posture determination technique that lets the moving vehicle or the like grasp and determine its own position and posture is important.
There are several methods for such self-position/posture determination of a moving body.
Specifically, a self-position/posture locating device for a moving vehicle or the like can in principle be realized using six-variable data acquired by an expensive, high-accuracy IMU/GYRO of the kind used in aircraft, namely a vector of six degrees of freedom consisting of the position coordinates (X, Y, Z) and the rotation angles (Φx, Φy, Φz) about the respective coordinate axes. In practice, however, achieving accuracy sufficient for automated driving makes the equipment very expensive, so this approach is not practical.
GNSS (GPS) is widely used for self-position determination, but because it provides only the three coordinate variables, it cannot acquire the six variables that include posture.

In recent years, a self-position determination method called LIDAR (Light Detection and Ranging, or Laser Imaging Detection and Ranging) has become mainstream. This technique scans laser pulses, receives at the emission position the light scattered back from the three-dimensional space, and measures distance from the time difference, thereby creating a point cloud that densely represents the three-dimensional space as points with three-dimensional coordinates.
As a self-position/posture determination technique using such a point cloud, it has been disclosed, for example, that in the automated driving of a vehicle or the like a laser point cloud is used as a three-dimensional map and compared with laser scan data from an on-board laser device to achieve self-position/posture determination (Patent Document 1).
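The time-of-flight ranging described above can be illustrated with a short sketch. It assumes the pulse travels at the vacuum speed of light and that the measured time difference is the full round trip; the function name is hypothetical and is not taken from this publication.

```python
# Illustrative sketch only: range from a laser pulse's round-trip time.
# Assumption: propagation at the vacuum speed of light (air is slightly slower).

SPEED_OF_LIGHT_M_PER_S = 299_792_458.0  # exact by SI definition


def range_from_round_trip(round_trip_seconds: float) -> float:
    """Return the one-way distance in metres for a measured round-trip time."""
    if round_trip_seconds < 0:
        raise ValueError("round-trip time must be non-negative")
    # The pulse covers the distance twice (out and back), hence the division by 2.
    return SPEED_OF_LIGHT_M_PER_S * round_trip_seconds / 2.0
```

Each scanned direction yields one such range, and the set of ranges over all directions forms the point cloud.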

More recently, a technique has been proposed that processes such a point cloud to perform self-position/posture determination and environment map creation simultaneously. This technique, called SLAM (Simultaneous Localization And Mapping), has become widespread in recent years.
A technique developed from SLAM is V-SLAM (Visual SLAM), which creates a point cloud from on-board camera video in the same way as a laser point cloud and represents nearly the entire image as three-dimensional points. In V-SLAM, attempts have been made to determine self-position and posture from a point cloud generated by directly processing video from an on-board camera.

Patent Document 1: JP 2014-089691 A

However, the conventional self-position/posture determination methods described above entail enormous cost and an enormous volume of data.
That is, generating a three-dimensional map by a laser method such as that proposed in Patent Document 1, i.e., the LIDAR method, is extremely expensive, and because the three-dimensional map must also be updated as the environment changes, the same enormous cost is incurred at every update.
The greatest drawback of the LIDAR method is the enormous effort and cost required to create the three-dimensional point cloud and manage its data. Updates require similar effort and cost, so the method has not been practical.
Thus, conventional self-position/posture determination methods must manage a vast number of three-dimensional points, and the resulting data volume is too large to be practical.

Conventional self-position/posture determination techniques also have a problem in terms of accuracy.
That is, even if a three-dimensional map can be generated by a method such as that proposed in Patent Document 1, the self-position must still be computed automatically, with reference to that map, by expensive equipment mounted on the automated vehicle. Methods for this self-position computation are still at the exploratory stage, and despite the high cost of the computing equipment they lack accuracy and stability.

As described above, with conventionally proposed self-position/posture determination techniques, creating the three-dimensional map is enormously expensive, the equipment for determining the self-position is also costly, and the volume of data involved is enormous.
For automated driving to become widespread, accurate self-position/posture determination therefore had to be achieved more simply and at lower cost.
To date, however, no technique or proposal has effectively solved these problems of conventional self-position/posture determination.

As a result of intensive research into an invention that can solve these problems of self-position/posture determination, the present inventor arrived at the following idea: generate a CV video map in which CV values (the six variables of position and posture) indicating the camera position and posture angle of a reference video are determined with high accuracy from the previously captured reference video, and transplant (transfer) the three-dimensional coordinates of that reference CV video map into a target image captured from the vehicle or the like that is the subject of self-position/posture determination. In this way the three-dimensional position coordinates of the vehicle or the like can be obtained simply, at low cost, at high speed, and with high accuracy.

Furthermore, if the high-accuracy CV values (six variables) needed for self-position/posture determination are obtained only frame by frame, by comparing the target image with the CV video map serving as the reference video map, then CV values cannot be obtained at a sampling density higher than the frame rate of the images. This is an unavoidable, fundamental limitation of using images.
To solve this problem as well, the present inventor conceived of using the six variables obtainable from a mechanical sensor to interpolate between image frames and thereby obtain temporally continuous CV values. The inventor further found the notable advantage that this is possible even with an inexpensive, low-accuracy mechanical sensor.

That is, the present invention has been proposed to solve the problems of the conventional techniques described above. Its object is to provide a self-position/posture locating device that uses a CV video map as a reference video map, so that in the automated operation of vehicles, aircraft, ships, and the like, or the autonomous travel of robots and the like, a moving vehicle or the like can determine its own position and posture in real time simply, at low cost, at high speed, and with high accuracy.

To achieve the above object, the self-position/posture locating device of the present invention, which uses a CV video map as a reference video map, comprises: CV video map creation means that performs, based on a reference video captured by predetermined video acquisition means, a CV calculation to obtain CV (camera vector) values indicating the three-dimensional coordinate values of the camera position and posture of the reference video, and generates a CV video map in which the CV values are added to the reference video; a CV video map database that stores the CV video map; and self-position/posture determination means that uses the CV video map stored in the CV video map database as a reference image, compares a target image captured by predetermined image acquisition means provided on a target moving body with the CV video map, and acquires the CV value of the target image by automatically associating a plurality of feature points that indicate the same locations in the target image and the CV video map.

According to the self-position/posture locating device using a reference video map of the present invention, a CV calculation is performed on a reference video prepared in advance to obtain CV (camera vector) values indicating the three-dimensional coordinate values of the camera position and posture, a CV video map is generated by adding the CV values to the reference video, and three-dimensional coordinates are added (transplanted) to the target image on the basis of this CV video map. Self-position/posture determination of the target image can thus be performed at high speed and with high accuracy.
As a result, the self-position/posture determination needed to locate the position and posture of a moving body in real time, as required for automated driving and autonomous travel of vehicles and the like, can be obtained simply, at low cost, at high speed, and with high accuracy.

FIG. 1 is a block diagram showing the basic configuration of an embodiment of the CV calculation means (CV video map creation device) that performs the CV calculation on the reference video in the self-position/posture locating device using a reference video map of the present invention.
FIG. 2 is a schematic diagram showing the means for capturing the all-around video used by the CV calculation means shown in FIG. 1, and is a perspective view of a vehicle with an all-around camera mounted on its roof.
FIG. 3 is a schematic diagram showing the means for capturing the all-around video used by the CV calculation means shown in FIG. 1, in which (a) is a front view of a vehicle with an all-around camera mounted on its roof and (b) is a plan view of the same.
FIG. 4 is an explanatory diagram showing converted images obtained from video captured by the all-around camera, in which (a) shows the virtual sphere onto which a spherical image is mapped, (b) shows an example of a spherical image mapped onto the virtual sphere, and (c) shows the spherical image of (b) developed onto a plane according to the Mercator projection.
FIGS. 5 to 7 are explanatory diagrams each showing a specific method of detecting the camera vector in the CV calculation means according to an embodiment of the present invention.
FIG. 8 is an explanatory diagram showing a desirable manner of designating feature points in the camera vector detection method used by the CV calculation means according to an embodiment of the present invention.
FIGS. 9 to 11 are graphs each showing an example of the three-dimensional coordinates of feature points and the camera vectors obtained by the CV calculation means according to an embodiment of the present invention.
FIG. 12 is an explanatory diagram showing a case in which the CV calculation means according to an embodiment of the present invention sets a plurality of feature points according to their distance from the camera, tracks them across adjacent frames, and repeats the calculation a plurality of times.
FIG. 13 is a diagram showing the trajectory of the camera vectors obtained by the CV calculation means according to an embodiment of the present invention, displayed in the video.
FIG. 14 is a block diagram showing the basic configuration of the self-position/posture locating device using a reference video map according to an embodiment of the present invention.
FIG. 15 is a block diagram showing details of the CV value transfer processing in the self-position/posture locating device using a reference video map shown in FIG. 14.
FIG. 16 is an explanatory diagram schematically showing a specific example of the CV value transfer processing in the self-position/posture locating device using a reference video map shown in FIG. 15.
FIG. 17 is an explanatory diagram schematically showing the processing for improving, with the six variables obtained from a mechanical sensor, the accuracy of the CV values of the target image obtained by the self-position/posture locating device using a reference video map according to an embodiment of the present invention, in which (a) shows the whole of the plurality of frames constituting the target image and (b) shows an enlarged portion of the frames shown in (a).
FIG. 18 is a functional block diagram showing the system configuration of an automated driving system for a moving body based on the self-position/posture locating device using a reference video map according to an embodiment of the present invention.

A preferred embodiment of the self-position/posture locating device using a reference video map according to the present invention is described below with reference to the drawings.
The self-position/posture locating device using a reference video map of the present invention described below is realized by processes, means, and functions executed by a computer under the instructions of a program (software). The program sends commands to each component of the computer and causes it to perform the predetermined processes and functions described below, for example: automatic extraction of reference feature points (reference points) and other feature points in the video, automatic tracking of the extracted reference points, calculation of the three-dimensional coordinates of the reference points, calculation of CV (camera vector) values, detection of corresponding reference points between the reference video and the target image, transplantation and integration of CV values between the reference video and the target image, and interpolation between frames using the six variables obtained from a mechanical sensor. Each process and means in the present invention is thus realized by concrete means in which the program and the computer cooperate.

All or part of the program is provided on, for example, a magnetic disk, an optical disk, a semiconductor memory, or any other computer-readable recording medium, and the program read from the recording medium is installed on the computer and executed.
The program can also be loaded onto the computer directly over a communication line, without a recording medium, and executed.

[CV video map]
The self-position/posture locating device using a reference video map according to an embodiment of the present invention described below is a means by which a moving body, such as a moving vehicle, determines its own position and posture in real time, for example in the automated operation of moving bodies such as automobiles and other vehicles, aircraft, and ships, or in the autonomous travel of robots and the like.
Specifically, the self-position/posture locating device according to this embodiment uses a CV (camera vector) video map as the three-dimensional map for self-position/posture determination.

In general, current approaches to the autonomous travel of moving bodies, for example vehicles and mobile robots, fall broadly into two types: an autonomous travel method, in which the moving body judges its surroundings by itself and does not need a high-accuracy three-dimensional (3D) map, and a 3D-map guidance method, which requires a high-accuracy three-dimensional map for guiding the moving body. The present invention adopts the latter, using a three-dimensional map as a guide.
In the present invention, in the automated operation of vehicles, aircraft, and the like or the autonomous travel of robots and the like, the moving vehicle or the like acquires the reference three-dimensional map prepared in advance, an image (target image) captured by a camera mounted on the moving body (such as an automated vehicle), and the six variables (three position coordinates and three rotation coordinates) obtained by a mechanical sensor, then automatically compares and corrects them to determine its own position and posture in real time. This realizes self-position/posture determination.

First, the CV video, which is the three-dimensional map serving as the reference for self-position/posture determination, is described.
Before the moving body makes its intended journey, a camera is installed on a vehicle or the like used for creating the reference CV video map, and moving video or a sequence of still images is acquired. The camera position and posture for every frame are then computed mathematically, for example by extracting feature points in the images.
Specifically, the camera position and posture are expressed as six variables, namely a vector of six degrees of freedom (camera vector: CV) consisting of the camera position coordinates (X, Y, Z) and the rotation angles (Φx, Φy, Φz) about the respective coordinate axes. Associating this vector one-to-one with each frame of the video yields the CV video (see FIGS. 1 to 13, described later).
The CV video used as a reference in this way is the CV video map for autonomous travel guidance.
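The one-to-one pairing of video frames with six-variable camera vectors can be pictured as a simple data structure. The following is a minimal sketch under stated assumptions: the class names, field names, and the use of an image path to stand for a frame are all hypothetical and are not defined in this publication.

```python
from dataclasses import dataclass
from typing import List, Sequence


@dataclass(frozen=True)
class CVValue:
    """Six-variable camera vector: position and rotation about each axis."""
    x: float
    y: float
    z: float
    phi_x: float
    phi_y: float
    phi_z: float


@dataclass(frozen=True)
class CVFrame:
    """One video frame paired with exactly one CV value."""
    frame_index: int
    image_path: str  # hypothetical handle to the stored frame image
    cv: CVValue


def build_cv_video(image_paths: Sequence[str],
                   cv_values: Sequence[CVValue]) -> List[CVFrame]:
    """Attach one CV value to each frame; the counts must match one-to-one."""
    if len(image_paths) != len(cv_values):
        raise ValueError("each frame needs exactly one CV value")
    return [CVFrame(i, path, cv)
            for i, (path, cv) in enumerate(zip(image_paths, cv_values))]
```

Keeping only six numbers per frame alongside the images is what lets the map stay light while still carrying the camera geometry.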

Here, the six variables indicating the position and posture of the camera in question are the coordinates [X, Y, Z] and the posture [Φx, Φy, Φz], six variables in total.
Because V-SLAM, described above, is also a technique derived from laser point clouds, it creates a three-dimensional point cloud and holds it as data. V-SLAM may appear similar to the present invention in that it uses images, but there is an important difference: V-SLAM holds a huge three-dimensional point cloud covering the whole image, whereas the present invention holds no such point cloud.
The present invention does not hold three-dimensional point cloud data directly. Instead, it consolidates all three-dimensional information into the camera position and posture, which makes the data extremely light, and the three-dimensional coordinates of any point can be obtained at any time with one extra step. This drastically reduces the data volume and also makes the computation more efficient.
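To make the data-volume argument concrete, the sketch below compares the size of the stored three-dimensional data in the two approaches. It is a rough illustration only: it assumes every value is stored as an 8-byte float, it counts only geometric data (the video images themselves, which the CV video map also keeps, are excluded), and the function names are hypothetical.

```python
# Illustrative sketch only: size of stored geometric data, images excluded.
# Assumption: every coordinate or angle is stored as an 8-byte float.

BYTES_PER_FLOAT64 = 8


def cv_pose_bytes(num_frames: int) -> int:
    """Six variables (X, Y, Z, Phi_x, Phi_y, Phi_z) per frame."""
    return num_frames * 6 * BYTES_PER_FLOAT64


def point_cloud_bytes(num_points: int) -> int:
    """Three coordinates (X, Y, Z) per stored point."""
    return num_points * 3 * BYTES_PER_FLOAT64
```

The pose data grows with the number of frames, whereas a point cloud grows with the number of measured points, which is typically far larger for a dense scan of the same route.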

That is, the self-position/posture locating device according to this embodiment acquires the six variables indicating the position and posture of the target moving body. As described above, acquiring these six variables means determining [X, Y, Z], which indicate the three-dimensional position coordinates, and [Φx, Φy, Φz], which indicate the posture.
Specifically, CV values can first be obtained from low-accuracy data available from inexpensive mechanical sensors, such as an IMU/GYRO/GNSS (GPS), mounted on the moving object that is to travel autonomously. Alternatively, depending on the purpose, only the required variables among the six CV variables may be acquired.
The mechanical sensor is assumed to be inexpensive and low-accuracy because an expensive, high-accuracy sensor would be impractical, as with the conventional techniques described above.
CV values obtained from an IMU/GYRO/GNSS (GPS) or the like have poor accuracy but can be output continuously in time; in this respect they have an advantage over CV values obtained frame by frame from images.

The self-position/posture locating device of this embodiment determines self-position and posture by comparing video or a sequence of images (target images) from a camera mounted on the moving object that is to travel autonomously with a CV video map, prepared in advance, that serves as the three-dimensional map. This device is called a CV-video-reference-type self-position/posture locating device.
The CV video map can be given absolute coordinates, normally by means of high-accuracy GNSS (GPS). The GNSS here is separate from the low-accuracy GNSS mentioned above; an expensive, high-accuracy GNSS is assumed.
Furthermore, the CV values obtained frame by frame (or at integer multiples of the frame interval), that is, those obtained by comparing the previously prepared CV video map with the target image, are discontinuous in time. To fill the gaps, the CV values from the mechanical sensor (IMU/GYRO/GPS) can be interpolated, for example by proportionally distributing them so that they match the frame-based CV values at both ends of each gap. This works because CV values from a mechanical sensor are highly accurate over very short periods (on the order of a few seconds) but accumulate error over long periods, which makes them impractical on their own.
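One way to read the proportional distribution described above is as an endpoint-matching correction: the sensor's error is measured at the two frames that bound a gap, and that error is blended linearly across the samples in between. The sketch below follows this reading; it is an assumption about the scheme rather than the method as claimed, the function and type names are hypothetical, and the rotation angles are treated as plain numbers with no wrap-around handling.

```python
from typing import List, Sequence, Tuple

# (time, six variables X, Y, Z, Phi_x, Phi_y, Phi_z) from the mechanical sensor
Sample = Tuple[float, Sequence[float]]


def fit_sensor_to_frames(samples: Sequence[Sample],
                         cv_start: Sequence[float],
                         cv_end: Sequence[float]) -> List[Tuple[float, List[float]]]:
    """Shift sensor samples so the first and last match the frame-based CV values.

    samples[0] and samples[-1] are assumed to coincide in time with the two
    image frames whose CV values are cv_start and cv_end.
    """
    t0, s0 = samples[0]
    t1, s1 = samples[-1]
    if t1 <= t0:
        raise ValueError("samples must span a positive time interval")
    # Sensor error at each end of the gap, per variable.
    err0 = [c - s for c, s in zip(cv_start, s0)]
    err1 = [c - s for c, s in zip(cv_end, s1)]
    corrected = []
    for t, s in samples:
        a = (t - t0) / (t1 - t0)  # 0 at the first frame, 1 at the second
        # Distribute the end-point errors in proportion to elapsed time.
        corrected.append(
            (t, [v + (1.0 - a) * e0 + a * e1 for v, e0, e1 in zip(s, err0, err1)])
        )
    return corrected
```

Because the sensor is only trusted over the short interval between two frames, its accumulated drift is reset at every frame, which is why a low-accuracy sensor can suffice.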

Because the self-position/posture locating device according to this embodiment is based on a technique that associates the camera positions and postures of old and new images three-dimensionally by collating old and new images of approximately the same place, it can also be used as a device for updating old images with new ones.
In other words, by using the self-position/posture locating device according to this embodiment, a moving body such as a vehicle can travel autonomously while updating the images as it moves.
Furthermore, it suffices simply to run an automated vehicle or the like in order to update the images.

[Features of the CV video map]
First, the features of the CV video map according to the present invention are described.
As described later (see FIGS. 1 to 13), the three-dimensional coordinates of any point in the CV video map can be obtained at any time by corresponding-point processing between adjacent frames. This approach requires one extra step (CV calculation processing) to obtain the three-dimensional coordinates of an arbitrary point in an image, but because that processing takes only milliseconds, the CV video is in effect equivalent to holding three-dimensional coordinates for every point in the video. The CV video can therefore be used as the reference three-dimensional map for the self-position/posture locating device.

Furthermore, the many feature points used in the CV calculation (several tens to several hundreds) acquire three-dimensional coordinates during the calculation, but when the result is saved as data, these feature points and their three-dimensional coordinates are discarded. Even after they are discarded, any point in the video can be converted to three dimensions at any time as long as the six variables of camera position and posture are available.
Discarding the coordinate data of the three-dimensionalized feature points drastically reduces the amount of data to be stored. This is a major advantage of the CV video and differs greatly from the conventional method (see Patent Document 1), which must move and process while holding the entire three-dimensional point cloud of the environment over the moving body's range of movement.

Also, even if some of the three-dimensional feature points obtained during the CV calculation (several tens to several hundreds of points) are retained rather than discarded, for example for some later purpose, the overall amount of data does not increase significantly.
Thus, using the CV video according to the present invention makes it possible, for the first time, to bring a three-dimensional map for automatic driving down to a practical data size.
Moreover, although an arbitrary point in each frame of such a CV video does not yet hold three-dimensional coordinates as data, its three-dimensional coordinates can be obtained immediately with one extra calculation step of a few milliseconds, so the CV video can be used as a three-dimensional map.
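As an illustration of obtaining a point's three-dimensional coordinates on demand from two adjacent frames, the following sketch triangulates a point from the CV values of the two frames and the viewing directions of the point in each frame. It uses a standard midpoint triangulation of two rays, which is an assumption made for illustration and not necessarily the calculation used in this specification. The function name, the representation of posture as a rotation matrix that maps camera coordinates to world coordinates, and the use of unit direction vectors are likewise assumptions.

```python
import numpy as np

def triangulate_midpoint(c1, R1, b1, c2, R2, b2):
    """Return the 3D point closest to two viewing rays.

    c1, c2: camera positions in world coordinates (from the CV values).
    R1, R2: rotation matrices mapping camera coordinates to world coordinates.
    b1, b2: unit direction vectors to the same point, in each camera's coordinates.
    """
    c1 = np.asarray(c1, dtype=float)
    c2 = np.asarray(c2, dtype=float)
    d1 = np.asarray(R1, dtype=float) @ np.asarray(b1, dtype=float)
    d2 = np.asarray(R2, dtype=float) @ np.asarray(b2, dtype=float)

    # Find ray parameters s, t that minimize |(c1 + s*d1) - (c2 + t*d2)|.
    A = np.stack([d1, -d2], axis=1)  # 3 x 2 system matrix
    (s, t), *_ = np.linalg.lstsq(A, c2 - c1, rcond=None)

    # The midpoint of the closest points on the two rays.
    return 0.5 * ((c1 + s * d1) + (c2 + t * d2))
```

Only the two camera poses and one matched direction per frame are needed, which is why no stored point cloud is required for this step.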

Thus, by using the CV video according to the present invention as a three-dimensional map, the self-position and posture of the target image can be determined by extracting only a small number of necessary feature points from the target image, obtained from a camera installed on an automatically travelling vehicle or the like, rather than all of its points, and comparing the two images.
This makes it possible to achieve the self-position and posture determination that is indispensable for automatic driving of a moving body simply and quickly, without the prior-art problem of an enormous volume of data for creating the three-dimensional map and for the localization based on it.
Furthermore, in the present invention, the image obtained from the camera mounted on the moving body serves as the target image as it is, so costs can be greatly reduced compared with the prior art, and highly accurate self-position and posture determination becomes possible at low cost.

[Definition of terms]
Next, the terms and phrases used in this specification and the claims are defined.
・ CV video map:
The three-dimensional map that serves as the reference in the self-position and posture locating device is a CV video, generated by determining the camera position and posture as six variables from video or a sequence of continuous images captured beforehand by a camera, and used as a three-dimensional map. This is called a CV video map.
Accordingly, a CV video map can include the CV video and three-dimensional CG generated from the CV video or by other means, and can also include combinations with various points, figures and the like generated by image processing from the CV video. As a special case, even a set consisting only of various three-dimensional shapes generated from a CV video can be called a CV video map, provided that it originates from a CV video. This in particular can be called a CV3D map.
In short, a CV video intended to serve as a 3D map can be called a CV video map.

Here, unlike the SLAM and V-SLAM described above, the CV video map according to the present invention does not hold the three-dimensional coordinates of the entire space of the environment as a point cloud. Both SLAM and V-SLAM hold the target 3D map together with a point cloud (several million to several hundred million points per km) dense enough for the accuracy needed to perform the intended self-position and posture determination.
In contrast, the CV video according to the present invention basically holds no point cloud; whenever needed, the three-dimensional coordinates of a target point can be computed automatically on the spot from adjacent images. The three-dimensional coordinates of the entire space are obtained by calculation from the camera position (six variables).

That is, unlike conventional SLAM and V-SLAM, the CV video map basically holds the two-dimensional video and the six variables of camera position, not the three-dimensional coordinates of the whole environment. The coordinates of any required three-dimensional point can be computed on the spot, at a cost of a few milliseconds or less per point.
In general, about four to ten three-dimensional points are sufficient for self-position and posture determination, so the CV calculation according to the present invention can supply the three-dimensional coordinate values of the required points at sufficient speed. Because this CV calculation approach holds no three-dimensional point cloud, the data is light and easy to handle, and data transmission requires an extremely narrow bandwidth compared with SLAM and V-SLAM.
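To illustrate why a handful of three-dimensional points is enough to fix six pose variables, the following sketch recovers a camera pose from a few known 3D points and their image positions. It uses the direct linear transform (DLT), a standard textbook technique that is swapped in here for illustration; it is not the CV-value transplant procedure of this specification. The DLT needs at least six non-coplanar points, the pinhole model with normalized image coordinates is an assumption, and the function name is hypothetical.

```python
import numpy as np

def pose_from_points(world_pts, image_pts):
    """Estimate R, t (world to camera) and the camera center by DLT.

    world_pts: (n, 3) known 3D points, n >= 6 and not all coplanar.
    image_pts: (n, 2) normalized pinhole coordinates (x/z, y/z) of the same points.
    """
    X = np.asarray(world_pts, dtype=float)
    uv = np.asarray(image_pts, dtype=float)
    Xh = np.hstack([X, np.ones((len(X), 1))])  # homogeneous 3D points

    # Each point gives two linear equations in the 12 entries of P = [R | t].
    rows = []
    for p, (u, v) in zip(Xh, uv):
        rows.append(np.concatenate([p, np.zeros(4), -u * p]))
        rows.append(np.concatenate([np.zeros(4), p, -v * p]))
    _, _, vt = np.linalg.svd(np.asarray(rows))
    P = vt[-1].reshape(3, 4)  # solution up to scale and sign

    # Fix the sign so the rotation part has positive determinant.
    if np.linalg.det(P[:, :3]) < 0:
        P = -P

    # Remove the scale and project the 3x3 part onto a rotation matrix.
    u_, s, vt_ = np.linalg.svd(P[:, :3])
    R = u_ @ vt_
    t = P[:, 3] / s.mean()
    center = -R.T @ t  # camera position in world coordinates
    return R, t, center
```

Minimal solvers that work with fewer points exist but are longer; the DLT is shown only because it fits in a few lines.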

Furthermore, the self-position and posture of the target vehicle or the like obtained from the CV video map consists of six variables, and the same six variables can also be acquired by mechanical sensors, namely IMU/GYRO/GNSS (GPS).
Therefore, even data obtained by combining the image from the vehicle-mounted camera (the target image) with the mechanical sensors IMU/GYRO/GNSS (GPS) can be treated as a CV video map or CV3D map.
Over very short intervals, such as a computation delay or a period of temporal discontinuity, the mechanical sensors (IMU/GYRO/GNSS) are fully effective as a correction function for the target camera, even if they are low-priced units. The six variables acquired by the mechanical sensors are essentially equivalent to the six variables represented by the CV value.
Accordingly, a map that incorporates IMU/GYRO/GNSS data or the like can also be included among CV video maps.

・ CV value / CV video / CV image:
The six variables obtained by mechanical sensors mounted on a moving object (for example, a vehicle or a robot), and the six variables of camera position and posture calculated from continuous still images or moving images obtained from a camera, are called the camera vector (CV). Its value is called the CV value, obtaining the CV value by calculation is called CV calculation, and video having such CV values is called CV video.
The six variables obtained from a mechanical sensor integrated with the camera can also be used as the CV value according to the present invention.
A single, non-continuous image may be called a CV image. Likewise, when attention is paid to only one frame, the image of that frame may be called a CV image. In other words, a CV image is a special case (a single image) of CV video.
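The relationship between these terms can be sketched as a small data structure. The class and field names below (`CVValue`, `CVImage`, `CVVideo`, `phi_x` and so on) are hypothetical and chosen only for illustration; the specification does not prescribe a storage format.

```python
from dataclasses import dataclass, field
from typing import List

import numpy as np

@dataclass(frozen=True)
class CVValue:
    """Six variables: camera position and rotation about each coordinate axis."""
    x: float
    y: float
    z: float
    phi_x: float
    phi_y: float
    phi_z: float

    def as_array(self) -> np.ndarray:
        return np.array([self.x, self.y, self.z, self.phi_x, self.phi_y, self.phi_z])

@dataclass
class CVImage:
    """A single frame together with its CV value."""
    pixels: np.ndarray
    cv: CVValue

@dataclass
class CVVideo:
    """A sequence of frames, each carrying its own CV value."""
    frames: List[CVImage] = field(default_factory=list)

    def cv_values(self) -> np.ndarray:
        """Return all CV values as an (n_frames, 6) array."""
        return np.array([frame.cv.as_array() for frame in self.frames])
```

In this sketch the per-frame map data is the image plus six numbers, and no three-dimensional point cloud is stored.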

・ CV machine map:
A CV3D map can be read automatically by a computer rather than by a human. In that case it may be called a CV machine map.
A CV3D map can also be processed into a form that is easier for a computer to read, and the result may likewise be called a CV machine map.
Furthermore, a CV machine map that has been organized into a database is called a CV machine map DB.
All of these are CV video maps according to the present invention.
The terms "CV video map" and "CV machine map" are not, in principle, sharply distinguished; in this specification they are used according to the application at hand.

・ Target moving body / target camera / target image:
The moving body whose self-position and posture are determined by applying the present invention, that is, the moving body subject to automatic driving, is called the target moving body. Examples are automatically travelling vehicles and robots.
The camera mounted on the target moving body is called the target camera, and an image acquired by the target camera is called the target image.

[CV calculation]
Next, the CV calculation for the CV video map used in the self-position and posture locating device using the reference video map of the present invention is described in detail with reference to FIGS. 1 to 13.
CV calculation means obtaining the CV value, and the result obtained is called the CV value or CV data. "CV" is an abbreviation of "camera vector". The camera vector (CV) is a value indicating the three-dimensional position and three-axis rotational posture of a camera, such as a video camera, that acquires video for measurement or similar purposes.
The CV calculation acquires a moving image (video), detects feature points in it, tracks them across a plurality of adjacent frames, generates within the image a large number of triangles formed by the camera positions and the tracked trajectories of the feature points, and analyzes those triangles to obtain the three-dimensional position and three-axis rotational posture of the camera.

An important property of the CV calculation is that, in the course of obtaining the CV value, three-dimensional coordinates are obtained at the same time for the feature points (reference points) in the video.
In addition, the CV value computed from a moving image gives the three-dimensional camera position and three-dimensional camera posture simultaneously for each frame of the moving image. Moreover, obtaining CV values in correspondence with the video using, in principle, a single camera is an excellent feature that only the CV calculation can achieve.
With other measurement means (such as GPS or IMU), for example, acquiring each frame of a moving image together with its three-dimensional camera position and posture requires the image frames and the measurement sampling times to be synchronized completely and with high accuracy. This makes the equipment extremely expensive, and it is in practice difficult to realize.

CV data computed from a moving image is a relative value before any further processing, but over a short section it provides three-dimensional position information and three-axis rotation angle information with high accuracy.
Also, because CV data is acquired from images, the acquired data is relative, but it has the excellent property, which other methods do not readily offer, that the positional relationship with any object in the image can be measured.
Furthermore, because the CV value corresponding to each image is obtained, the CV calculation, which can determine the camera position and its three-axis rotational posture directly from the image, is well suited to in-image measurement and in-image surveying.
The self-position and posture locating device using the reference video map of the present invention performs coordinate integration processing (transplanting and transferring CV values) between the reference video and the target video (comparison video) on the basis of the CV value data obtained by this CV calculation.

[CV calculation means]
The CV calculation is performed by the CV calculation means 20, which functions as the CV video map creation device 20 (see FIG. 14) of the self-position and posture locating device using the reference video map of the present invention, described later.
As shown in FIG. 1, the CV calculation means (CV video map creation device) 20 performs predetermined CV calculation processing on the video input from the CV video acquisition device 10, which consists of a vehicle-mounted video camera or the like. Specifically, it comprises a feature point extraction unit 21, a feature point correspondence processing unit 22, a camera vector calculation unit 23, an error minimization unit 24, a three-dimensional information tracking unit 25, and a high-precision camera vector calculation unit 26.

Any video may be used for the CV calculation, but video with a limited angle of view is interrupted when the viewpoint direction moves, so all-around video (see FIGS. 2 to 4) is desirable. A moving image is equivalent to a sequence of still images and can be handled in the same way as still images.
Pre-recorded video is generally used, but video captured in real time as a moving body such as an automobile moves can of course also be used.

Accordingly, this embodiment includes a CV video acquisition device 10 (see FIG. 1) that uses, as the video for the CV calculation, either all-around video capturing the full 360-degree surroundings of a moving body such as a vehicle (see FIGS. 2 to 4) or wide-angle video close to all-around video, and that generates and displays a composite image of a map and the video by developing the all-around video onto a plane in the viewpoint direction.
Here, planar development of all-around video means expressing the all-around video in perspective as an ordinary image. The term "perspective" is used because the all-around image itself is displayed in a projection that differs from perspective, such as the Mercator projection or a spherical projection (see FIG. 4), and developing it onto a plane converts it into an ordinary perspective image for display.

To generate all-around video in the CV video acquisition device 10, first, as shown in FIGS. 2 and 3, an all-around video camera 11 fixed to a moving body 11a such as a travelling vehicle captures the surroundings of the moving body as the moving body 11a moves, for the purpose of acquiring CV value data.
To acquire its position coordinates, the moving body 11a can be equipped with a position measuring device consisting of, for example, a GPS device that acquires absolute coordinates, either alone or combined with an IMU device.
The all-around video camera 11 mounted on the moving body 11a may have any configuration as long as it captures wide-range video: for example, a camera with a wide-angle or fisheye lens, a moving camera, a fixed camera, a rig of several cameras fixed together, or a camera that can rotate through 360 degrees. As shown in FIGS. 2 and 3, this embodiment uses an all-around video camera 11 in which a plurality of cameras are integrally fixed to the vehicle and which captures wide-range video as the moving body 11a moves.

As shown in FIG. 3, the all-around video camera 11 described above is installed on the roof of the moving body 11a, so that its plural cameras simultaneously capture the full 360-degree surroundings, and wide-range video is acquired as moving image data as the moving body 11a moves.
The all-around video camera 11 is a video camera that can directly acquire all-around video, but video covering more than half of the camera's full surroundings can be used as all-around video.
Even video from an ordinary camera with a limited angle of view can be handled as part of an all-around video, although the accuracy of the CV calculation is reduced.

The wide-range video captured by the all-around video camera 11 can be pasted, as a single image, onto a virtual spherical surface that matches the angle of view at the time of capture.
The spherical image data pasted onto the virtual spherical surface is stored and output as spherical image (360-degree image) data in that pasted state. The virtual spherical surface can be set as any sphere centered on the camera unit that acquires the wide-range video.
FIG. 4(a) shows the external appearance of the virtual spherical surface onto which the spherical image is pasted, and FIG. 4(b) shows an example of a spherical image pasted onto the virtual spherical surface. FIG. 4(c) shows an example image obtained by developing the spherical image of (b) onto a plane according to the Mercator projection.
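The correspondence between a developed all-around image and the virtual spherical surface can be sketched as below. The sketch assumes an image whose columns are linear in longitude θ and whose rows are linear in latitude φ (an equirectangular layout), with θ = 0 and φ = 0 at the image center. This layout, the axis convention and the function names are assumptions for illustration; a true Mercator development would space the rows nonlinearly in latitude.

```python
import numpy as np

def pixel_to_angles(u, v, width, height):
    """Map pixel (u, v) to longitude theta in [-pi, pi) and latitude phi in [-pi/2, pi/2]."""
    theta = (u / width) * 2.0 * np.pi - np.pi
    phi = np.pi / 2.0 - (v / height) * np.pi
    return theta, phi

def angles_to_pixel(theta, phi, width, height):
    """Inverse of pixel_to_angles."""
    u = (theta + np.pi) / (2.0 * np.pi) * width
    v = (np.pi / 2.0 - phi) / np.pi * height
    return u, v

def angles_to_unit_vector(theta, phi):
    """Point on the unit virtual sphere for longitude theta and latitude phi."""
    return np.array([np.cos(phi) * np.cos(theta),
                     np.cos(phi) * np.sin(theta),
                     np.sin(phi)])

def unit_vector_to_angles(z):
    """Recover (theta, phi) from a unit vector on the virtual sphere."""
    theta = np.arctan2(z[1], z[0])
    phi = np.arcsin(np.clip(z[2], -1.0, 1.0))
    return theta, phi
```

Because every pixel maps to a direction from the camera center, a feature point in the image can be treated as a ray on the virtual sphere.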

The all-around video generated and acquired as above is input to the CV calculation means (CV video map creation device) 20, which obtains the CV value data (see FIG. 1).
In the CV calculation means 20, the feature point extraction unit 21 first automatically extracts a sufficient number of feature points (reference points) from the moving image data captured and temporarily recorded by the all-around video camera 11 of the CV video acquisition device 10.
The feature point correspondence processing unit 22 automatically tracks the extracted feature points within each frame image from frame to frame, and thereby automatically obtains their correspondences.
The camera vector calculation unit 23 automatically computes the camera vector for each frame image from the three-dimensional position coordinates of the feature points whose correspondences have been obtained.
The error minimization unit 24 performs statistical processing, using redundant calculations over a plurality of camera positions, so that the spread of the solutions for each camera vector is minimized, and automatically determines the camera position and direction after this error minimization.

The three-dimensional information tracking unit 25 treats the camera vector obtained by the camera vector calculation unit 23 as an approximate camera vector and, based on the three-dimensional information obtained successively as part of the images in the subsequent process, automatically tracks the partial three-dimensional information contained in a plurality of frame images along the images of adjacent frames. Here, three-dimensional information (three-dimensional shape) is mainly the three-dimensional distribution of feature points, that is, a set of three-dimensional points, and this set of points constitutes a three-dimensional shape.
The high-precision camera vector calculation unit 26 generates and outputs, on the basis of the tracking data obtained by the three-dimensional information tracking unit 25, a camera vector more accurate than the one obtained by the camera vector calculation unit 23.
The camera vector obtained in this way is input to the self-position and posture locating device 10 using the reference video map, described later, and is used for coordinate integration processing (transfer and integration of CV values) between the reference video and the target image.

There are several ways to detect the camera vector from the feature points of a plurality of images (moving images or continuous still images). The CV calculation means 20 of this embodiment, shown in FIG. 1, automatically extracts a sufficiently large number of feature points in the images, automatically tracks them, and uses epipolar geometry to obtain the camera's three-dimensional vector and three-axis rotation vector.
Taking a sufficiently large number of feature points makes the camera vector information redundant, and the error can be minimized from this redundant information to obtain a more accurate camera vector.

The camera vector is the vector of the degrees of freedom of the camera.
In general, a stationary three-dimensional object has six degrees of freedom: its position coordinates (X, Y, Z) and the rotation angles (Φx, Φy, Φz) about the respective coordinate axes.
The camera vector is therefore the vector of six degrees of freedom (six variables) consisting of the camera's position coordinates (X, Y, Z) and the rotation angles (Φx, Φy, Φz) about the respective coordinate axes. When the camera moves, the direction of movement is also a degree of freedom, but it can be derived by differentiating the six degrees of freedom (variables) above.
Thus, detecting the camera vector in this embodiment means determining, for every frame, the six degree-of-freedom values that the camera takes in that frame, which differ from frame to frame.
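The statement that the movement direction follows from the six variables by differentiation can be illustrated with finite differences over the per-frame camera vectors. This is a minimal sketch; the function name and the array layout (one row of X, Y, Z, Φx, Φy, Φz per frame) are assumptions for illustration.

```python
import numpy as np

def movement_directions(cv_values, frame_times):
    """Derive per-frame velocity and unit movement direction from camera vectors.

    cv_values:   (n, 6) array of X, Y, Z, phi_x, phi_y, phi_z per frame.
    frame_times: (n,) array of frame timestamps in seconds.
    """
    cv = np.asarray(cv_values, dtype=float)
    t = np.asarray(frame_times, dtype=float)

    # Numerical time derivative of the position part of the camera vector.
    velocity = np.gradient(cv[:, :3], t, axis=0)

    # Normalize to a unit direction; leave zeros where the camera is stationary.
    speed = np.linalg.norm(velocity, axis=1, keepdims=True)
    direction = np.divide(velocity, speed,
                          out=np.zeros_like(velocity), where=speed > 0)
    return velocity, direction
```

Applying the same derivative to the three angle columns would give the angular rates in the same way.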

A specific method of detecting the camera vector in the CV calculation means 20 is described below with reference to FIG. 5 onward.
First, the image data acquired by the all-around video camera 11 of the CV video acquisition device 10 is input, directly or indirectly, to the feature point extraction unit 21 of the CV calculation means 20. The feature point extraction unit 21 automatically extracts points or small-area images that should serve as feature points from appropriately sampled frame images, and the feature point correspondence processing unit 22 automatically obtains the correspondences of the feature points across a plurality of frame images.
Specifically, at least as many feature points as are needed to serve as the basis for detecting the camera vector are obtained. FIGS. 5 to 7 show an example of feature points and their correspondences between images. In the figures, "+" marks the automatically extracted feature points, whose correspondences are tracked automatically across a plurality of frame images (see corresponding points 1 to 4 in FIG. 7).
As shown in FIG. 8, it is desirable to designate and extract a sufficiently large number of feature points in each image (see the circles in FIG. 8); for example, about 100 feature points are extracted.

Next, the camera vector calculation unit 23 computes the three-dimensional coordinates of the extracted feature points and computes the camera vector from those three-dimensional coordinates. Specifically, the camera vector calculation unit 23 continuously computes the relative values of various three-dimensional vectors, such as the positions of a sufficient number of features present between successive frames, the position vectors between the moving camera positions, the three-axis rotation vectors of the camera, and the vectors connecting each camera position to each feature point.
In this embodiment, for example, the camera motion (camera position and camera rotation) is calculated by solving the epipolar equation derived from the epipolar geometry of the 360-degree all-around images.

Images 1 and 2 shown in FIG. 7 are 360-degree all-round images developed by the Mercator projection. With latitude φ and longitude θ, a point on image 1 is (θ1, φ1) and a point on image 2 is (θ2, φ2). The spatial coordinates in each camera are z1 = (cos φ1 cos θ1, cos φ1 sin θ1, sin φ1) and z2 = (cos φ2 cos θ2, cos φ2 sin θ2, sin φ2). With t the camera translation vector and R the camera rotation matrix, the epipolar equation is z1^T [t]× R z2 = 0, where [t]× denotes the skew-symmetric cross-product matrix of t.
Given a sufficient number of feature points, t and R can be computed as a least-squares solution by linear algebra. This computation is applied to the corresponding multiple frames.
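The epipolar equation above can be checked numerically. The following is a minimal sketch, not the patent's implementation: it assumes the convention X1 = R X2 + t between the two camera frames, and the motion values and scene point are hypothetical. It converts (θ, φ) to the unit vectors z1 and z2 and evaluates z1^T [t]× R z2, which should vanish for a true correspondence.

```python
import math

def bearing(theta, phi):
    """Unit vector for longitude theta and latitude phi (radians)."""
    return (math.cos(phi) * math.cos(theta),
            math.cos(phi) * math.sin(theta),
            math.sin(phi))

def to_angles(v):
    """Inverse of bearing(): direction vector -> (theta, phi)."""
    x, y, z = v
    return math.atan2(y, x), math.atan2(z, math.hypot(x, y))

def cross(a, b):
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def matvec(R, v):
    return tuple(dot(row, v) for row in R)

def epipolar_residual(z1, z2, t, R):
    """z1^T [t]x R z2; [t]x applied to a vector is the cross product t x v."""
    return dot(z1, cross(t, matvec(R, z2)))

# Hypothetical motion (assumption): 10-degree yaw plus a translation t,
# with the convention X1 = R X2 + t.
a = math.radians(10.0)
R = ((math.cos(a), -math.sin(a), 0.0),
     (math.sin(a), math.cos(a), 0.0),
     (0.0, 0.0, 1.0))
t = (1.0, 0.2, 0.0)

X2 = (4.0, -3.0, 1.5)  # hypothetical scene point in camera-2 coordinates
X1 = tuple(r + ti for r, ti in zip(matvec(R, X2), t))
z1 = bearing(*to_angles(X1))
z2 = bearing(*to_angles(X2))
residual = epipolar_residual(z1, z2, t, R)
```

In an estimation setting the roles are reversed: many (z1, z2) pairs are known, and t and R are chosen so that the residuals are as small as possible in the least-squares sense.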

Here, a 360-degree all-round image is preferably used as the image for the camera vector calculation.
In principle, any image may be used for the camera vector calculation, but a wide-angle image such as the 360-degree all-round image shown in FIG. 7 makes it easier to select many feature points. In this embodiment, therefore, 360-degree all-round images are used for the CV calculation. This lengthens the distance over which feature points can be tracked, allows a sufficiently large number of feature points to be selected, and allows feature points suited to long, medium, and short distances to be chosen. In addition, when the rotation vector is corrected, adding a polar rotation conversion process simplifies the computation. For these reasons, more accurate calculation results can be obtained.
Note that, to make the processing in the CV calculation means 20 easier to understand, FIG. 7 shows a 360-degree all-round spherical image, synthesized from images taken by one or more cameras, developed by the Mercator projection used in cartography. The actual CV calculation does not necessarily require an image developed by the Mercator projection.

Next, the error minimizing unit 24 uses the multiple calculation equations that arise from the number of camera positions and feature points corresponding to the frames to compute multiple estimates of the vectors based on each feature point, and applies statistical processing so that the distribution of each feature point position and camera position is minimized, thereby obtaining the final vectors. For example, for the camera positions, camera rotations, and feature points of multiple frames, the optimal least-squares solution is estimated by the Levenberg-Marquardt method, and the error is converged to obtain the camera positions, camera rotation matrices, and feature point coordinates.
Furthermore, feature points with a large error distribution are deleted and the calculation is repeated using the remaining feature points, which raises the accuracy of the calculation at each feature point and camera position.
In this way, the feature point positions and the camera vector can be obtained with high accuracy.
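The deletion of feature points with a large error distribution can be illustrated as follows. This is only a sketch under assumptions: each feature point is taken to have several independent three-dimensional estimates, the scatter measure (RMS distance from the mean) and the threshold `max_rms` are hypothetical choices, and a simple mean stands in for the Levenberg-Marquardt recalculation described in the text.

```python
import math

def spread(estimates):
    """Mean position and RMS scatter of repeated 3D estimates of one point."""
    n = len(estimates)
    mean = tuple(sum(p[i] for p in estimates) / n for i in range(3))
    rms = math.sqrt(
        sum(sum((p[i] - mean[i]) ** 2 for i in range(3)) for p in estimates) / n
    )
    return mean, rms

def filter_points(tracks, max_rms):
    """Keep only points whose repeated estimates agree to within max_rms."""
    kept = {}
    for point_id, estimates in tracks.items():
        mean, rms = spread(estimates)
        if rms <= max_rms:
            kept[point_id] = mean
    return kept

# Hypothetical data: p1 is consistent across calculations, p2 is not.
tracks = {
    "p1": [(1.0, 2.0, 3.0), (1.02, 1.98, 3.01), (0.98, 2.02, 2.99)],
    "p2": [(5.0, 0.0, 1.0), (7.5, 0.4, 1.9), (3.1, -0.6, 0.2)],
}
kept = filter_points(tracks, max_rms=0.1)
```

The surviving points would then be fed back into the full optimization so that camera positions and rotations are recalculated without the unreliable points.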

FIGS. 9 to 11 show examples of the three-dimensional coordinates of feature points and the camera vectors obtained by the CV calculation. FIGS. 9 to 11 are explanatory diagrams of the vector detection method based on the CV calculation of this embodiment, showing the relative positional relationship between the camera and objects obtained from multiple frame images acquired by a moving camera.
FIG. 9 shows the three-dimensional coordinates of feature points 1 to 4 shown in images 1 and 2 of FIG. 7, and the camera vector (X, Y, Z) of the movement between image 1 and image 2.
FIGS. 10 and 11 show the feature point positions and the moving camera positions obtained from a sufficiently large number of feature points and frame images. In these figures, the circles running in a straight line through the center of the graph are camera positions, and the circles around them indicate the positions and heights of feature points.

Here, to obtain more accurate three-dimensional information on feature points and camera positions at high speed, the CV calculation in the CV calculation means 20 sets multiple feature points according to their distance from the camera and repeats multiple calculations, as shown in FIG. 12.
Specifically, the CV calculation means 20 automatically detects visually distinctive feature points in the images and, when finding the corresponding points of those feature points in each frame image, treats the n-th and (n+m)-th frame images Fn and Fn+m used for the camera vector calculation as one unit calculation, and can repeat the unit calculation with n and m set appropriately.
Here m is the frame interval. Feature points are classified into multiple tiers according to their distance from the camera, with m set larger the farther a feature point is from the camera and smaller the nearer it is. This is because the farther a feature point is from the camera, the less its position changes between images.

Multiple tiers of m are set so that the classification of feature points by m value overlaps sufficiently, and the calculation advances continuously as n advances with the progress of the images. At each step of n and each tier of m, the same feature point is thus calculated multiple times.
By performing unit calculations on frame images Fn and Fn+m in this way, the precise camera vector is calculated, taking a long time, between the frames sampled every m frames (with the frames in between dropped), while the m frames between Fn and Fn+m (the minimum unit frames) can be handled by a simple calculation that runs in a short time.
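The tiered choice of the frame interval m and the repeated unit calculations on (Fn, Fn+m) described above can be sketched as follows. The distance ranges and m values in `TIERS` are hypothetical; the text only states that m grows with distance and that the tiers should overlap.

```python
# (min distance, max distance, frame interval m); ranges overlap on purpose.
# All numbers are illustrative assumptions, not values from the source.
TIERS = (
    (0.0, 15.0, 1),
    (10.0, 60.0, 5),
    (40.0, float("inf"), 20),
)

def intervals_for(distance_m):
    """Frame intervals m that apply to a feature point at this distance."""
    return [m for lo, hi, m in TIERS if lo <= distance_m < hi]

def unit_computations(num_frames, distance_m):
    """All frame-index pairs (n, n + m) used as unit calculations."""
    pairs = []
    for m in intervals_for(distance_m):
        pairs.extend((n, n + m) for n in range(num_frames - m))
    return pairs

near = unit_computations(4, 5.0)        # a near point: m = 1 only
overlap = intervals_for(12.0)           # falls in two tiers: m = 1 and m = 5
```

A point in an overlap zone is processed with more than one interval, which is what yields the repeated calculations of the same feature point mentioned in the text.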

If the precise camera vector calculation for every m-th frame had no error, both ends of the camera vectors of the m frames would coincide with the high-precision camera vectors of Fn and Fn+m. Accordingly, the m minimum unit frames between Fn and Fn+m are obtained by the simple calculation, and the m consecutive camera vectors can be scale-adjusted so that both ends of the camera vectors obtained by the simple calculation match the camera vectors of Fn and Fn+m obtained by the high-precision calculation.
As n advances continuously with the progress of the images, the camera vectors obtained by calculating the same feature point multiple times are scale-adjusted and integrated so that their error is minimized, and the final camera vector can be determined.
This makes it possible to speed up the processing by combining simple calculations while still obtaining a highly accurate camera vector.
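The scale adjustment of the intermediate frames can be sketched as below. This is a simplified illustration that handles camera positions only and assumes the simple-calculation segment already points in the same direction as the precise one, so that a translation plus a uniform scale is enough to pin both ends; a general solution would also need a rotation. The function name and data are hypothetical.

```python
import math

def rescale_segment(simple_positions, precise_start, precise_end):
    """Shift and uniformly scale simple-calculation camera positions so the
    segment starts at precise_start and has the precise segment's length."""
    p0, pm = simple_positions[0], simple_positions[-1]
    simple_len = math.dist(p0, pm)
    if simple_len == 0.0:
        raise ValueError("segment has zero length; scale is undefined")
    s = math.dist(precise_start, precise_end) / simple_len
    return [
        tuple(a + s * (p - q) for a, p, q in zip(precise_start, pos, p0))
        for pos in simple_positions
    ]

# Hypothetical example: simple calculation in arbitrary units along x,
# precise endpoints for Fn and Fn+m known in metres.
simple = [(0.0, 0.0, 0.0), (0.5, 0.0, 0.0), (1.0, 0.0, 0.0)]
adjusted = rescale_segment(simple, (10.0, 0.0, 0.0), (14.0, 0.0, 0.0))
```

With the endpoints pinned to the high-precision values, errors in the simple calculation stay confined to one segment instead of accumulating along the route.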

Various simple calculation methods are possible depending on the required accuracy. For example: (1) where the high-precision calculation uses 100 or more feature points, the simple calculation uses a minimum of about 10 feature points; or (2) even with the same number of feature points, treating feature points and camera positions on an equal footing yields countless triangles and as many equations, so the calculation can be simplified by reducing the number of equations used.
The results are then integrated by scale adjustment so that the error of each feature point and camera position is minimized, the distance calculation is performed, feature points with a large error distribution are deleted, and the remaining feature points are recalculated as necessary, which raises the accuracy of the calculation at each feature point and camera position.

Performing high-speed simple calculations in this way also enables near-real-time processing of the camera vector. In high-speed camera vector processing, the calculation is first performed with the minimum number of frames and the minimum number of automatically extracted feature points that achieve the target accuracy, so that an approximate camera vector is obtained quickly and displayed. Then, as images accumulate, the number of frames and the number of feature points are increased, a more accurate camera vector calculation is performed, and the approximate value is replaced with the more accurate camera vector value for display.

Furthermore, in this embodiment, three-dimensional information (three-dimensional shapes) can be tracked in order to obtain an even more accurate camera vector.
Specifically, the three-dimensional information tracking unit 25 first treats the camera vector obtained through the camera vector calculation unit 23 and the error minimizing unit 24 as an approximate camera vector. Then, based on the three-dimensional information (three-dimensional shape) obtained as part of the images generated in the subsequent process, it continuously tracks the partial three-dimensional information contained in multiple frame images across adjacent frames, thereby automatically tracking the three-dimensional shape.
From the tracking result obtained by the three-dimensional information tracking unit 25, the high-precision camera vector calculation unit 26 then obtains a more accurate camera vector.

The feature point extraction unit 21 and the feature point correspondence processing unit 22 described above automatically track feature points across multiple frame images, but the number of frames over which a feature point can be tracked may be limited, for example because the feature point disappears. In addition, because the images are two-dimensional and shapes change during tracking, tracking accuracy also has a certain limit.
Therefore, the camera vector obtained by feature point tracking is treated as an approximate value, the three-dimensional information (three-dimensional shape) obtained in the subsequent process is tracked on each frame image, and a high-precision camera vector can be obtained from its trajectory.
Three-dimensional shape tracking readily achieves accurate matching and correlation, and because a three-dimensional shape changes neither in shape nor in size from frame to frame, it can be tracked over many frames, which improves the accuracy of the camera vector calculation. This is possible because the approximate camera vector is already known from the camera vector calculation unit 23 and the three-dimensional shape is therefore already known.

When the camera vector is an approximate value, the error in three-dimensional coordinates over a very large number of frames accumulates, because feature point tracking relates each frame to only a few other frames, and gradually becomes large over long distances. However, the error in the three-dimensional shape of a cut-out portion of the image is relatively small, and its effect on changes in shape and size is quite small. For this reason, comparison and tracking with three-dimensional shapes is far more advantageous than two-dimensional shape tracking. When tracking with two-dimensional shapes, changes in shape and size across frames cannot be avoided, which leads to problems such as large errors or missing corresponding points. When tracking with three-dimensional shapes, the shape changes very little and, in principle, the size does not change at all, so accurate tracking is possible.

Examples of three-dimensional shape data to be tracked include the three-dimensional distribution shape of feature points and polygon surfaces obtained from that distribution.
The obtained three-dimensional shape can also be converted into a two-dimensional image as seen from the camera position and tracked as a two-dimensional image. Because the approximate camera vector is known, the shape can be projected onto a two-dimensional image from the camera viewpoint, and changes in the apparent shape of the object caused by movement of the camera viewpoint can also be followed.
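Projecting a known three-dimensional shape into the image from an approximate camera vector can be sketched as follows. The sketch assumes an equirectangular layout of the 360-degree image (θ across the width, φ down the height), a camera-to-world rotation matrix, and hypothetical image dimensions; none of these details are specified in the source.

```python
import math

def project(point, cam_pos, cam_rot, width, height):
    """Project a world point into a 360-degree image.

    cam_rot is a 3x3 camera-to-world rotation (assumption); the image is
    assumed equirectangular with theta in [-pi, pi) and phi in [-pi/2, pi/2].
    """
    d = [p - c for p, c in zip(point, cam_pos)]
    # World -> camera: multiply by the transpose of the rotation matrix.
    dc = [sum(cam_rot[r][k] * d[r] for r in range(3)) for k in range(3)]
    theta = math.atan2(dc[1], dc[0])
    phi = math.atan2(dc[2], math.hypot(dc[0], dc[1]))
    u = (theta + math.pi) / (2.0 * math.pi) * width
    v = (math.pi / 2.0 - phi) / math.pi * height
    return u, v

IDENTITY = ((1.0, 0.0, 0.0), (0.0, 1.0, 0.0), (0.0, 0.0, 1.0))

# Hypothetical polygon (a triangle of feature points) seen from the origin.
triangle = [(5.0, 0.0, 0.0), (5.0, 1.0, 0.0), (5.0, 0.0, 1.0)]
outline = [project(p, (0.0, 0.0, 0.0), IDENTITY, 2000, 1000) for p in triangle]
```

Re-projecting the same polygon for each frame's camera vector gives the expected two-dimensional outline in that frame, which can then be matched against the actual image.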

The camera vector obtained as described above can be displayed superimposed on the video image captured by the all-round video camera 11.
For example, as shown in FIG. 13, the image from the in-vehicle camera is developed onto a plane, corresponding points on the target plane in each frame image are searched for automatically, the images are joined so that the corresponding points coincide to generate a combined image of the target plane, and the result is integrated into a single coordinate system and displayed.
Furthermore, the camera position and camera direction can be detected one after another in that common coordinate system, and the positions, directions, and trajectory can be plotted. The CV data indicates the three-dimensional position and three-axis rotation of the camera, and by displaying it over the video image, the CV value can be observed simultaneously in each frame. FIG. 13 shows an example in which CV data is superimposed on a video image.
Note that if the camera position were displayed at its true location in the video image, the position indicated by the CV value would be the center of the image, and when the camera moves along a nearly straight line the CV values of all frames would overlap. It is therefore appropriate to deliberately display a position 1 meter directly below the camera position, as shown in FIG. 13 for example. Alternatively, it is more appropriate to display the CV value at the height of the road surface, using the distance to the road surface as a reference.

[Self-position/posture locating device]
Next, an embodiment of the self-position/posture locating device using a reference video map (CV video map) according to the present invention, which performs coordinate integration between a reference video (CV video) and the target image compared against it on the basis of the CV values obtained as described above, will be described specifically with reference to the drawings.
Note that in the self-position/posture locating device using a reference video map (CV video map) described below, the terms reference video and target image do not necessarily mean that the target image is newer in time and the reference video older.
For example, in a self-position/posture locating device intended for image updating, the video whose CV values are known is the reference video, and the video for which CV values are acquired (transplanted and integrated) on the basis of the reference video is the target image (target video).

FIG. 14 is a block diagram showing the basic configuration of the self-position/posture locating device 1 using a reference video map (CV video map) according to an embodiment of the present invention.
As shown in FIG. 14, the self-position/posture locating device 1 using a reference video map (CV video map) according to this embodiment is a device or means by which a moving body such as a moving vehicle locates its own position and posture in real time using the CV video map described above as the reference video, and realizes automated driving of the moving body or the like based on that self-position/posture location.
Specifically, the self-position/posture locating device 1 according to this embodiment includes a CV video acquisition device 10, a CV video map creation device 20, a CV video map database (CV machine map database) 30, a target moving body (automated driving device) 40, a CV video map/target image comparison device 50, and a self-position/posture locating device 60.

The CV video acquisition device 10 is a means for capturing and acquiring the reference video from which the reference video map for self-position/posture location is generated. As shown in FIGS. 1 to 3 described above, it is constituted by a moving body 11a, such as a traveling vehicle, equipped with an all-round video camera 11.
For the purpose of acquiring the reference video map, the moving body 11a travels over a certain range of predetermined roads or the like, and the all-round video camera 11 mounted on it captures and acquires video of the surroundings as the reference video as the moving body 11a moves.
The reference video acquired by the CV video acquisition device 10 is input to the CV video map creation device 20, where the CV video map is created based on the CV calculation described above (see FIGS. 1 to 13).

The CV video map creation device 20 is a means that, based on the reference video captured by a predetermined video acquisition means, performs the CV calculation to obtain CV (camera vector) values indicating the three-dimensional coordinate values of the camera position and posture of the reference video, and generates a CV video map in which the CV values are added to the reference video. It constitutes the CV video map creation means of claim 1 of the present application.
Specifically, the CV video map creation device 20 is constituted by the CV calculation means shown in FIGS. 1 to 13 described above. The specific content of the CV calculation performed by the CV video map creation device 20 is as described above (see FIGS. 1 to 13).

The CV video map database (CV machine map database) 30 is a storage means that stores the CV video map generated by the CV video map creation device 20, and constitutes the CV video map database of claim 1 of the present application.
The CV video map stored in the CV video map database 30 is stored and held as three-dimensional map data serving as the reference video for the self-position/posture locating process, and is read out by the self-position/posture locating device 60 for comparison and reference against a given target image, coordinate integration, and so on.

The target moving body (automated driving device) 40 is constituted by a vehicle or the like that is subject to automated driving, and is the moving object whose self-position and posture are to be located in this embodiment.
Like the CV video acquisition device 10 described above, the target moving body 40 is equipped with an imaging means (target camera), such as a video camera or in-vehicle camera, for capturing and acquiring the images and video that serve as target images for self-position/posture location.
The target moving body 40 is also equipped with machine sensors such as the IMU/GYRO/GNSS (GPS) described above as a means of acquiring position information of the moving body.

When the target moving body 40, equipped with the camera, machine sensors, and so on, travels along, for example, predetermined roads within the range covered by automated driving, the camera mounted on it captures and acquires video of its surroundings as target images as it moves, and the three-dimensional position information of each target image is acquired and attached by the machine sensors.
The target images acquired by the target moving body 40 and the six-variable data obtained by the machine sensors are input to the self-position/posture locating device 60 via the CV video map/target image comparison device 50, where they are compared with and referenced against the CV video map serving as the reference video, and CV values giving three-dimensional coordinates are transplanted to, integrated with, and corrected for the target images.

The CV video map/target image comparison device 50 receives the target image captured by the predetermined image acquisition means of the target moving body 40 and the six-variable data acquired by the machine sensors for that target image. It then reads from the CV video map database 30 the CV video map that serves as the reference image against which the target image is to be compared, and outputs the target image, the six-variable data obtained by the machine sensors, and the CV video map to the self-position/posture locating device 60.

Here, the CV video map corresponding to the target image is read out as the reference video for comparison based on, for example, the approximate position information attached to the CV video map. In this case, approximate position information is assumed to have been attached to the target image by, for example, GPS.
The self-position/posture locating device 60 can then execute the self-position/posture locating process based on the target image, the six variables obtained by the machine sensors, and the CV video map.
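Reading out the part of the CV video map that corresponds to a target image can be sketched as a radius search around the approximate position. The `MapFrame` record, its field names, the search radius, and the sample data are all hypothetical; the source only states that approximate position information is used to select the map.

```python
import math
from dataclasses import dataclass

@dataclass
class MapFrame:
    """One reference frame with its CV value (six variables); hypothetical layout."""
    frame_id: int
    position: tuple  # camera position (x, y, z)
    rotation: tuple  # three-axis rotation (rx, ry, rz)

def candidate_frames(map_frames, approx_position, radius_m):
    """Map frames within radius_m of the approximate position, nearest first."""
    hits = [f for f in map_frames
            if math.dist(f.position, approx_position) <= radius_m]
    return sorted(hits, key=lambda f: math.dist(f.position, approx_position))

# Hypothetical map: one reference frame every 10 m along a straight road.
cv_map = [MapFrame(i, (10.0 * i, 0.0, 0.0), (0.0, 0.0, 0.0)) for i in range(5)]
nearby = candidate_frames(cv_map, approx_position=(12.0, 0.0, 0.0), radius_m=10.0)
nearby_ids = [f.frame_id for f in nearby]
```

The radius would in practice be chosen from the expected error of the approximate position, so that the true location is always inside the candidate set.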

The self-position/posture locating device 60 is a means that uses the CV video map stored in the CV video map database 30 as the reference image, compares the target image captured by the predetermined image acquisition means of the target moving body 40 with the CV video map, and acquires the CV value of the target image by automatically matching multiple feature points that indicate the same locations in the target image and the CV video map. It constitutes the self-position/posture locating means of claim 1 of the present application.
First, the target image acquired by the imaging means (target camera) of the target moving body 40 and the six variables obtained by the machine sensors as position information of the target moving body 40 are input to the self-position/posture locating device 60 via the CV video map/target image comparison device 50.

In addition, from the CV video map stored in the CV video map database 30, the rough range to be compared with and referenced against the target image is read out as the reference image and input to the self-position/posture locating device 60 via the CV video map/target image comparison device 50.
Based on the input target image, the six-variable data obtained by the machine sensors, and the CV video map, the self-position/posture locating device 60 then automatically matches multiple feature points that indicate the same locations in the target image and the CV video map, and integrates the coordinates by transplanting the CV values attached to the CV video map to the corresponding feature points of the target image.
As a result, CV values are assigned to the target image, and the target image is generated and held as a CV video (target CV values) with highly accurate CV values.

Specifically, the transplantation and integration of CV values between the CV video map serving as the reference video and the target image is performed, for example, as follows.
First, the correspondence between common points in the target image and the reference video is initialized. Over each frame of the moving image starting from the initialized frame, the portions corresponding to predetermined three-dimensional reference points, three-dimensional feature points, or two-dimensional feature points in the reference video are automatically matched in the target image, and the corresponding feature points or corresponding reference points are searched for across the frames.
Alternatively, three-dimensional reference points, three-dimensional feature points, or two-dimensional feature points in the target image are searched for in the reference video and tracked across the frames of the reference video.
Furthermore, when the target image consists of multiple images or video spanning multiple frames, the corresponding reference points matched by the search are tracked across the successive frames of the target image. Likewise, when the reference video consists of multiple images or video spanning multiple frames, the corresponding reference points matched by the search are tracked across the successive frames of the reference video.
Then, based on the matching result for the corresponding reference points (and likewise the corresponding feature points), the three-dimensional coordinates of the three-dimensional reference points of the reference video are transplanted to the corresponding reference points of the target image.

Furthermore, the CV value of the target image can be obtained from the three-dimensional coordinates of the transplanted corresponding reference points by the CV calculation described above. When the target image spans multiple frames, CV values can be obtained for each frame by tracking.
That is, once the reference video and the target image have been matched, the three-dimensional coordinates of the reference points of the reference video have been transplanted to the target image. If there are four or more such points, the camera position and posture of the target image can be obtained by the CV calculation, and the CV value of the target image can be acquired.
The reference video to which CV values were added and the target image for which CV values were obtained are thereby integrated into the same coordinate system.

As described above, the self-position/posture locating device 60 compares the target image against the CV video map as the reference. Conversely, the CV value of the target image can also be obtained by taking known three-dimensional points automatically acquired from the target image as the reference and finding their corresponding points on the CV video map side.
This corresponds to the case where three-dimensional coordinates are acquired only on the target image side by automatic recognition. That is, rather than measuring three-dimensional coordinates in the target image on the spot, this approach can be used when the three-dimensional shape and coordinates of an object have been published, as with the real-world 3D marker described later (see FIG. 15), so that recognizing the object makes its published three-dimensional coordinates available from an external source.

In addition, three-dimensional coordinates may be acquired from the target image side when such an object moves over time.
Few objects are permanently and completely fixed to the earth. They may be displaced by an earthquake, for example, and signs may be tilted by wind and snow. The latest information must therefore always be obtained, and acquiring three-dimensional coordinates from the target image side is useful for that purpose.
In the self-position/posture locating device 60 according to the present embodiment, the CV value of the target image can be acquired by combining the CV video map and the target image and transferring the three-dimensional coordinates of three-dimensional feature points from the CV video map to the target image, or, at the same time, from the target image to the CV video map.

Furthermore, as described above, the CV calculation according to the present invention operates on a moving image or a sequence of still images, and it can also be performed on the CV video map and the target image together.
In that case, the calculation is performed with only the known CV values held fixed, and the unknown CV values on the target image side are solved for. This yields CV values of higher accuracy than computing the target image alone or simply using corresponding-point processing.

Specifically, in the self-position/posture locating device 60, the CV video map and the target image are combined so that their two-dimensional feature points (2D) and three-dimensional feature points (3D) are mixed, and the CV calculation is performed on both as a single set. During this calculation, the three-dimensional coordinates of feature points (3D) whose coordinates are already known are held fixed, while all feature points from both sources are used. As a result, every feature point in the mixed set of CV video map and target image obtains three-dimensional coordinates at the same time. The CV value obtained for the target image is then treated as a new CV value, so that it can be separated from the mixed CV values and the self-position and posture can be determined.

Details of the self-position/posture determination processing in the self-position/posture locating device 60 will be described later with reference to FIGS. 15 to 17.
The self-position/posture locating device 60 selects, as the predetermined feature quantities included in the CV video map that serve as feature points indicating the same location in the target image and the CV video map, the seven types of feature quantities described later (see FIGS. 15 and 16).
These seven types of feature quantities will be described later with reference to FIGS. 15 and 16.

Further, the self-position/posture locating device 60 corrects the six-variable data indicating the self-position and posture of the target moving body 40, acquired by a mechanical sensor provided on the target moving body 40, based on the CV value of the target image, and thereby acquires temporally continuous six variables indicating the self-position and posture of the target moving body 40.
In this way, the CV value of the target image can be corrected and supplemented with the six-variable data obtained by the mechanical sensor (see FIG. 17).
The correction and supplementation of CV values by the mechanical sensor will be described later with reference to FIG. 17.

When self-position/posture determination based on the CV video map is performed as described above, automatic driving control of a moving body such as a vehicle becomes possible based on the generated and output determination result.
For example, as indicated by the broken line in FIG. 14, the target moving body 40, such as a vehicle to be driven automatically, is controlled and driven by automatic driving means.
Specifically, the target moving body 40 is driven automatically according to the output signal of a vehicle surrounding situation determination device 70, which is composed of various sensors and the like, and the output signal of a vehicle control signal generation device 80, which controls vehicle operations such as running, stopping, and turning based on that signal.

When the self-position/posture determination information generated by the self-position/posture locating device 60 is input to the automatic driving means of the target moving body 40, accurate position information for the target moving body 40 is obtained at high speed and low cost from the highly accurate three-dimensional position information based on the CV value.
Details of automatic driving control of a vehicle or the like using the self-position/posture locating device 60 of the present invention will be described later with reference to FIG. 18.

[Self-position/posture determination processing]
Next, details of the self-position/posture determination processing in the self-position/posture locating device 60 will be described with reference to FIGS. 15 to 17.
Here, self-position/posture determination means determining the CV value of the target moving body (a vehicle, robot, aircraft, or any moving object in general). Once this CV value is determined, the position and posture of the moving object (target moving body) within the CV video are uniquely determined.
The CV value obtained is originally a relative value in relative coordinates. However, when a real scale (absolute coordinates) is given to the CV video map, the CV value acquires the real scale and is converted into absolute coordinates.

This technique is indispensable for automatic driving and robot navigation, and for that purpose the position and posture (six variables) of the target moving body must be obtained in real time.
However, when the purpose is to update the CV video map, real-time processing is unnecessary. Newly acquired video can be subjected to self-position/posture determination in post-processing, and part or all of the previous CV video map can then be updated. The self-position/posture locating device of the present invention can therefore also be used as a CV video map updating device.

That is, once self-position/posture determination according to the present embodiment has been performed, its purpose is achieved even if the target image whose CV value was acquired is subsequently discarded. However, the data of such a target image can instead be retained and used as part of a new CV video map.
To that end, the target image with its acquired CV value can be imported into the CV video map database and either stored as data overlapping the CV video map or used to replace and update part or all of the CV video map.
By saving the target image together with the six variables obtained for it rather than discarding them, the device can be used not only as a self-position/posture locating device but also as a CV video map updating device. Furthermore, automatic driving vehicles or autonomous robots can be operated solely for the purpose of automatically updating the CV video map.
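The two update modes described above (storing a located target image as overlapping data, or replacing part of the map with it) can be pictured with a small in-memory sketch. Everything here is an assumption made for illustration: the class names `CVFrame` and `CVVideoMapDB`, the grouping of frames by road section, and the six-variable tuple layout are hypothetical and are not taken from the specification.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Tuple

# x, y, z and three rotation angles (assumed ordering)
SixVars = Tuple[float, float, float, float, float, float]


@dataclass
class CVFrame:
    image_id: str        # hypothetical key identifying the stored image
    cv_value: SixVars    # camera position and posture for that image


@dataclass
class CVVideoMapDB:
    # Hypothetical layout: frames grouped per road (or route) section.
    sections: Dict[str, List[CVFrame]] = field(default_factory=dict)

    def add_overlapping(self, section: str, frames: List[CVFrame]) -> None:
        """Keep the existing frames and store the new ones alongside them."""
        self.sections.setdefault(section, []).extend(frames)

    def replace(self, section: str, frames: List[CVFrame]) -> None:
        """Discard the old frames of this section and store the new ones."""
        self.sections[section] = list(frames)
```

A located target image would be passed to `add_overlapping` when the old data should be kept, and to `replace` when the section is to be refreshed.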

As described above, according to the CV calculation technique on which the present invention is premised, a moving image is acquired from a camera mounted on a moving object, feature points are extracted and tracked between frames, and the CV value is computed from them (see FIGS. 1 to 13).
In the present invention, rather than directly computing the CV value of the video or image of the target moving body, a CV video map whose CV values are already known is used as a three-dimensional map. Before the target moving body moves, an existing CV video map of the neighborhood, including the location concerned, is prepared in advance. The CV value of the target moving body is then obtained by finding the correspondence between the camera video or camera image from the camera mounted on the target moving body and the prepared CV video map.
The CV value obtained in this way may be referred to as the target CV value.

The present embodiment, as a device for obtaining the self-position and posture (six variables), is characterized by the combination of the CV video map and the target image.
In particular, regarding the kinds of image feature points and the like needed for self-position/posture determination, at least one of the following seven types of feature quantities is used to relate the CV video map to the target moving body.
Hereinafter, these are referred to as the "seven types of feature quantities".

[Seven types of feature quantities]
The seven types of feature quantities in the present embodiment are described in detail below with reference to FIGS. 15 and 16.
FIG. 15 is a block diagram showing details of the CV value transfer processing operation in the self-position/posture locating device 60 shown in FIG. 14.
FIG. 16 is an explanatory diagram schematically showing a specific example of the CV value transfer processing operation shown in FIG. 15.
Note that the feature quantities (feature points) described below are not necessarily points without area. In practice, one may be a small surface with a minute area, a surface with a distinctive shape, or a region with a characteristic attribute.
For this reason, this specification uses the term "feature quantity" to cover feature points as well.

[1. Feature point (2D)]
As explained for the CV calculation processing above (see FIGS. 1 to 13), feature points in the CV video serving as the reference video can be extracted automatically by image processing. Such a feature point in the CV video is the feature point (2D) 60a (see FIGS. 15 and 16(a)).
A feature point in the video can be defined as a point, but in practice it is often a small image region that can be regarded as a point in terms of coordinates. The term feature point, whether two-dimensional or three-dimensional, may refer either to a feature point in the CV video map or to the feature point indicating the same location in an image acquired by the camera mounted on the target moving body.
The feature point (2D), in particular, is defined as a two-dimensional quantity within a video or an image.
A feature point (2D) becomes a three-dimensional feature point (3D) when it is tracked across adjacent frames of the video and subjected to CV calculation.

[2. Feature point (3D)]
The feature points in the CV video described above are given three-dimensional coordinates in the process of generating the CV video, so they can be handled as three-dimensional feature points. Such a point is the feature point (3D) 60b (see FIGS. 15 and 16(b)).
As shown in FIG. 16(b), the three-dimensional coordinates of a three-dimensional feature point can be transferred to the feature point indicating the same location in the image (target image) acquired by the camera mounted on the target moving body. This gives three-dimensional coordinates to part of the target image.
If three-dimensional coordinates can be transferred in the same way to any four or more points in the image, the camera position and posture for that image can be obtained.

[3. Designated feature point (2D)]
The feature point (2D) 60a described above is generated automatically by image processing. A feature point designated by a person in the CV video or in the target image can be handled in exactly the same way as the feature point (2D) 60a.
This is the designated feature point (2D) 60c (see FIGS. 15 and 16(a)).

[4. Designated feature point (3D)]
Similarly, when a person designates a feature point in an image and its three-dimensional coordinates are obtained within the CV video, it can be handled in the same way as the feature point (3D) described above.
This is the designated feature point (3D) 60d (see FIGS. 15 and 16(b)).

[5. Recognition object such as a sign (3D)]
Signs and the like can be handled as the two types of designated feature points 60c and 60d described above.
Specifically, an object with three-dimensional coordinates, such as a sign, signboard, or ground feature, that has been acquired by recognition technology is referred to as a recognition object such as a sign (3D) 60e (see FIG. 15).
Because the three-dimensional coordinates of such a recognition object (3D) are already known, a single object is sufficient to transplant the target CV value if it occupies a large area, which makes self-position/posture determination possible.

[6. Real-world three-dimensional marker]
Unlike objects (3D) that existed from the outset, such as the signs above, a real-world marker can be installed, for example on or around a road, specifically for self-position/posture determination. Its three-dimensional coordinates are acquired, or treated as known, its corresponding points are found in the target image, the three-dimensional coordinates are transplanted, and the CV value is computed from them.
This is the real-world three-dimensional marker 60f (see FIGS. 15 and 16(d)).
The real-world three-dimensional marker differs from the in-image three-dimensional marker (3D marker) described below. It is a three-dimensional marker deliberately installed in the real world, not only in the CV map, for the purpose of self-localization, for example for automatic driving of vehicles or autonomous travel of robots.

[7. In-image three-dimensional marker]
The real-world three-dimensional marker 60f involves installing a marker in the real world. More simply, a distinctive location can be selected within the CV video map and used as an in-image three-dimensional marker 60g (see FIGS. 15 and 16(c)).
The CV value can be acquired by searching the target image for the location corresponding to the in-image three-dimensional marker 60g.

Here, the in-image three-dimensional marker (3D marker) is a 3D marker with three-dimensional coordinates that is set within the CV video map, extracted in advance as a mark for the purpose of self-position/posture determination so that the position/posture locating device operates efficiently.
Specifically, objects not originally installed for automatic travel can be used as 3D markers. For example, road markings, building corners, and windows can serve as 3D markers once their shapes and coordinates have been acquired in advance.
In this respect, it is distinguished from the three-dimensional marker physically installed in the real world described above.

As described above, in the present embodiment the CV value of the target image can be computed using one of the seven types of feature quantities (feature points), or a combination of several of them.
At least one of the selected feature quantities is extracted automatically, and part or all of its position and of the three-dimensional coordinates of its three-dimensional shape are assumed to have been acquired automatically beforehand.
In the case of automatic vehicle driving, objects extracted automatically in this way include objects around the road that have three-dimensional coordinates, such as signs, road markings, parts of buildings, utility poles, and curbs. In the case of autonomous robot travel, they include wall corners within the operating range, the shape of parts of the room that do not move, fixed objects, and distinctive patterns on the floor. The three-dimensional coordinates of such recognition objects have already been obtained by automatic calculation.
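As a reading aid, the seven feature quantities can be modeled as an enumeration keyed by their reference numerals in FIG. 15. This is only a sketch: the names `FeatureKind`, `Feature`, and `enough_for_pose`, and the grouping of kinds that carry three-dimensional coordinates, are assumptions introduced here. The threshold of four points is the one the text gives for obtaining the camera position and posture.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Iterable, Optional, Tuple


class FeatureKind(Enum):
    """The seven feature quantities; the value is the reference numeral."""
    FEATURE_POINT_2D = "60a"
    FEATURE_POINT_3D = "60b"
    DESIGNATED_POINT_2D = "60c"
    DESIGNATED_POINT_3D = "60d"
    RECOGNITION_OBJECT_3D = "60e"
    REAL_WORLD_3D_MARKER = "60f"
    IN_IMAGE_3D_MARKER = "60g"


# Kinds that come with known 3D coordinates (grouping assumed for this sketch).
KINDS_WITH_3D = {
    FeatureKind.FEATURE_POINT_3D,
    FeatureKind.DESIGNATED_POINT_3D,
    FeatureKind.RECOGNITION_OBJECT_3D,
    FeatureKind.REAL_WORLD_3D_MARKER,
    FeatureKind.IN_IMAGE_3D_MARKER,
}


@dataclass(frozen=True)
class Feature:
    kind: FeatureKind
    uv: Tuple[float, float]                            # image position in pixels
    xyz: Optional[Tuple[float, float, float]] = None   # known 3D coordinates, if any

    def has_3d(self) -> bool:
        return self.kind in KINDS_WITH_3D and self.xyz is not None


def enough_for_pose(features: Iterable[Feature], minimum: int = 4) -> bool:
    """True if at least `minimum` matched features carry 3D coordinates."""
    return sum(f.has_3d() for f in features) >= minimum
```

A large recognition object or marker would contribute several such points at once, which is why a single object can be sufficient.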

[CV value acquisition by transferring three-dimensional feature points to the target image]
Self-position/posture determination in the present embodiment acquires the CV value of the target image by transferring three-dimensional feature points from the reference CV video to the target image, using one or a combination of the seven types of feature quantities (feature points) described above.
That is, the feature quantities of the reference video and the target image are integrated into one coordinate system and the CV value of the target image is acquired, which accomplishes self-position/posture determination.
This is the CV transfer 61 by corresponding point calculation shown in FIG. 15.

Thus, CV value acquisition according to the present invention means obtaining the CV value of the target image by transferring and transplanting three-dimensional coordinates from a previously prepared CV video map with known CV values to feature points in the target image. If the target image itself contains known three-dimensional points or surfaces, using them helps reduce the computational cost of acquiring the CV value of the target image.
The processing for transferring and transplanting feature points to acquire the CV value of the target image is described in detail below.

[CV transfer by corresponding point calculation]
As stated above, obtaining the CV value (six variables) of the target image is the essence of self-position/posture determination according to the present invention.
The method of extracting and tracking feature points (2D) and obtaining the CV value by CV calculation is as described above (see FIGS. 1 to 13).
The following describes methods for performing self-position/posture determination by acquiring the CV value of the target image based on the reference CV video.

[1. Calculation incorporation method 1]
Using the CV calculation described above (FIGS. 1 to 13), the feature points (2D) in the target image can be CV-calculated on their own and the result then coordinate-integrated with the CV values in the reference CV video. Coordinate integration requires acquiring many feature points common to both images.
The CV values in both images can then be expressed in the same coordinate system, which completes the acquisition of the CV value of the target image in that coordinate system. In other words, self-position/posture determination is complete.
Here, "both images" means the reference CV video and the target image.
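The specification does not state how the coordinate integration itself is computed. One common way to align two reconstructions that share feature points is a least-squares similarity transform (scale, rotation, translation), estimated here with Umeyama's closed-form method. The sketch below uses that technique as a stand-in, not as the patented procedure; the function names and the assumption that camera rotations are camera-to-world matrices are illustrative.

```python
import numpy as np


def similarity_from_common_points(src, dst):
    """Estimate s, R, t such that dst ~= s * R @ src + t (Umeyama's method).

    src: Nx3 common feature points in the target image's own frame.
    dst: Nx3 coordinates of the same points in the CV video map frame.
    """
    src = np.asarray(src, dtype=float)
    dst = np.asarray(dst, dtype=float)
    mu_s, mu_d = src.mean(axis=0), dst.mean(axis=0)
    xs, xd = src - mu_s, dst - mu_d
    cov = xd.T @ xs / len(src)              # cross-covariance of the two sets
    U, S, Vt = np.linalg.svd(cov)
    D = np.eye(3)
    if np.linalg.det(U) * np.linalg.det(Vt) < 0:
        D[2, 2] = -1.0                      # guard against a reflection
    R = U @ D @ Vt
    var_s = (xs ** 2).sum() / len(src)
    s = np.trace(np.diag(S) @ D) / var_s
    t = mu_d - s * R @ mu_s
    return s, R, t


def to_map_frame(s, R, t, cam_position, cam_rotation):
    """Express an independently computed camera pose in the map frame.

    cam_rotation is assumed to be a camera-to-world rotation matrix.
    """
    return s * R @ np.asarray(cam_position, dtype=float) + t, R @ cam_rotation
```

With more common points the estimate becomes less sensitive to matching noise, which is consistent with the requirement above to acquire many common feature points.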

[2. Calculation incorporation method 2]
The CV video of the reference CV video map is returned to its state before CV calculation, and CV calculation is performed with the frames and feature points of both the CV video map and the target image mixed together, without using the three-dimensional coordinates of the feature points that were used in the original calculation.
This yields CV values for both the CV video map and the target image. When these mixed CV values are computed, giving the three-dimensional coordinates of only the feature points on the CV video map side as known values allows the CV value on the target image side to be obtained automatically and accurately.

[3. Calculation incorporation method 3]
The feature points (2D) and feature points (3D) in both the CV video map and the target image, together with points, surfaces, and the like whose three-dimensional coordinates are known, are mixed, and the feature points (3D) are incorporated into the calculation as known coordinates. Performing CV calculation on this set yields the previously unknown CV value of the target image, which completes CV value acquisition for the target image. In other words, self-position/posture determination is complete.
That is, with the two-dimensional and three-dimensional feature points of the CV video map and the target image mixed, CV calculation is performed on both as a single set using all feature points, while the three-dimensional coordinates of feature points whose coordinates are known are held fixed. The CV value of the target image is obtained in this way.

[4. 3D feature point coordinate transfer method]
A plurality of known three-dimensional points in the reference CV video are automatically matched to corresponding points (2D) in the target image, whereby their three-dimensional coordinates are transferred to those corresponding points.
The CV value of the target image is then computed geometrically from four or more three-dimensional feature points in the target image whose three-dimensional coordinates have become known.
This completes CV value acquisition for the target image and thus self-position/posture determination.
A feature surface can be regarded as consisting of a plurality of feature points, and matching it can therefore be regarded as matching a plurality of feature points. Because everything is handled as three-dimensional feature points, this method has the lowest computational cost, which means the CV value is obtained at high speed.
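The geometric calculation is not spelled out at this point in the text. As an illustration of recovering a camera position and posture from points with transferred three-dimensional coordinates, the sketch below uses the direct linear transform (DLT), a standard technique swapped in here for simplicity. Note the differences from the text: DLT needs at least six non-coplanar correspondences rather than four (minimal solvers for four or fewer points exist but are longer), and it assumes the camera intrinsic matrix `K` is known. The function name and parameters are hypothetical.

```python
import numpy as np


def camera_pose_dlt(points_3d, points_2d, K):
    """Recover rotation R (map to camera) and camera position C in the map frame.

    points_3d: Nx3 map coordinates transferred to the target image (N >= 6).
    points_2d: Nx2 pixel positions of the same points in the target image.
    K: 3x3 intrinsic matrix, assumed known.
    """
    X = np.asarray(points_3d, dtype=float)
    uv = np.asarray(points_2d, dtype=float)
    if len(X) < 6:
        raise ValueError("DLT needs at least six correspondences")
    # Normalise pixel coordinates with the intrinsics.
    xn = (np.linalg.inv(K) @ np.column_stack([uv, np.ones(len(uv))]).T).T
    rows = []
    for (x, y, w), P in zip(xn, X):
        Ph = np.append(P, 1.0)
        # Two independent rows of the constraint x cross (M @ Ph) = 0.
        rows.append(np.concatenate([np.zeros(4), -w * Ph, y * Ph]))
        rows.append(np.concatenate([w * Ph, np.zeros(4), -x * Ph]))
    _, _, Vt = np.linalg.svd(np.asarray(rows))
    M = Vt[-1].reshape(3, 4)                 # [R | t] up to scale and sign
    if np.linalg.det(M[:, :3]) < 0:
        M = -M                               # choose the sign giving det(R) = +1
    U, S, Vt3 = np.linalg.svd(M[:, :3])
    R = U @ Vt3                              # nearest rotation matrix
    t = M[:, 3] / S.mean()                   # undo the unknown scale
    C = -R.T @ t                             # camera position in the map frame
    return R, C
```

The returned `R` and `C` together carry the six variables (three for posture, three for position) that the text calls the CV value.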

[5. Inverse 3D feature point coordinate transfer method]
In principle, this method corresponds to partially and restrictively reversing the relationship between the reference CV video and the target image. The reference image, however, remains the CV video map.
Specifically, it corresponds, for example, to the case where a real-world marker with known three-dimensional coordinates is installed in the real world, as shown in FIG. 16(d). In this case, the three-dimensional coordinates of the marker are acquired directly, without going through the reference CV video.
The other feature points, however, are acquired from the CV video, so the CV video map is still involved as the reference image in the final CV value acquisition.

Using real-world markers as described above is preferable from the standpoint of safety and the like.
Because the self-position/posture locating device according to the present invention uses a camera while the vehicle or the like is traveling, travel may become difficult at night or in fog. Even in such cases, real-world three-dimensional markers can be installed cheaply and safely on or near the road to support safe driving, so the real-world 3D marker method is very promising in terms of safety.
Self-position/posture determination is of course fully possible using 3D markers in the target image as well, but for administrative and similar reasons it is preferable to use real-world 3D markers defined by law.

[6. CV value acquisition method using mechanical sensors]
The CV value can also be acquired directly by mechanical sensors provided on the target moving body 40 (see FIG. 14), for example IMU/GYRO/GNSS.
However, the mechanical sensors in common use today have low accuracy, and it is difficult to obtain practically sufficient accuracy from them alone. On the other hand, their greatest advantage is that they provide real-time output.
In this embodiment, therefore, a technique can be adopted in which a relatively inexpensive IMU/GYRO/GNSS or the like provides an approximate CV value in real time. Even an approximate value gives a relative value with little error over a very short time, so it is effective for determining, as a relative value, the frame-to-frame variation of temporally discontinuous CV values, or for obtaining a real-time value.
Details of this corrective and complementary CV value acquisition by mechanical sensors are described later with reference to FIG. 17.

[CV integration calculation]
As described above, the CV value of the target image can be acquired and transplanted based on the reference CV video. Alternatively, the CV value of the target image can be acquired directly by the same method used to acquire and generate the CV video map (see FIGS. 1 to 13), after which the coordinates are integrated and aligned to obtain the CV value in the target image.
This is the CV integration calculation 62 shown in FIG. 15.
Note that this CV calculation and CV value acquisition from the target image alone is not used by itself; depending on the situation, it is used in part, in combination with the other methods described above.

[Real-time correction]
In principle, if a mechanical sensor such as an IMU/GYRO is attached to the camera, the CV value, that is, the six variables, can be acquired. However, acquiring a CV video directly in this way requires a very expensive (high-accuracy) IMU/GYRO or the like, so using a mechanical sensor alone is not realistic in practice.
On the other hand, mechanical sensors such as an IMU/GYRO have the excellent feature of providing real-time output.
In this embodiment, therefore, this feature of the mechanical sensor is put to effective use as a real-time correction that compensates for the time delay and time discontinuity that accompany the computation of the CV video and the acquisition of CV values.

In the CV calculation according to the present invention, computations that involve image processing, such as feature calculation, inevitably incur some time delay because of the image processing time.
A mechanical sensor such as an IMU/GYRO can therefore be used to correct, in real time, the change in the CV value that occurs during this slight delay.
Specifically, the image processing time required for CV calculation and CV value acquisition ranges from several milliseconds to several seconds. Only this interval needs to be supplemented with the six variables obtained from the mechanical sensor. Over an interval of this length, even an inexpensive (low-accuracy) mechanical sensor such as an IMU/GYRO is sufficient for time-discontinuity correction and real-time correction.
Furthermore, to convert the CV values into absolute coordinates, the acquired relative coordinates can be converted by means of GCPs installed in the environment or a GNSS (GPS) receiver rigidly coupled to the camera that acquires the target image.

[Improving the accuracy of the target CV value with mechanical sensors]
Improving the accuracy of the CV value (correction and complementation) using a mechanical sensor is described concretely below with reference to FIG. 17.
FIG. 17 is an explanatory diagram schematically showing the processing in which the CV value of the target image, obtained by the self-position/posture locating device using the reference video map according to this embodiment, is made more accurate using the six variables obtained from a mechanical sensor. Part (a) shows all of the frames that make up the target image, and part (b) is an enlarged view of some of the frames shown in (a).

In FIG. 17, the CV value of the target image is plotted on the vertical axis. Displaying all six variables would be cluttered and hard to follow, so only one variable of the target CV value is plotted on the vertical axis [01]. The horizontal axis [08] represents elapsed time, and one division of the time axis corresponds to the frame interval of the target camera.
The CV values [02, 06, 10] obtained by comparing the CV video map with the target image are indicated by ●.
One of the six variables acquired by the mechanical sensor is indicated by dotted line 1 [05].
As shown in FIG. 17(b), because the mechanical sensor output is real-time, the target CV value acquired from the target image lags it by a time delay Δt [04].
In addition, because of the limited accuracy of the mechanical sensor, the variable value itself contains an error Δd [03].

When such outputs are obtained, the delay of the CV value acquired from the target image is treated as known: its time axis is shifted back onto the real-time axis and superimposed on the output from the mechanical sensor.
Then, taking the CV value obtained from the target image as the true value, the mechanical sensor's CV value is corrected so that it matches the image-derived values at the same times at both ends of the frame. If a parallel shift alone does not make them match, the difference is distributed proportionally so that they do.
In this way, the interval between frames is filled in with CV values from the mechanical sensor. Acquiring a real-time CV value at the current time in this manner is what gives the device its significance as a self-position/posture locating device.
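The correction described above can be sketched for a single CV variable as follows. This is only an illustrative sketch under assumptions not stated in the source: the sensor trace is a list of timestamped samples, the image-derived CV values at the two frame ends have already been shifted back by the known delay, and any residual mismatch is distributed linearly in time. The function name `correct_between_frames` and its parameters are hypothetical.

```python
from typing import List, Tuple


def correct_between_frames(
    sensor: List[Tuple[float, float]],  # (time, sensor value), sorted by time
    t0: float, cv0: float,              # image-derived CV value at the frame start
    t1: float, cv1: float,              # image-derived CV value at the frame end
) -> List[Tuple[float, float]]:
    """Correct the sensor trace so it matches the CV values at both frame ends."""
    if t1 <= t0:
        raise ValueError("frame end must be later than frame start")
    # Keep only the sensor samples that fall inside the frame interval.
    span = [(t, v) for t, v in sensor if t0 <= t <= t1]
    if len(span) < 2 or span[0][0] != t0 or span[-1][0] != t1:
        raise ValueError("sensor samples are required at both frame ends")
    # Sensor error at each end, taking the image-derived CV value as true.
    err0 = span[0][1] - cv0
    err1 = span[-1][1] - cv1
    corrected = []
    for t, v in span:
        w = (t - t0) / (t1 - t0)            # 0 at the frame start, 1 at the frame end
        err = (1.0 - w) * err0 + w * err1   # proportional distribution of the error
        corrected.append((t, v - err))
    return corrected
```

When the errors at the two ends are equal, the result reduces to a pure parallel shift; otherwise the difference is spread across the interval in proportion to time.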

Recording and storing high-accuracy real-time CV values proceeds as described above. In self-position/posture determination for automatic driving, however, where only the CV value at the current time is needed, correction over more than one frame may become necessary, as follows. The CV value acquired from the target image always lags by Δt because of the computation time, and Δt may exceed one frame period. That period is covered by the mechanical sensor. In this case, instead of matching both ends of a frame, the final CV value [10] is extended using the mechanical sensor data [05] from the last frame [10] to the current time [12], giving the target CV value [11] at the current time.
FIG. 17 illustrates the case where the computation delay is within one frame, but the meaning does not change if it exceeds one frame. The length of Δt [04] simply extends further to the left in the drawing.
In this way, the real-time property of the target camera's CV value is ultimately ensured.
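The extension from the last image-derived CV value to the current time can be sketched for one variable as follows. This is a hedged sketch, not the patented implementation: it assumes the sensor trace is a time-sorted list of samples, interpolates it linearly, and uses only the sensor's relative change between the frame time and the current time, so that a constant sensor offset (the error Δd) cancels out. The names `sensor_at` and `current_cv` are hypothetical.

```python
from bisect import bisect_right
from typing import List, Tuple


def sensor_at(sensor: List[Tuple[float, float]], t: float) -> float:
    """Linearly interpolate the time-sorted sensor trace at time t."""
    times = [s[0] for s in sensor]
    if not times or not times[0] <= t <= times[-1]:
        raise ValueError("time is outside the recorded sensor trace")
    i = bisect_right(times, t)
    if i == len(times):  # t equals the last sample time
        return sensor[-1][1]
    (ta, va), (tb, vb) = sensor[i - 1], sensor[i]
    return va + (vb - va) * (t - ta) / (tb - ta)


def current_cv(
    sensor: List[Tuple[float, float]],
    t_frame: float,   # capture time of the last frame with an image-derived CV value
    cv_frame: float,  # that image-derived CV value (treated as true)
    t_now: float,     # current time
) -> float:
    """Extend the last image-derived CV value to t_now by the sensor's relative change."""
    return cv_frame + (sensor_at(sensor, t_now) - sensor_at(sensor, t_frame))
```

Because only the difference between two sensor readings is used, the same function applies whether the delay is shorter or longer than one frame period.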

As described above, in this embodiment, when the self-position and posture of the target moving body are determined with a relatively low-accuracy, inexpensive mechanical sensor such as an IMU/GYRO that does not by itself meet the required accuracy, the self-position and posture (six variables) of the target moving body are first obtained from the mechanical sensor with their errors included.
Then, to correct the accumulated error, the six variables from the mechanical sensor are corrected intermittently with the more accurate CV values obtained by comparing the CV video map with the images from the camera mounted on the target moving body. This yields continuous six-variable self-position/posture values with no time delay.

Thus the six variables of the raw data from the mechanical sensor are continuous in time and available in real time, but they are considerably less accurate than the CV values of the CV video map.
The CV values based on the CV video map, on the other hand, have their own weaknesses: they are delayed by computation, and they are fragmentary in time and lack continuity.
Combining the two therefore brings out the strengths of both.

Combining the CV video map with a mechanical sensor in this way not only corrects the delay error caused by computation but also provides real-time capability at the same time.
That is, a delay arises because the computation time for acquiring the CV value (six variables) of the target image is finite, and this delay becomes an error in self-position and posture. To correct for the change in the CV value that occurs within the delay, the very short period from the most recent end of the target image's CV values, after which no CV value is yet available, up to the current time is filled in with the mechanical sensor, which improves real-time performance.
Achieving the required accuracy with mechanical sensors alone would require expensive equipment, and would also raise other problems such as difficult calibration.

Therefore, by obtaining a highly accurate CV value for the target camera with the CV calculation technique of the present invention, correction and complementation combining the mechanical sensor and the CV video map become possible.
That is, the six position and posture variables output in real time by a mechanical sensor such as an IMU/GYRO, which has low accuracy but no delay, can be used for interpolation. This exploits the property that although the real-time output of a mechanical sensor is inaccurate, the six variables it provides have little error over an extremely short time.
In this way, the CV value is basically obtained by CV calculation, and only the most recent, extremely short interval is complemented by the mechanical sensor for real-time correction.

[Automatic driving system]
Next, automatic driving of the target moving body by the self-position/posture locating device using the reference video map according to this embodiment is described with reference to FIG. 18.
FIG. 18 is a functional block diagram showing the system configuration of an automatic driving system 100 for a moving body based on the self-position/posture locating device using the reference video map according to this embodiment.
In the system configuration of the automatic driving system 100 shown in FIG. 18, the parts not directly related to the self-position/posture locating device described above (see FIG. 14) are drawn with dotted lines, and the parts directly related to it are drawn with solid lines.

In the automatic driving system 100, corresponding points between the two images are found by means of the target image acquisition / target camera unit 101, which is installed on the target moving body (here, an autonomous vehicle) and acquires the target image, the CV video map unit 102, which is prepared in advance, and the feature point comparison unit 105, which compares the two. This first yields a target CV value that carries a computation delay.
Next, the mechanical sensor / six-variable acquisition unit 106 outputs a CV value that is low in accuracy but available in real time.
Then, in the self-position/posture determination unit 109, the CV value output from the feature point comparison unit 105 is corrected with the delay-free CV value output from the mechanical sensor / six-variable acquisition unit 106, and the final, delay-free target CV value is generated and output.

The self-position locating device described above is the main device for automatic driving, but automatic driving also requires other components. The 3D space / environment attribute identification unit 103 grasps the rough attributes of the environment, such as outdoors, indoors, intersection, on-road, off-road, or tunnel.
The object recognition unit 104 recognizes objects near the travel path and acquires their three-dimensional coordinates.
If the obstacle 3D recognition unit 107 determines that an object near the travel path is an obstacle, a signal is sent to the driving parameter instruction unit 111.

In this way, vehicles traveling alongside, oncoming vehicles, parked vehicles, people, and other moving bodies around the autonomous vehicle are recognized. For vehicles traveling alongside and oncoming vehicles, the size and six variables are acquired; for parked vehicles, the size and three-dimensional coordinates; and for people and other moving bodies, information such as size and direction of movement. A signal is then sent to the driving parameter instruction unit 111.
The driving parameter instruction unit 111 is under the control of the driving condition setting unit 110, which sets the conditions required for driving. Finally, the guided automatic travel unit 112 controls the vehicle directly, and automatic driving is carried out.

As described above, the automatic driving system 100 of this embodiment uses the self-position locating device 1 based on the reference video map described above to enable accurate and safe automatic driving and automatic travel of moving vehicles and the like.
If the self-position/posture locating device fails for some reason, the vehicle must be guided safely and brought to a safe stop. On a highway, for example, it may not be possible to stop immediately.
Therefore, while normal operation is automatic driving with the self-position/posture locating device active, the system needs a configuration that, in an emergency, guides the vehicle safely and stops it using only information it acquires itself, without receiving any external signal.

An automatic driving device or system always includes an obstacle detection device, so it is desirable to treat the device equipped with the obstacle detection device as an auxiliary device and to perform automatic driving with both the present device and the auxiliary device operating at all times.
Robots likewise need an auxiliary device that automatically moves the robot to a safe place and stops it even if the self-position/posture device fails.
Thus, in this embodiment, in addition to the self-position/posture locating device, the target vehicle carries an autonomous travel system independent of the present device. This provides a self-position/posture locating device with a safety device that, in an emergency, can safely guide and stop the target vehicle using other cameras or other sensors.

[Example]
Next, a more specific example of the self-position/posture locating device using the reference video map according to this embodiment is described.
The following shows an example of a standard self-position/posture locating device for automatic driving.
The example is described with reference to FIGS. 1 and 6 above as appropriate.

In advance, CV calculation is performed on video of the environment captured by the CV video acquisition device 10, and the CV video map creation device 20 creates the CV video map database 30.
Next, only the part of the map needed for automatic driving is extracted from the CV video map database 30, and the CV video map / target image comparison device 50 acquires the CV value of the target camera.
Meanwhile, a device integrated with the camera on the target moving body 40 acquires the six variables directly with the mechanical sensor. As a result, the CV value obtained from the camera image and the CV value obtained by the mechanical sensor, that is, two sets of six variables, are acquired at the same time.

Of these two kinds of data, the six variables obtained directly by the mechanical sensor are continuous in time (see FIG. 17 [05]) but low in accuracy. The image-derived data, in contrast, is fragmentary (see FIG. 17 [06]) but high in accuracy.
Therefore, in the self-position/posture locating device 60, the continuous CV values from the mechanical sensor are calibrated at the times that coincide with the CV value data of the target camera (integer multiples of the frame interval) to correct the mechanical sensor's CV values (FIG. 17 [04/03]). The intervals in between are interpolated with the mechanical sensor data, yielding CV values with improved accuracy overall.

The signal acquired by the self-position/posture locating device 60 is sent to the vehicle surroundings determination device 70, where the vehicle is positioned within the CV video map together with surrounding objects, pedestrians, obstacles, and the like.
Once these objects and their mutual positional relationships are established in three dimensions, the vehicle control signal generator 80 generates vehicle control signals, the vehicle (target moving body 40) is controlled automatically, and automatic driving is realized.

Obtaining the CV value of the target image is itself the self-position/posture determination, but it is not necessary to obtain it for every frame of the target image by feature point tracking.
In general, a highly accurate value (CV value) cannot be obtained without spending a reasonable amount of computation time, so it is desirable to allow as much computation time as possible. The frame rate drops accordingly.
In this example, therefore, CV values are acquired only for intermittent frames on the target image side. The skipped frames are not computed; they are filled in with the six variables obtained from the mechanical sensor at the same time.

Specifically, this can be done as follows.
[Example 1]
When the target image is 6 fps (6 frames per second), one second is spent on each CV calculation, so only one frame in six undergoes CV calculation.
High-accuracy calculation is thus performed for one frame only, and the remaining five frames can be interpolated with the six variables acquired from the mechanical sensor.
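The frame allocation in Example 1 can be illustrated with a short sketch. It assumes, beyond what the source states, that CV calculations run back to back and that the stride between computed frames is the number of frame periods one calculation occupies, rounded up. The function name `plan_frames` and the labels `"cv"` and `"sensor"` are hypothetical.

```python
import math
from typing import List


def plan_frames(fps: float, cv_seconds: float, n_frames: int) -> List[str]:
    """Label each frame 'cv' (image-based CV calculation) or 'sensor' (filled from the sensor)."""
    # Number of frame periods that elapse during one CV calculation.
    stride = max(1, math.ceil(fps * cv_seconds))
    return ["cv" if i % stride == 0 else "sensor" for i in range(n_frames)]


# Example 1: 6 fps with one second per CV calculation.
plan = plan_frames(6, 1.0, 12)
print(plan)
```

With these values, every sixth frame is labeled `"cv"` and the five frames in between are labeled `"sensor"`.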

[Example 2]
Suppose that, as shown in FIG. 17, the target image is 0.5 fps (one frame every 2 seconds) and the CV calculation takes 0.3 seconds, so that a delay of 0.3 seconds occurs.
In this case, the mechanical sensor can cover the 0.3 seconds so that a real-time CV value is output (see FIG. 17).

Of course, this requires that the error from the mechanical sensor be less than or equal to the error of the CV calculation. This is a very reasonable condition.
If the target CV calculation takes one second, the six variables obtained from the mechanical sensor are in effect being calibrated once per second.
The condition that the error accumulated by the mechanical sensor in one second is below the error of the CV calculation can be met even by an inexpensive IMU or GYRO. The method is therefore highly practical.
Since both the target CV value obtained from the target image and the mechanical sensor have their own errors, the number of camera frames is decided by balancing the correction period, the number of camera frames, the performance of the mechanical sensor, and so on.
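The condition above, that the error accumulated by the mechanical sensor between calibrations must not exceed the CV calculation error, can be checked with a small sketch. It assumes a simple linear drift model (error = drift rate × elapsed time), which is an assumption made here for illustration only; the source gives no error model, and the position error of a real IMU typically grows faster than linearly. The function names and numeric values are hypothetical.

```python
def max_calibration_interval(cv_error: float, drift_per_second: float) -> float:
    """Longest interval (s) for which linear sensor drift stays within the CV error."""
    if cv_error <= 0 or drift_per_second <= 0:
        raise ValueError("both arguments must be positive")
    return cv_error / drift_per_second


def sensor_is_adequate(cv_error: float, drift_per_second: float, interval_s: float) -> bool:
    """True if the drift accumulated over interval_s does not exceed the CV error."""
    return drift_per_second * interval_s <= cv_error


# Hypothetical figures: 0.05 m CV error, 0.02 m/s sensor drift, calibration once per second.
print(sensor_is_adequate(0.05, 0.02, 1.0))
```

Under this model, lengthening the correction period (fewer CV-computed frames) is acceptable only while the sensor drift over that period stays within the CV calculation error.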

As described above, according to the self-position/posture locating device 1 using the reference video map of this embodiment, the CV video map that serves as the reference video consists of video or continuous still images (two-dimensional coordinates) from a moving camera, with a total of six variables, namely the three-dimensional coordinates (X, Y, Z) and rotation amounts (Φx, Φy, Φz) indicating the position and posture of the camera, attached to every frame as the CV value. This makes it possible to obtain the three-dimensional coordinates of any location in the video at any time.
Therefore, without holding three-dimensional points (a point cloud) for the entire environment in which automatic driving is performed, the three-dimensional position of any point can be acquired or generated whenever needed with a computation of a few milliseconds, and the data representing the three-dimensional space can be greatly compressed.
As a result, in automatic driving of vehicles and aircraft, automatic travel of robots moving in three-dimensional space, and the like, information indicating the self-position and posture of the moving body can be obtained simply, quickly, and with high accuracy.
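One way to see how per-frame CV values allow the three-dimensional coordinates of a point to be computed on demand is two-view triangulation. The sketch below is not taken from the source: it assumes a pinhole camera looking along +Z with the principal point at the image origin, a focal length `f` in pixels, a camera-to-world rotation R = Rz·Ry·Rx built from (Φx, Φy, Φz), and the midpoint of the closest approach between the two viewing rays as the estimate. All names are hypothetical.

```python
import math
from typing import Sequence, Tuple

Vec = Tuple[float, float, float]


def rotation(phi_x: float, phi_y: float, phi_z: float):
    """Camera-to-world rotation matrix R = Rz * Ry * Rx (an assumed convention)."""
    cx, sx = math.cos(phi_x), math.sin(phi_x)
    cy, sy = math.cos(phi_y), math.sin(phi_y)
    cz, sz = math.cos(phi_z), math.sin(phi_z)
    return (
        (cz * cy, cz * sy * sx - sz * cx, cz * sy * cx + sz * sx),
        (sz * cy, sz * sy * sx + cz * cx, sz * sy * cx - cz * sx),
        (-sy, cy * sx, cy * cx),
    )


def dot(a: Sequence[float], b: Sequence[float]) -> float:
    return sum(x * y for x, y in zip(a, b))


def pixel_ray(cv: Sequence[float], u: float, v: float, f: float) -> Tuple[Vec, Vec]:
    """Return (camera position, world-frame ray direction) for pixel (u, v)."""
    x, y, z, phi_x, phi_y, phi_z = cv
    r = rotation(phi_x, phi_y, phi_z)
    d_cam = (u, v, f)  # assumed pinhole model, optical axis along +Z
    d_world = tuple(dot(r[i], d_cam) for i in range(3))
    return (x, y, z), d_world


def triangulate(cv1, uv1, cv2, uv2, f: float) -> Vec:
    """Midpoint of the closest approach between the two viewing rays."""
    p1, d1 = pixel_ray(cv1, uv1[0], uv1[1], f)
    p2, d2 = pixel_ray(cv2, uv2[0], uv2[1], f)
    w = tuple(p - q for p, q in zip(p1, p2))
    a, b, c = dot(d1, d1), dot(d1, d2), dot(d2, d2)
    d, e = dot(d1, w), dot(d2, w)
    denom = a * c - b * b
    if abs(denom) < 1e-12:
        raise ValueError("rays are parallel; the point cannot be triangulated")
    s = (b * e - c * d) / denom  # parameter along ray 1
    t = (a * e - b * d) / denom  # parameter along ray 2
    return tuple((p1[i] + s * d1[i] + p2[i] + t * d2[i]) / 2.0 for i in range(3))
```

Only the two six-variable CV values and the pixel positions of the same point in two frames are needed as input, which is why no stored point cloud is required.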

As described above, the SLAM and V-SLAM used for conventional self-position/posture determination hold three-dimensional points for the entire area from the start and exchange enormous amounts of point cloud data. Transferring and recording that data incurs enormous computational and financial cost, which made these methods impractical for automatic driving and self-position/posture determination over wide areas.
In SLAM and V-SLAM, the point cloud needed for automatic driving typically amounts to roughly several hundred million points per km.
The self-position/posture locating device of this embodiment also holds data for some distinctive three-dimensional points in practice. Even if it held several hundred three-dimensional points per km, however, that number is minute beyond comparison with the point clouds of the conventional LIDAR method or V-SLAM, and the data added by a mere few hundred points is negligible.
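The difference in scale can be made concrete with a rough calculation. The point counts follow the orders of magnitude given above; everything else is an assumption made here for illustration: 12 bytes per 3D point (three 32-bit floats), 8 bytes per CV variable, and 1000 frames per km. The sketch counts only the pose metadata and the extra 3D points, not the video itself, which the CV video map also stores.

```python
def point_cloud_bytes(points_per_km: int, bytes_per_point: int = 12) -> int:
    """Storage for a raw point cloud, per km (assumed 3 x float32 per point)."""
    return points_per_km * bytes_per_point


def cv_metadata_bytes(
    frames_per_km: int,
    extra_points_per_km: int = 0,
    bytes_per_value: int = 8,
    bytes_per_point: int = 12,
) -> int:
    """Storage for six CV variables per frame plus a few retained 3D points, per km."""
    return frames_per_km * 6 * bytes_per_value + extra_points_per_km * bytes_per_point


cloud = point_cloud_bytes(300_000_000)   # hypothetical: 300 million points/km
cv_map = cv_metadata_bytes(1000, 300)    # hypothetical: 1000 frames/km, 300 points/km
print(cloud, cv_map, cloud // cv_map)
```

Under these assumed figures the retained 3D points contribute only a few kilobytes per km, which is consistent with the statement that their cost is negligible.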

What is actually held as data is not the three-dimensional coordinates of the space but only the six variables of the camera position. Even so, these six variables allow the three-dimensional coordinates of every point in the image to be obtained by a simple computation, so the data can be used as a 3D map. This is an outstanding feature of the present invention.
The CV video map is produced by generating in advance the reference video or continuous reference still images (two-dimensional images) from a reference camera moving through the environment intended for automatic travel, together with the reference CV values of that environment. In other words, by associating six variables with each frame of video that remains two-dimensional, the three-dimensional information of the environment is held in consolidated form.
Moreover, the CV video map holds no enormous three-dimensional point cloud. Instead, it is kept in a state in which the three-dimensional coordinates of any point in the map can be computed when needed, so the data can be held and managed in a much lighter form.

Thus the CV video map used as the reference video in the present invention is held in a state in which three-dimensional coordinates in the environment can be obtained anytime and anywhere with one additional step. The data is therefore extremely light and well suited to transmission, which makes automatic driving practical.
With such a CV video map as the reference image, the target image is compared with it, multiple feature points indicating the same locations in both are automatically matched, and the CV value (six variables) of the target image is obtained by computation. The self-position and posture of the target image can thus be acquired quickly.

Therefore, the self-position/posture locating device 1 using a reference video map according to the present embodiment achieves the following advantages by comparing the CV video with the target image and combining the result with mechanical sensors.
First, combining the CV video map with the target image makes the data to be handled lightweight and the computation efficient.
Second, corrections from mechanical sensors can be applied to the moving image acquired from the camera and to the CV value of the target image acquired from the camera. The resulting equipment configuration is simple, robust, and easy to handle, yielding a low-cost, high-performance self-position/posture locating device.
Third, by incorporating the output of GNSS, used as a mechanical sensor, into the CV calculation either as a single feature point or directly as a known CV value, the accuracy of the target CV value can easily be increased.
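As a rough illustration of the third point, the sketch below combines a position taken from a CV value with a GNSS position by inverse-variance weighting. This is a generic fusion technique substituted here for illustration; the text above describes feeding the GNSS output into the CV calculation itself, which this sketch does not reproduce. The function name and the per-source variance parameters are assumptions.

```python
def fuse_position(cv_pos, cv_var, gnss_pos, gnss_var):
    """Inverse-variance weighted mean of two position estimates.

    cv_pos / gnss_pos: (x, y, z) in a common coordinate system (assumed).
    cv_var / gnss_var: scalar variance of each estimate (assumed known).
    Returns the fused position and its variance.
    """
    if cv_var <= 0 or gnss_var <= 0:
        raise ValueError("variances must be positive")
    w_cv = 1.0 / cv_var
    w_gnss = 1.0 / gnss_var
    total = w_cv + w_gnss
    fused = tuple((w_cv * c + w_gnss * g) / total
                  for c, g in zip(cv_pos, gnss_pos))
    return fused, 1.0 / total


# The more certain estimate dominates the result.
fused, fused_var = fuse_position((100.0, 50.0, 2.0), 0.01,
                                 (101.0, 49.0, 3.0), 1.0)
print(fused, fused_var)
```

The fused variance is always smaller than either input variance, which is one simple way to see why adding an independent GNSS measurement can tighten the target CV value.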

Furthermore, time-discontinuity correction using an IMU or gyro can be performed.
The accuracy of the CV video can also be improved by post-processing after capture. Post-processing and design can be matched to the accuracy needed, and accuracy can be raised whenever a higher level is required.
By contrast, in a conventional self-position/posture locating method such as that disclosed in Patent Document 1 mentioned above, it is extremely difficult to improve the accuracy of a 3D map built from a laser point cloud after the fact, and extremely difficult to modify data once it has been acquired.
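One way to picture the IMU/gyro time correction mentioned above is as latency bridging: a CV value computed from a frame only becomes available some time after the frame was captured, while mechanical sensor readings arrive with negligible delay. The sketch below buffers sensor increments and adds those newer than the frame timestamp to the delayed CV value. The class name, the additive treatment of the six variables (reasonable only for small motions), and the timestamp units are all assumptions made for illustration.

```python
from collections import deque


class DelayCompensator:
    """Bring a delayed 6-variable CV value up to the present using sensor increments."""

    def __init__(self):
        # Each entry is (timestamp, 6-variable increment measured at that time).
        self._increments = deque()

    def add_sensor_increment(self, timestamp, delta):
        self._increments.append((timestamp, tuple(delta)))

    def realtime_pose(self, cv_pose, cv_timestamp):
        """Add every increment newer than the frame the CV value refers to."""
        # Increments at or before the frame time are already reflected in cv_pose.
        while self._increments and self._increments[0][0] <= cv_timestamp:
            self._increments.popleft()
        pose = list(cv_pose)
        for _, delta in self._increments:
            for i in range(6):
                pose[i] += delta[i]
        return tuple(pose)


comp = DelayCompensator()
comp.add_sensor_increment(0.00, (1.0, 0.0, 0.0, 0.0, 0.0, 0.00))
comp.add_sensor_increment(0.05, (0.5, 0.0, 0.0, 0.0, 0.0, 0.01))
comp.add_sensor_increment(0.10, (0.5, 0.0, 0.0, 0.0, 0.0, 0.01))

# The CV value refers to the frame captured at t = 0.00 but is only available now.
current = comp.realtime_pose((10.0, 0.0, 0.0, 0.0, 0.0, 0.0), 0.00)
print(current)
```

A fuller treatment would compose rotations properly rather than adding angles, but the buffering pattern is the same.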

The self-position/posture locating device using a reference video map of the present invention has been described above with reference to a preferred embodiment. Needless to say, the device is not limited to that embodiment, and various modifications are possible within the scope of the present invention.
For example, the embodiment above assumes automated driving of a moving body such as a vehicle as the application of the device. However, the self-position/posture locating device according to the present invention can be applied to any device or means that requires self-position and posture determination, and its uses and methods of use are not particularly limited.

The embodiment above also explained how, for automated driving and the like, the position of the host vehicle can be acquired with high accuracy. A CV video (a moving image with a CV value for every frame) whose CV values were obtained with high accuracy in advance serves as the reference three-dimensional map, real-time video captured by a vehicle-mounted camera serves as the target image, and the CV values are transplanted and integrated according to the present invention.
The same holds, however, when the reference is a three-dimensional map generated by actual measurement using mechanical sensors, surveying equipment, or the like.
Accordingly, the self-position/posture locating device according to the present invention can also use measured data from mechanical sensors and the like as three-dimensional coordinate data.

In addition, because the present invention can be expected to provide positional accuracy at least one hundred times that of real-time GPS, GPS can also be used as the approximate-position setting means in the present invention.
As described above, the self-position/posture locating device using a reference video map of the present invention is also effective when updating a three-dimensional map, and can naturally be used for updates when a map is generated from video.
Furthermore, because the present invention can acquire high-accuracy position coordinates without using GPS, it is also expected to be applicable to high-precision navigation technology.

The present invention is well suited to, for example, automated driving of various vehicles such as automobiles and of moving bodies such as aircraft and ships, and to autonomous travel of robots and the like.

DESCRIPTION OF SYMBOLS
10 CV video acquisition device
20 CV video map creation device
30 CV video map database
40 Target moving body
50 CV video map / target image comparison device
60 Self-position/posture locating device

Claims

A self-position/posture locating device using a reference video map, comprising:
CV video map creating means for performing, based on a reference video captured by predetermined video acquisition means, a CV calculation that obtains a CV (camera vector) value indicating the three-dimensional coordinate values of the camera position and posture for the reference video, and for generating a CV video map in which the CV value is added to the reference video;
a CV video map database for storing the CV video map; and
self-position/posture locating means for using the CV video map stored in the CV video map database as a reference image, comparing it with a target image captured by predetermined image acquisition means provided on a target moving body, and automatically associating a plurality of feature points indicating the same locations in the target image and the CV video map, thereby obtaining the CV value of the target image.

The self-position/posture locating device using a reference video map according to claim 1, wherein
the CV video map creating means generates, as the CV (camera vector) value, six-variable data indicating the position and posture of the video acquisition means at the time the reference video was captured, without holding the three-dimensional information of the space captured in the reference video as three-dimensional coordinate data of that space, and
the CV video map database stores the CV video map in which the reference video and the CV values are held in correspondence with each other, in a lightweight form from which the three-dimensional coordinates of an arbitrary point in the reference video can be obtained from the CV values whenever needed.

The self-position/posture locating device using a reference video map according to claim 1, wherein
the self-position/posture locating means treats the six-variable data indicating the self-position and posture of the target moving body, acquired by a mechanical sensor provided on the target moving body, as data without delay, and uses it to correct the calculation delay of the CV value of the target image to a real-time value, thereby acquiring temporally continuous real-time CV values, or six variables, indicating the self-position and posture of the target moving body.

The self-position/posture locating device using a reference video map as described above, wherein
the self-position/posture locating means selects a predetermined feature amount included in the CV video map as a feature point indicating the same location in the target image and the CV video map, and
selects, as the predetermined feature amount, a three-dimensional recognition object, such as a sign, that is included in the CV video map and has automatically extracted three-dimensional coordinates.

The self-position/posture locating device using a reference video map according to any one of claims 1 to 4, wherein
the self-position/posture locating means selects a predetermined feature amount included in the CV video map as a feature point indicating the same location in the target image and the CV video map, and
selects, as the predetermined feature amount, a three-dimensional marker within the image that is included in the CV video map and has known three-dimensional coordinates.

The self-position/posture locating device using a reference video map according to any one of claims 1 to 5, wherein
the self-position/posture locating means selects a predetermined feature amount included in the CV video map as a feature point indicating the same location in the target image and the CV video map, and
selects, as the predetermined feature amount, a real-world three-dimensional marker that is included in the CV video map and has known three-dimensional coordinates.

The self-position/posture locating device using a reference video map according to any one of the preceding claims, wherein
the self-position/posture locating means imports the target image for which the CV value has been acquired into the CV video map database, and updates part or all of the CV video map stored in the CV video map database.

The self-position/posture locating device using a reference video map according to claim 1, wherein
the self-position/posture locating means combines the CV video map and the target image, and acquires the CV value of the target image by transferring the three-dimensional coordinates of three-dimensional feature points from the CV video map to the target image, or from the target image to the CV video map.

The self-position/posture locating device using a reference video map according to any one of claims 1 to 8, wherein
the self-position/posture locating means combines the CV video map and the target image, and performs the CV calculation with the two-dimensional feature points and three-dimensional feature points included in the CV video map and the target image integrated, that is, performs the CV calculation over all feature points included in the CV video map and the target image with the three-dimensional coordinates held fixed, and
acquires the CV value of the target image by separating it from all the CV values included in the CV video map and the target image.

The self-position/posture locating device using a reference video map according to any one of claims 1 to 9, wherein
the target moving body comprises an autonomous driving system and a target vehicle equipped with a camera or a sensor, and
the target vehicle is guided and stopped based on the target image to which the CV value generated by the self-position/posture locating means has been added.

The self-position/posture locating device using a reference video map according to any one of claims 1 to 10, wherein the CV video map creating means comprises:
a feature point extraction unit that automatically extracts a predetermined number of feature points from the image data of the moving image;
a feature point correspondence processing unit that automatically tracks the extracted feature points across the frame images of the moving image and obtains the correspondences between the frame images; and
a camera vector calculation unit that obtains the three-dimensional position coordinates of the feature points for which correspondences have been obtained and, from those coordinates, obtains a camera vector consisting of the three-dimensional position coordinates and three-dimensional rotation coordinates of the camera for each frame image.