JP2021107981A - Teacher data generation device

Info

Publication number: JP2021107981A
Application number: JP2019238712A
Authority: JP
Inventors: 穏人藤田; Yasuhito Fujita
Original assignee: Toyota Motor Corp
Current assignee: Toyota Motor Corp
Priority date: 2019-12-27
Filing date: 2019-12-27
Publication date: 2021-07-29

Abstract

PROBLEM TO BE SOLVED: To improve recognition performance when training on a photographed image and teacher data as a set. SOLUTION: A teacher data generation device includes means (130) for storing a background CG model space created in advance from a three-dimensional point cloud and camera video; means (102) for recognizing a foreground object in video captured by a camera installed at the roadside and creating a CG model by arranging a CAD model corresponding to the foreground object in the background CG model space; and means (103) for calculating the similarity between the CG model and the original frame image and generating teacher data from the created CG model on the basis of the similarity. SELECTED DRAWING: Figure 1

Description

The present invention relates to a teacher data generation device, and more particularly to a teacher data generation device that improves recognition performance between live-action images and CG images.

Various technologies are currently being developed to realize automated driving systems and driver assistance systems for vehicles. Such systems recognize information about the environment around the host vehicle, such as obstacles and moving objects, and perform travel control according to the surrounding situation. False detections and missed detections in recognizing this environmental information are therefore serious safety problems. To address them, deep learning (machine learning), which learns features in stages from image data of the environment, is used. Improving deep-learning recognition performance requires a large number of highly realistic samples. Patent Document 1 discloses improving the recognition rate by using CG to generate images that closely resemble live-action images, thereby increasing the number of training samples. CG of this kind makes it possible to create a wide variety of teacher data.

Patent Document 1: Japanese Unexamined Patent Application Publication No. 2018-060511

In practice, however, even a highly realistic CG image that closely resembles a live-action image is subject to a domain shift between CG images and live-action images, so recognition performance deteriorates when the result is applied to real images.

The present invention has been made in view of these circumstances, and provides a teacher data generation device that improves recognition performance between live-action images and CG images.

The teacher data generation device according to the present invention comprises:
means for storing a background CG model space created in advance from a three-dimensional point cloud and camera video;
means for recognizing a foreground object in video captured by a camera installed at the roadside and creating a CG model by arranging a CAD model corresponding to the foreground object in the background CG model space;
means for calculating a similarity between the CG model and the original frame image; and
means for generating teacher data from the created CG model on the basis of the similarity.

According to the present invention, when teacher data is augmented from a CG model, more precise teacher data can be generated by judging the validity of that teacher data.

According to the present invention, recognition performance improves when training on live-action images and CG teacher data as a set.

FIG. 1 is a schematic block diagram showing the configuration of a learning device according to the present embodiment.
FIG. 2 is a diagram illustrating neural network learning according to the present embodiment.
FIG. 3 is a block diagram showing the configuration of the learning device according to the present embodiment.
FIG. 4 is a flowchart of background CG model creation.
FIG. 5A is a flowchart of the frame CG model creation process.
FIG. 5B is a flowchart of the frame CG model creation process.
FIG. 6A is a flowchart of the teacher data creation process by the teacher data creation unit 302, the validity determination process by the validity determination unit 304, and the manual correction process by the manual correction unit 305.
FIG. 6B is a flowchart of the teacher data creation process by the teacher data creation unit 302, the validity determination process by the validity determination unit 304, and the manual correction process by the manual correction unit 305.
FIG. 7 is a diagram showing various kinds of teacher data.
FIG. 8 is a flowchart of the training process by the network training unit 306.

First, an outline of the teacher data generation device according to the present invention is given. In the present invention, a background CG model space is created in advance from a three-dimensional point cloud and camera video. Foreground objects (for example, cars, motorcycles, and people) in video captured by a camera installed at the roadside are recognized, the corresponding CAD models are placed in the background CG model space, and a CG model that matches the video is created semi-automatically. Highly accurate teacher data is then created by automatically outputting various kinds of highly valid teacher data from the created CG model. This reduces the annotation man-hours needed to identify and classify large amounts of data and create ground-truth labels. Furthermore, recognition performance can be improved by training on live-action images and CG teacher data as a set, without using the CG images themselves during training.

Until now, the annotation work of manually creating teacher data for deep learning has required an enormous number of man-hours. In addition, distance sensors and LiDAR used to detect objects on the road are expensive to introduce, and accurately creating distance and three-dimensional teacher data is difficult. CG makes it easy to create large amounts of teacher data, but even with photorealistic CG images, training on CG teacher data alone, or on CG teacher data mixed with teacher data from real data, degrades recognition performance at application time. Conversion using GANs (Generative Adversarial Nets) can reduce the domain shift between CG images and live-action images, but does not eliminate it. With the method of the present invention, teacher data, including distance and three-dimensional teacher data that is difficult to measure and create, can be created semi-automatically, and reductions in cost and man-hours can be expected. An improvement in recognition performance over conventional mixed training with CG can also be expected.

Specific embodiments to which the present invention is applied are described in detail below with reference to the drawings. The present invention is not, however, limited to the following embodiments. For clarity of explanation, the following description and drawings are simplified as appropriate.

FIG. 1 is a schematic block diagram showing the configuration of a learning device according to the present embodiment. FIG. 2 is a diagram illustrating neural network learning according to the present embodiment.

The learning device includes an arithmetic processing unit such as a CPU, which also serves as the functional units that each execute one of the subdivided processes. Specifically, the learning device includes a roadside camera video input unit 101, a teacher data creation unit 102, a precision teacher data creation unit 103, and a learning unit 104. The learning device also includes a foreground object CAD model database 120 and a background CG model database 130. The teacher data creation unit 102, the precision teacher data creation unit 103, and the background CG model database 130 together also function as the teacher data generation device, which is one of the characteristic parts of the present invention.

The roadside camera video input unit 101 receives, as live-action video, video captured by sensor means such as a camera installed at the roadside.

The foreground object CAD model database 120 stores template CAD models of foreground objects that may travel on the road (for example, cars, motorcycles, and people). The background CG model database 130 stores background CAD models of the road on which the roadside camera is installed and of the surrounding structures.

The teacher data creation unit 102 places foreground objects such as cars in the background CAD model, creates a CG model space corresponding to the live-action video input from the roadside camera video input unit 101, and automatically creates various kinds of teacher data TD.

Examples of the teacher data TD created include two-dimensional and three-dimensional bounding boxes (2D, 3D Bounding Box), semantic and instance segmentation (Semantic, Instance Segmentation), depth maps (Depth Map), point clouds (Point Cloud), class names, and attribute information.

The precision teacher data creation unit 103 corrects incomplete teacher data that could not be created automatically, for reasons such as objects missed because of occlusion, and creates high-precision teacher data PTD. The precision teacher data creation unit 103 also judges the validity of the teacher data TD when creating the high-precision teacher data PTD. As described in detail later, the precision teacher data creation unit 103 calculates the similarity between the created CG model and the original frame image and, on the basis of that similarity, generates the precision teacher data PTD from the created CG model.

The learning unit 104 trains a neural network using live-action images from the roadside camera and the teacher data created by CG as a set. Detailed descriptions of known neural networks, deep learning, and the like are omitted in this specification.

FIG. 3 is a block diagram showing the configuration of the learning device according to the embodiment of the present invention. The learning device includes the teacher data generation device, which is one of the characteristic parts of the present invention.

The high-density point cloud acquisition unit 311 acquires a high-density point cloud using an MMS (Mobile Mapping System), LiDAR (Light Detection and Ranging), or the like. The camera video input unit 312 captures camera video at the same time as the point cloud is acquired and inputs the camera video data.

The roadside camera video input unit 301 inputs video captured by a camera installed at the roadside. The roadside camera installation information database 310 stores installation information for the roadside camera (for example, latitude, longitude, altitude, and installation angle).

The background CG model database 330 stores CAD models of the road and its surrounding structures (for example, traffic lights and signs) and the CG model space in which they are arranged.

The foreground object CAD model database 320 stores CAD models of foreground objects that travel on the road, such as cars, motorcycles, and people.

The teacher data database 340 stores, for each frame, a live-action image and CG teacher data as a set. The teacher CG model database 350 stores the CG model space corresponding to each frame.

The learning result database 360 stores network models (trained models) that are trained as needed using the background CG model creation unit 313.

The background CG model creation unit 313 creates CAD models of the road and its surrounding structures from the high-density point cloud and the camera video.

The frame CG model creation unit 303 creates, from the roadside camera video, a CG model space corresponding to each frame. The teacher data creation unit 302 creates the various kinds of teacher data corresponding to each frame. The validity determination unit 304 checks the accuracy of the created teacher data and judges its validity. The manual correction unit 305 is used to correct by hand CG models for which automatic generation failed.

The network training unit 306 trains various networks (for example, neural networks) using the created teacher data.

FIG. 4 is a flowchart of the background CG model creation process performed by the background CG model creation unit 313.

In step S401, the point cloud from the high-density point cloud acquisition unit 311 and the camera video from the camera video input unit 312 are taken as inputs. A high-density point cloud is measured with LiDAR while video is captured with the camera at the same time. The LiDAR and the camera are assumed to be calibrated against each other, and the acquired point cloud and camera video are assumed to be synchronized frame by frame.

In step S402, points that do not belong to the background (foreground objects such as vehicles and pedestrians) are removed from the point cloud. In step S403, a class attribute is assigned to each pixel of the frame image using a method such as semantic segmentation.

In step S404, the point cloud is mapped onto the frame image and the semantic segmentation image. In step S405, color information and a class attribute are assigned to each point from the mapping result. This produces a colored point cloud with class attributes.
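The mapping in steps S404 and S405 can be illustrated with a minimal sketch. It assumes a simple pinhole camera model, points already expressed in the camera coordinate frame (which the calibration mentioned for step S401 would make possible), and images held as nested lists; the function names `project` and `annotate_points` and the intrinsic parameters `fx`, `fy`, `cx`, `cy` are hypothetical and do not appear in the specification.

```python
def project(point, fx, fy, cx, cy):
    """Pinhole projection of a camera-frame 3D point to an integer pixel (u, v)."""
    x, y, z = point
    if z <= 0.0:  # the point lies behind the camera and cannot be seen
        return None
    return int(round(fx * x / z + cx)), int(round(fy * y / z + cy))


def annotate_points(points, image, labels, fx, fy, cx, cy):
    """Attach (color, class) to every point whose projection falls inside the image.

    image[v][u] holds an RGB tuple and labels[v][u] holds the class assigned by
    semantic segmentation for the same pixel.
    """
    height, width = len(image), len(image[0])
    annotated = []
    for p in points:
        uv = project(p, fx, fy, cx, cy)
        if uv is None:
            continue
        u, v = uv
        if 0 <= u < width and 0 <= v < height:
            annotated.append((p, image[v][u], labels[v][u]))
    return annotated
```

Points that project outside the image or lie behind the camera are simply dropped in this sketch; a real pipeline might instead keep them without color or class.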

Steps S406 to S408 below are repeated for each class classified as background (road, building, tree, and so on).

In step S406, the points of the target class are clustered. In step S407, a three-dimensional CAD model is created (converted) from the clustered points. In step S408, the created CAD model is placed in the global CG model space, and the resulting background CG model is stored in the background CG model database 330 described above.
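The specification does not name a clustering algorithm for step S406. As one possible illustration, the following sketch groups the points of a single class by Euclidean proximity: two points end up in the same cluster when they are connected by a chain of neighbors no farther apart than `radius`. The function name and the `radius` parameter are assumptions made for this example, and the quadratic neighbor search is chosen for brevity rather than speed.

```python
from collections import deque
from math import dist


def euclidean_clusters(points, radius):
    """Return clusters as sorted lists of point indices (simple O(n^2) region growing)."""
    unvisited = set(range(len(points)))
    clusters = []
    while unvisited:
        seed = unvisited.pop()
        queue, cluster = deque([seed]), [seed]
        while queue:
            i = queue.popleft()
            # Collect all still-unassigned points close enough to point i.
            near = [j for j in unvisited if dist(points[i], points[j]) <= radius]
            for j in near:
                unvisited.remove(j)
                queue.append(j)
                cluster.append(j)
        clusters.append(sorted(cluster))
    return sorted(clusters)
```

Each resulting cluster would then be handed to the CAD conversion of step S407, which is outside the scope of this sketch.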

FIGS. 5A and 5B are flowcharts of the frame CG model creation process performed by the frame CG model creation unit 303.
In the processing steps that perform inference by deep learning, whenever a more accurate network becomes available in the learning result database 360, the trained model in use is replaced with the updated one as appropriate.

In step S501, the image and point cloud for each frame are acquired in sequence from the roadside camera video input unit 301 and the roadside LiDAR log 504. Installing a LiDAR at the roadside allows the CG model to be created more accurately.

In step S502, it is checked whether a CG model corresponding to the previous frame image exists. If one exists (YES in step S502), the CG model corresponding to the previous frame is obtained from the teacher CG model database 350 (step S503). If none exists (NO in step S502), the approximate installation position is determined from the roadside camera installation information database 310, and the CG model of the area around the camera is obtained from the background CG model database 330 (step S504). The camera parameters in the CG model are also set from the roadside camera installation information database 310.

In step S505, foreground objects such as cars, motorcycles, and people present in the frame are detected (recognized). If the CG model corresponding to the previous frame was obtained, its information may also be used. A deep-learning object detection method may be used, and the results of several networks may be compared to raise detection accuracy.

Steps S506 to S514 below are repeated for each detected foreground object. In step S506, detailed attributes of the target object are estimated, for example the vehicle model of a car or motorcycle, or the gender, age, and body type of a person. In step S507, a CAD model of a similar foreground object is obtained from the foreground object CAD model database 320 according to the attributes estimated in step S506. For the person class, a CAD model with a closely matching body type needs to be obtained. In step S508, the distance of the target object from the camera and the pose of the target object (rotation matrix R and translation vector t) are estimated.

In step S509, it is determined whether the target object belongs to a class that can be regarded as a rigid body (car, motorcycle, and so on).

If the target object belongs to the person class (NO in step S509, that is, it is not a rigid body), the following processing is performed. The three-dimensional positions of the joint keypoints (points in the image considered characteristic) are estimated (step S510). The CAD model is deformed (adjusted) to match the estimated keypoints (step S511). In step S512, the CAD model of the target object (person) is placed in the CG space using the transformation (R and t) estimated in step S508.

If, on the other hand, the target object belongs to a rigid-body class such as car or motorcycle (YES in step S509), the process proceeds directly to step S512, where the CAD model of the target object is placed in the CG space using the transformation (R and t) estimated in step S508.
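Placing a CAD model with the rotation matrix R and translation vector t of step S508 amounts to applying the rigid transform x' = R x + t to every vertex. The following minimal sketch shows that operation; the helper `yaw_rotation`, which builds a rotation about the vertical axis only, is a hypothetical simplification, since the specification does not restrict R to a single axis.

```python
from math import cos, sin


def yaw_rotation(theta):
    """3x3 rotation matrix for a rotation of theta radians about the z axis."""
    c, s = cos(theta), sin(theta)
    return [[c, -s, 0.0], [s, c, 0.0], [0.0, 0.0, 1.0]]


def place_model(vertices, R, t):
    """Apply x' = R x + t to each vertex of a CAD model given in model coordinates."""
    return [
        tuple(sum(R[row][k] * v[k] for k in range(3)) + t[row] for row in range(3))
        for v in vertices
    ]
```

For example, `place_model(vertices, yaw_rotation(heading), (x, y, 0.0))` would drop a vehicle model onto a flat road at position (x, y) with the given heading.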

In step S513, an RGB image of the relevant region is rendered from the CG model. Edge and corner features are extracted from the CG image and from the real image and compared to calculate a similarity. In step S514, if measured point cloud data is available, the points in that region are cut out. The point cloud of the corresponding model region is output from the CG model, and the point cloud from the measured data is compared with the point cloud from the CG model to calculate a similarity.
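The specification does not state which measure is used to compare the measured and CG point clouds in step S514. One common choice, shown here purely as an assumed example, is the symmetric chamfer distance (the mean nearest-neighbor distance in both directions), converted to a similarity in (0, 1]. The function names and the `scale` parameter are hypothetical.

```python
from math import dist


def chamfer_distance(a, b):
    """Symmetric mean nearest-neighbor distance between two non-empty point sets."""
    def one_way(src, dst):
        return sum(min(dist(p, q) for q in dst) for p in src) / len(src)

    return 0.5 * (one_way(a, b) + one_way(b, a))


def cloud_similarity(a, b, scale=1.0):
    """Map the chamfer distance to (0, 1]; 1.0 means the clouds coincide."""
    return 1.0 / (1.0 + chamfer_distance(a, b) / scale)
```

The brute-force nearest-neighbor search is quadratic in the number of points; a spatial index would be needed for clouds of realistic size.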

In step S515, the comparison results from steps S513 and S514 are used to determine whether the CAD model has been placed accurately in the CG space. If the CAD model has not been placed accurately (NO in step S515), the process proceeds to step S516. In step S516, the transformation (rotation matrix R and translation vector t) is optimized so that the discrepancies found by the comparisons in steps S513 and S514 are minimized. The process then returns to step S512 and is repeated.

If, on the other hand, the CAD model has been placed accurately in the CG space (YES in step S515), the process ends.
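The loop through steps S512 to S516 can be read as an iterative refinement: place the model, measure the discrepancy, adjust the pose, and repeat. The specification does not say which optimizer is used, so the sketch below substitutes a simple coordinate pattern search and, as a further simplification, refines only the translation vector t and leaves R fixed. The `cost` callable stands in for the combined discrepancy of steps S513 and S514; all names and default values are assumptions.

```python
def refine_translation(cost, t0, step=1.0, min_step=1e-3, max_iters=1000):
    """Minimize cost(t) by trying +/- step along each axis, halving step when stuck."""
    t, best = list(t0), cost(t0)
    for _ in range(max_iters):
        if step < min_step:  # resolution reached: treat the placement as converged
            break
        improved = False
        for axis in range(len(t)):
            for delta in (step, -step):
                candidate = list(t)
                candidate[axis] += delta
                c = cost(candidate)
                if c < best:
                    t, best, improved = candidate, c, True
        if not improved:
            step *= 0.5
    return tuple(t), best
```

In use, the returned cost would be compared against an acceptance criterion corresponding to the YES branch of step S515.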

FIGS. 6A and 6B are flowcharts of the teacher data creation process by the teacher data creation unit 302, the validity determination process by the validity determination unit 304, and the manual correction process by the manual correction unit 305.

In step S601, the teacher data creation unit 302 receives the CG model corresponding to the frame image from the frame CG model creation unit 303 and outputs various kinds of teacher data. As shown in FIG. 7, these include two-dimensional bounding boxes (2D Bounding Box), semantic segmentation (Semantic Segmentation), instance segmentation (Instance Segmentation), depth maps (Depth Map), three-dimensional bounding boxes (3D Bounding Box), point clouds (Point Cloud), class names (car, motorcycle, person, and so on), and attribute information (vehicle model, gender, age, and so on).

In step S602, the CG image 61 is rendered.

In steps S603 to S606, the validity determination unit 304 judges the validity of the created CG teacher data. Specifically, in step S603, features such as edges and corners are extracted from the original frame image 62 and from the CG image 61 and compared to calculate a similarity. In step S604, semantic segmentation is applied to the original frame image 62 so that each pixel is painted with the pixel value of its class. In step S605, the segmentation image created in step S604 is compared with the CG semantic segmentation teacher image to obtain their similarity. In step S606, the measured point cloud 63 is compared with the CG point cloud to calculate a similarity.
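For the comparison in step S605, the specification leaves the similarity measure open. A widely used measure for comparing two segmentation maps is the mean intersection-over-union (IoU) across classes, sketched below under the assumption that both maps are non-empty nested lists of class labels with identical dimensions. The function name `mean_iou` is hypothetical.

```python
def mean_iou(pred, ref):
    """Mean IoU over all classes that appear in either label map."""
    inter, union = {}, {}
    for row_p, row_r in zip(pred, ref):
        for p, r in zip(row_p, row_r):
            if p == r:
                # The pixel belongs to both masks of class p.
                inter[p] = inter.get(p, 0) + 1
                union[p] = union.get(p, 0) + 1
            else:
                # The pixel belongs to the pred mask of p and the ref mask of r.
                union[p] = union.get(p, 0) + 1
                union[r] = union.get(r, 0) + 1
    return sum(inter.get(c, 0) / n for c, n in union.items()) / len(union)
```

Here `pred` would be the segmentation of the original frame image from step S604 and `ref` the semantic segmentation teacher image output from the CG model.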

When these similarities are high (that is, YES in steps S603, S605, and S606), the data is judged in step S607 to be valid as teacher data. The original frame image and the CG teacher data are then stored in the teacher data database 340, and the CG model is stored in the teacher CG model database 350. To reduce storage size, the stored data may consist only of the difference from the previous frame.

On the other hand, when these similarities are not close (that is, NO in steps S603, S605, and S606), if the difference between the compared similarities is below a threshold (NO in step S608), manual correction is performed (step S609). Specifically, in step S609 the frame CG model is corrected by manually adjusting the placement of existing CAD models and by placing CAD models for undetected objects.

If it is determined in step S608 that correction is not possible (YES in step S608), the process ends without storing the CG image in the teacher data database 340.
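The three outcomes of steps S607 to S609 (store, correct by hand, or discard) can be sketched as a small decision function. The specification does not give numeric thresholds or say exactly how the "difference" tested in step S608 is computed; the sketch below is one interpretation, in which the shortfall of the worst similarity below an acceptance level decides between manual correction and discarding. The names `Verdict` and `judge` and the values `accept=0.8` and `max_gap=0.3` are assumptions.

```python
from enum import Enum


class Verdict(Enum):
    STORE = "store"            # step S607: valid, store in the databases
    MANUAL_FIX = "manual_fix"  # step S609: correct the frame CG model by hand
    DISCARD = "discard"        # step S608 YES: not correctable, store nothing


def judge(similarities, accept=0.8, max_gap=0.3):
    """Classify one frame from its similarity scores, each assumed to lie in [0, 1]."""
    worst = min(similarities)
    if worst >= accept:
        return Verdict.STORE
    if accept - worst < max_gap:
        return Verdict.MANUAL_FIX
    return Verdict.DISCARD
```

A call such as `judge([edge_sim, seg_sim, cloud_sim])` would combine the results of steps S603, S605, and S606 for one frame.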

As described above, according to the present embodiment, high-precision teacher data can be generated by extracting only teacher data of high validity. Teacher data such as distance, three-dimensional bounding boxes, and point clouds is difficult to create accurately, but judging validity in this way makes such teacher data usable as well.

FIG. 8 is a flowchart of the training process performed by the network training unit 306.
After a certain amount of teacher data has accumulated, the various networks are trained. In step S801, teacher data (CG teacher data) is obtained from the teacher data database 340, and the various networks are trained on live-action images and CG teacher data as a set. In step S802, the trained model is evaluated. The evaluation result and the trained network model are stored in the learning result database 360. In step S803, the parameters are optimized. The parameter optimization may be computed in bulk, for example by batch processing. Training in this way on live-action images paired with CG teacher data, without using the CG images themselves, improves recognition performance. Because there is a limit to how far GAN conversion can reduce domain shift, the method of the present invention is effective.
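The pairing used in step S801 can be illustrated with a minimal sketch: the network input is always the live-action frame, the target is the teacher data derived from the CG model, and the rendered CG image never enters the training set. The record layout, field names, and file paths below are hypothetical and are not taken from the specification.

```python
from dataclasses import dataclass, field


@dataclass
class FrameRecord:
    frame_id: int
    real_image: str  # path to the live-action frame from the roadside camera
    cg_image: str    # path to the rendered CG image (used only for validity checks)
    labels: dict = field(default_factory=dict)  # teacher data output from the CG model


def training_pairs(records):
    """Yield (input, target) pairs; the CG image is deliberately left out."""
    for rec in records:
        yield rec.real_image, rec.labels
```

Keeping the CG image in the record but out of the generator reflects the design choice described above: the CG rendering serves to validate the labels, while training sees only real pixels.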

In the examples above, the program can be stored on various types of non-transitory computer readable media and supplied to a computer. Non-transitory computer readable media include various types of tangible storage media. Examples of non-transitory computer readable media include magnetic recording media (for example, flexible disks, magnetic tapes, and hard disk drives), magneto-optical recording media (for example, magneto-optical disks), CD-ROM (Read Only Memory), CD-R, CD-R/W, DVD (Digital Versatile Disc), BD (Blu-ray (registered trademark) Disc), and semiconductor memory (for example, mask ROM, PROM (Programmable ROM), EPROM (Erasable PROM), flash ROM, and RAM (Random Access Memory)). The program may also be supplied to a computer by various types of transitory computer readable media. Examples of transitory computer readable media include electrical signals, optical signals, and electromagnetic waves. A transitory computer readable medium can supply the program to a computer via a wired communication path such as an electric wire or optical fiber, or via a wireless communication path.

The present invention is not limited to the embodiments above and can be modified as appropriate without departing from its spirit.

61 CG image
62 Frame image
63 Frame point cloud
101 Roadside camera video input unit
102 Teacher data creation unit
103 Precision teacher data creation unit
104 Learning unit
120 Foreground object CAD model database
130 Background CG model database
301 Roadside camera video input unit
302 Teacher data creation unit
303 Frame CG model creation unit
304 Validity determination unit
305 Manual correction unit
306 Network training unit
310 Roadside camera installation information database
311 High-density point cloud acquisition unit
312 Camera video input unit
313 Background CG model creation unit
320 Foreground object CAD model database
330 Background CG model database
340 Teacher data database
350 Teacher CG model database
360 Learning result database
504 Roadside LiDAR log
TD Teacher data
PTD Precision teacher data

Claims

A teacher data generation device comprising:
means for storing a background CG model space created in advance from a three-dimensional point cloud and camera video;
means for recognizing a foreground object in video captured by a camera installed at the roadside and creating a CG model by arranging a CAD model corresponding to the foreground object in the background CG model space;
means for calculating a similarity between the CG model and the original frame image; and
means for generating teacher data from the created CG model on the basis of the similarity.