JP7136849B2

JP7136849B2 - LEARNING DATA GENERATION METHOD, HUMAN DETECTION MODEL LEARNING METHOD, AND LEARNING DATA GENERATOR

Info

Publication number: JP7136849B2
Application number: JP2020119892A
Authority: JP
Inventors: 聡飯尾; 喜一杉本; 健太中尾; 成寿亀尾; 昭伍南; 敏樹岡本
Original assignee: Mitsubishi Logisnext Co Ltd
Current assignee: Mitsubishi Logisnext Co Ltd
Priority date: 2020-07-13
Filing date: 2020-07-13
Publication date: 2022-09-13
Anticipated expiration: 2040-07-13
Also published as: JP2022016907A

Description

The present disclosure relates to a learning data generation method, a human detection model learning method, and a learning data generation device.

Human detection technologies have been proposed that detect people located around an industrial vehicle for safety purposes. For example, Patent Document 1 discloses a periphery monitoring system that captures images of the surroundings of an industrial vehicle, detects a helmet as an image representing a characteristic part of a person, and extracts a person candidate image.

Patent Document 1: JP 2017-101420 A

Human detection systems for industrial vehicles sell in small numbers, and their conditions of use differ with each customer's environment, so it is difficult to ensure quality at low cost. For example, Patent Document 1 uses a helmet, as an image representing a characteristic part of a person, for human detection. When such a technology is applied to a work place where some people do not wear helmets, a work place with many obstacles, a place with many people, or the like, over-detection or missed detection may occur. In other words, unless the human detection system is optimized for the customer's environment, sufficient detection accuracy may not be ensured.

One conceivable way to optimize a human detection system for the customer's environment is to use a human detection model that performs deep learning. However, creating a robust human detection model with deep learning requires training the model on diverse human images taken in the real environment as learning data, and it takes time and effort to collect diverse human images and, further, to create correct answer information (the correct position information of each person) for them. For example, actually placing people at various locations in the real environment, labeling them as correct answer information, and verifying whether they can be detected requires a great deal of labor.

Therefore, it is desirable to reduce the labor required to generate diverse learning data including correct answer information and to shorten the generation time. It is also conceivable to increase the learning data in a short time by generating a large number of virtual human images using CG (Computer Graphics) technology alone; however, because such images deviate greatly from real images, it is difficult to ensure sufficient human detection accuracy.

In view of the above circumstances, an object of the present disclosure is to shorten the time required to generate diverse learning data including correct answer information, and to generate learning data capable of improving human detection accuracy.

A method for generating learning data according to the present disclosure is
a method for generating learning data for a human detection model that performs machine learning, the method including:
a step of generating a three-dimensional point cloud environment model based on real images captured while traveling in a work place;
a step of generating a three-dimensional synthetic model in which one or more human models are arranged in the three-dimensional space represented by the environment model; and
a step of generating the learning data based on the three-dimensional synthetic model.

A human detection model learning method according to the present disclosure includes:
a step of generating learning data by the above method for generating learning data; and
a step of training a human detection model using the generated learning data.

A learning data generation device according to the present disclosure is
a device for generating learning data for a human detection model that performs machine learning, the device including:
an environment model generation unit that generates a three-dimensional point cloud environment model based on real images captured while traveling in a work place;
a data synthesizing unit that generates a three-dimensional synthetic model in which one or more human models are arranged in the three-dimensional space represented by the environment model; and
a data generation unit that generates the learning data based on the three-dimensional synthetic model.

According to the present disclosure, it is possible to shorten the time required to generate diverse learning data including correct answer information, and to generate learning data capable of improving human detection accuracy.

A block diagram schematically showing the functional configuration of a human detection system including a learning data generation device and a human detection device according to an embodiment.
A block diagram schematically showing the hardware configuration of the learning data generation device according to an embodiment.
A flowchart showing the procedure of a method for generating learning data according to an embodiment.
A schematic diagram showing learning data generated by the learning data generation device according to an embodiment.
A schematic diagram showing an example of a real image acquired by the learning data generation device according to an embodiment.
A schematic diagram showing an example of image data to which the learning data generation device according to an embodiment has assigned attributes.

Several embodiments will now be described with reference to the accompanying drawings. The dimensions, materials, shapes, relative arrangements, and the like of the components described as embodiments or shown in the drawings are not intended to limit the scope of the invention and are merely illustrative examples.
For example, expressions denoting relative or absolute arrangements, such as "in a certain direction", "along a certain direction", "parallel", "orthogonal", "center", "concentric", or "coaxial", denote not only such arrangements in the strict sense but also states of relative displacement within a tolerance, or by an angle or distance to the extent that the same function can be obtained.
For example, expressions denoting that things are in an equal state, such as "identical", "equal", and "homogeneous", denote not only strictly equal states but also states in which a tolerance, or a difference to the extent that the same function can be obtained, exists.
For example, expressions denoting shapes, such as a quadrangular shape or a cylindrical shape, denote not only such shapes in the geometrically strict sense but also shapes including uneven portions, chamfered portions, and the like to the extent that the same effect can be obtained.
On the other hand, the expressions "comprising", "including", and "having" one component are not exclusive expressions that exclude the presence of other components.

(Overall configuration of human detection system)
The configuration of a human detection system 400 for an industrial vehicle according to one embodiment will be described below. FIG. 1 is a block diagram schematically showing the functional configuration of the human detection system 400, which includes a learning data generation device 100 according to one embodiment, a human detection device 200, and an industrial vehicle 300 equipped with the human detection device 200.

The human detection device 200 includes a human detection model 210 for detecting a person included in a captured image, and an output unit 220 for outputting the human detection result. The human detection device 200 is configured to train the human detection model 210, which performs deep learning, on learning data (learning images) including correct answer information, and to use the trained model to detect a person included in an actual captured image. The human detection device 200 is mounted on an industrial vehicle 300 such as a forklift, and is configured to detect people around the industrial vehicle 300 by inputting, to the trained human detection model 210, captured images of the work place acquired by an imaging unit 310, such as a camera, provided at a predetermined position on the industrial vehicle 300. When the human detection model 210 detects a person around the industrial vehicle 300, the output unit 220 outputs the detection result to a notification unit 320 of the industrial vehicle 300. The notification unit 320 is configured to notify the occupant of the risk of contact between the industrial vehicle 300 and a person by means such as a screen display, a warning light, or a warning sound. The learning data generation device 100 is a device for generating learning data for the human detection model 210 of the human detection device 200.

FIG. 2 is a block diagram schematically showing the hardware configuration of the learning data generation device 100 according to one embodiment. The learning data generation device 100 is configured using a computer that includes a processor 72 such as a CPU (Central Processing Unit) or a GPU (Graphics Processing Unit), a RAM (Random Access Memory) 74, a ROM (Read Only Memory) 76, an HDD (Hard Disk Drive) 78, an input I/F 80, and an output I/F 82, which are connected to one another via a bus 84. The human detection device 200 may have a similar configuration.

The learning data generation device 100 and the human detection device 200 implement the various functions described later when the processor 72 executes programs stored in a memory such as the ROM 76 or the RAM 74. The learning data generation device 100 and the human detection device 200 may be integrated or separate. If they are separate, the learning data generation device 100 may be an information processing device used in a learning environment outside the industrial vehicle 300. In this case, as shown in FIG. 1, only the trained human detection device 200 is mounted on the industrial vehicle 300.

As shown in FIG. 1, the learning data generation device 100 includes: a real image acquisition unit 110 configured to acquire real images captured while traveling in the work place where the industrial vehicle 300 equipped with the human detection device 200 is actually operated; an environment model generation unit 120 that generates an environment model consisting of a three-dimensional point cloud based on the acquired real images and information such as the positions at which the real images were acquired; a data synthesizing unit 130 that generates a three-dimensional synthetic model in which one or more virtual human models are arranged in the three-dimensional space represented by the generated environment model; and a data generation unit 140 that generates, as learning data (learning images), images of the three-dimensional synthetic model as observed from a predetermined viewpoint position.

The real image acquisition unit 110 is configured to acquire real images of the work place from, for example, the imaging unit 310 provided on the industrial vehicle 300 or a test vehicle for data measurement. The real images may be acquired, for example, by driving a test vehicle around the work place before the industrial vehicle 300 is introduced, or while actual cargo handling work is performed after the industrial vehicle 300 is introduced. The environment model generation unit 120 is configured to extract feature points of the work place by, for example, V-SLAM (Visual Simultaneous Localization and Mapping) or SfM (Structure from Motion) technology, and to generate an environment model consisting of a three-dimensional point cloud based on the real images acquired by the real image acquisition unit 110 and information such as their acquisition positions. The three-dimensional point cloud constituting the environment model may include, for example, in addition to the three-dimensional coordinates of the extracted feature points, accompanying information such as the hue, saturation, and luminance associated with those coordinates.
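As an illustration of the point record described above, the following minimal sketch stores each feature point with its coordinates and accompanying color information. The names `EnvPoint` and `make_point` are hypothetical, and deriving hue, saturation, and brightness from 8-bit RGB with Python's standard `colorsys` module is an assumption; the disclosure does not specify a data layout.

```python
import colorsys
from dataclasses import dataclass
from typing import Optional


@dataclass
class EnvPoint:
    """One point of the environment model (hypothetical layout)."""
    x: float
    y: float
    z: float
    hue: float
    saturation: float
    brightness: float
    attribute: Optional[str] = None  # filled in later by attribute assignment


def make_point(xyz, rgb):
    """Build a point from 3D coordinates and an 8-bit RGB color sample."""
    r, g, b = (channel / 255.0 for channel in rgb)
    hue, saturation, brightness = colorsys.rgb_to_hsv(r, g, b)
    return EnvPoint(xyz[0], xyz[1], xyz[2], hue, saturation, brightness)


# Usage: a pure-red feature point half a meter above the floor.
point = make_point((1.0, 2.0, 0.5), (255, 0, 0))
```

Leaving `attribute` empty at construction mirrors the order of processing in the text, where attributes are assigned only after the environment model has been generated.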

The data synthesizing unit 130 is configured to arrange (synthesize) pre-stored virtual human models in the environment model to generate a three-dimensional synthetic model. A human model is generated by CG technology; like the environment model, it may define the feature points of a person as a three-dimensional point cloud, or it may define the extent occupied by a person as object surfaces, specifically, as a three-dimensional object created with 3D modeling software. The human models arranged by the data synthesizing unit 130 are assumed to be given in advance the attribute information identifying them as persons, described later. As human models, it is advisable to prepare patterns of people with various clothing, postures, heights, and genders, assuming the work place where the industrial vehicle 300 is actually operated. A human model is combined with the environment model after processing such as enlargement, reduction, rotation, and color adjustment as appropriate for its placement location. Based on the travel history of the industrial vehicle 300 at the time the real images were acquired, the data synthesizing unit 130 may be configured to select, as regions in which to arrange human models, routes that the industrial vehicle 300 travels frequently in preference to routes that it travels infrequently. The data synthesizing unit 130 may also be configured to preferentially select, for placing human models, regions from which many point clouds or surfaces having the person attribute are extracted in the attribute assignment process described later.
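The enlargement, reduction, and rotation applied before a human model is combined with the environment model can be sketched as a similarity transform. This is only an illustration under stated assumptions: the function name `place_human_model` is hypothetical, the human model is taken to be a list of (x, y, z) points with its feet at the origin and z pointing up, and rotation is limited to the vertical axis.

```python
import math


def place_human_model(model_points, position, yaw_deg=0.0, scale=1.0):
    """Scale, rotate about the vertical axis, and translate a human model.

    Returns (x, y, z, attribute) tuples; the attribute is always "person"
    because a placed human model is known to be a person.
    """
    yaw = math.radians(yaw_deg)
    cos_yaw, sin_yaw = math.cos(yaw), math.sin(yaw)
    placed = []
    for x, y, z in model_points:
        x, y, z = x * scale, y * scale, z * scale
        rotated_x = cos_yaw * x - sin_yaw * y
        rotated_y = sin_yaw * x + cos_yaw * y
        placed.append((rotated_x + position[0],
                       rotated_y + position[1],
                       z + position[2],
                       "person"))
    return placed


# Usage: a two-point stand-in for a human model, placed 10 m ahead.
placed = place_human_model([(0.0, 0.0, 0.0), (0.0, 0.0, 1.7)],
                           position=(10.0, 0.0, 0.0), yaw_deg=90.0)
```

Attaching the attribute at placement time reflects the statement that human models carry the person attribute in advance.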

Based on the three-dimensional synthetic model generated by the data synthesizing unit 130, the data generation unit 140 generates, as learning data (learning images), images of the three-dimensional synthetic model observed from a predetermined viewpoint position. The images assumed here show the human models arranged by the data synthesizing unit 130 in the surroundings of the industrial vehicle 300 and, labeled with the correct answer information described later, are used as learning data (learning images) for the deep learning of the human detection model 210. When learning images are generated from the three-dimensional synthetic model, viewpoint position information is taken into account, such as the position within the work place represented by the environment model and the installation height, installation angle, angle of view, and focal length of the imaging unit 310 provided on the industrial vehicle 300.
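How a point of the three-dimensional synthetic model maps into a learning image can be illustrated with an ideal pinhole camera. The sketch below is an assumption, not the disclosed implementation: `project_point` is a hypothetical name, the world frame is taken as x forward, y left, z up, the installation angle is reduced to a single downward pitch, and lens distortion is ignored.

```python
import math


def project_point(point, cam_pos, pitch_down_deg, focal_px, center_px):
    """Project a world point into pixel coordinates (u right, v down).

    cam_pos includes the installation height as its z component.
    Returns None when the point lies behind the camera.
    """
    pitch = math.radians(pitch_down_deg)
    dx, dy, dz = (p - c for p, c in zip(point, cam_pos))
    # Camera axes: forward = (cos, 0, -sin), right = (0, -1, 0),
    # down = (-sin, 0, -cos), all expressed in the world frame.
    depth = dx * math.cos(pitch) - dz * math.sin(pitch)
    right = -dy
    down = -dx * math.sin(pitch) - dz * math.cos(pitch)
    if depth <= 0:
        return None
    u = center_px[0] + focal_px * right / depth
    v = center_px[1] + focal_px * down / depth
    return (u, v)


# Usage: camera 2 m above the floor, looking straight ahead.
pixel = project_point((10.0, -1.0, 1.0), (0.0, 0.0, 2.0), 0.0,
                      focal_px=500.0, center_px=(320.0, 240.0))
```

Changing `cam_pos`, `pitch_down_deg`, or `focal_px` corresponds to varying the installation height, installation angle, and focal length mentioned above.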

The learning data generation device 100 further includes: an attribute assigning unit 150 configured to assign attributes to at least some of the points of the three-dimensional point cloud constituting the environment model; a processing unit 160 configured to perform processing for rearranging, deleting, or duplicating, on an object-by-object basis, the object surfaces identified through attribute assignment or the point clouds forming those surfaces; and a correct answer information setting unit 170 configured to label the point clouds or object surfaces to which the attribute indicating a person is assigned as correct answer information for training the human detection model 210.

The attribute assigning unit 150 may be configured to assign attributes by a semantic segmentation technique. The attribute assigning unit 150 determines the attribute of each point of the three-dimensional point cloud in the environment model. The attributes are, for example, person, forklift, truck, shelf, road, building, and the like. The attribute assigning unit 150 may connect points having the same attribute to form a mesh-like surface. For example, when points assigned the same attribute are arranged contiguously, the attribute assigning unit 150 may be configured to group them and identify them as the surface of an object. That is, the attribute assigning unit 150 may identify object surfaces based on the assigned attributes. An object to which the attribute assigning unit 150 assigns an attribute may be a person, who is a detection target of the human detection device 200, or may be an article, a structure, an animal, or the like that is not a detection target.
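Grouping contiguous points that share an attribute into objects can be sketched as a connected-component search. The following is one possible reading, not the disclosed algorithm: `group_objects` is a hypothetical name, "contiguous" is interpreted as lying within an assumed distance `max_gap` of another member, and the quadratic neighbor search is kept for brevity rather than speed.

```python
import math
from collections import deque


def group_objects(points, max_gap=0.5):
    """Group (x, y, z, attribute) points into objects.

    Two points belong to the same object when they share an attribute and
    are linked by a chain of neighbors at most max_gap apart.
    Returns a list of (attribute, sorted point indices).
    """
    unvisited = set(range(len(points)))
    objects = []
    while unvisited:
        seed = min(unvisited)  # deterministic processing order
        unvisited.remove(seed)
        attribute = points[seed][3]
        members = [seed]
        queue = deque([seed])
        while queue:
            i = queue.popleft()
            neighbors = [j for j in unvisited
                         if points[j][3] == attribute
                         and math.dist(points[i][:3], points[j][:3]) <= max_gap]
            for j in neighbors:
                unvisited.remove(j)
                members.append(j)
                queue.append(j)
        objects.append((attribute, sorted(members)))
    return objects


# Usage: two nearby person points form one object; a distant one is separate.
cloud = [(0.0, 0.0, 0.0, "person"), (0.3, 0.0, 0.0, "person"),
         (5.0, 0.0, 0.0, "person"), (0.1, 0.0, 0.0, "floor")]
objects = group_objects(cloud)
```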

People may appear in the real images acquired for generating the environment model. If human models are further combined with an environment model that treats such a person as background, and learning data is generated from the resulting three-dimensional synthetic model, the person captured as background is not labeled as correct answer information while the arranged human models are; this lowers the training efficiency of the human detection model 210 and may affect the accuracy of human detection. Therefore, a person captured in the real images is assigned an attribute and treated as a person. As a result, the point cloud or object surface representing a person captured in the real images is processed so that it is not treated as background of the environment model (three-dimensional synthetic model). Because the human models arranged by the data synthesizing unit 130 are already known to be persons, the person attribute is assigned automatically to the point clouds or object surfaces constituting those human models, without relying on the attribute assigning unit 150.

The processing unit 160 may be configured to accept editing operations on objects to which attributes have been assigned. This makes it possible to identify, on an object-by-object basis, the object surfaces or their point clouds included in the environment model and the three-dimensional synthetic model, and to perform edits such as rearrangement, deletion, or duplication on an object-by-object basis. The background and the correct answer information can therefore be edited when generating learning data. For example, an object (including a person) placed in a region where objects are not normally present can be deleted, and an object (including a person) can be moved or added to a location where objects are often present. For example, if a person captured in the real images has been assigned an attribute in the environment model, the processing unit 160 can remove that person or move the person, as a new human model, to any location in the environment model. This eliminates the need to keep people out of the work place during the travel for constructing the three-dimensional environment model.
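The object-by-object edits (rearrangement, deletion, duplication) can be sketched on a simple dictionary of objects. Everything here is illustrative: the function names, the object identifiers such as `"person_40"`, and the representation of a model as `{object_id: {"attribute": ..., "points": [...]}}` are assumptions, not part of the disclosure.

```python
def delete_object(model, object_id):
    """Return a copy of the model without the given object."""
    return {oid: obj for oid, obj in model.items() if oid != object_id}


def move_object(model, object_id, offset):
    """Return a copy of the model with one object translated by offset."""
    edited = dict(model)
    source = model[object_id]
    edited[object_id] = {
        "attribute": source["attribute"],
        "points": [(x + offset[0], y + offset[1], z + offset[2])
                   for x, y, z in source["points"]],
    }
    return edited


def duplicate_object(model, object_id, new_id, offset):
    """Return a copy of the model with a translated duplicate of one object."""
    edited = dict(model)
    edited[new_id] = move_object(model, object_id, offset)[object_id]
    return edited


# Usage: move a person captured in the real images to another location.
model = {
    "person_40": {"attribute": "person", "points": [(1.0, 2.0, 0.0)]},
    "cargo_50": {"attribute": "cargo", "points": [(4.0, 4.0, 0.0)]},
}
moved = move_object(model, "person_40", (3.0, 0.0, 0.0))
```

Each function returns a new dictionary and leaves the input unchanged, so the unedited environment model remains available for generating further variants.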

The correct answer information setting unit 170 is configured to label, as correct answer information, the point clouds or object surfaces to which the attribute indicating a person is assigned in the three-dimensional synthetic model. This labeling information is also reflected in the learning images that the data generation unit 140 generates based on the three-dimensional synthetic model. The correct answer information setting unit 170 may instead be configured to label the correct answer information after the learning images have been generated from the three-dimensional synthetic model.
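One way the person labels of the three-dimensional synthetic model could carry over to a learning image is as per-person bounding boxes around the projected points. This is a sketch under assumptions: the disclosure does not state the form of the correct answer information in the image, and the name `person_bounding_boxes` and the input tuples `(u, v, attribute, object_id)` are hypothetical.

```python
def person_bounding_boxes(projected, image_size):
    """Compute one bounding box per person from projected, labeled points.

    projected: iterable of (u, v, attribute, object_id) in pixel units.
    Returns {object_id: (u_min, v_min, u_max, v_max)} for persons that have
    at least one point inside the image.
    """
    width, height = image_size
    boxes = {}
    for u, v, attribute, object_id in projected:
        if attribute != "person":
            continue  # only persons are correct answer information
        if not (0 <= u < width and 0 <= v < height):
            continue  # ignore points that fall outside the image
        if object_id not in boxes:
            boxes[object_id] = [u, v, u, v]
        else:
            box = boxes[object_id]
            box[0] = min(box[0], u)
            box[1] = min(box[1], v)
            box[2] = max(box[2], u)
            box[3] = max(box[3], v)
    return {object_id: tuple(box) for object_id, box in boxes.items()}


# Usage: object 1 is a person inside a 640x480 image.
labels = person_bounding_boxes(
    [(100, 200, "person", 1), (120, 260, "person", 1), (50, 50, "floor", 2)],
    image_size=(640, 480))
```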

(Method of generating learning data)
A method of generating learning data according to one embodiment will be described below. FIG. 3 is a flowchart showing the procedure of the method for generating learning data according to one embodiment.

As shown in FIG. 3, the learning data generation device 100 (real image acquisition unit 110) acquires real images captured while traveling in the work place (step S1). To generate an environment model with a dense three-dimensional point cloud, the real images are preferably acquired by traveling around the work place many times. For example, superimposing the three-dimensional point clouds obtained from multiple runs makes it possible to generate an environment model with a dense three-dimensional point cloud. For the same purpose, the real images for environment model generation may be acquired with an imaging device having a higher resolution than the images used for human detection. The real images may be acquired while normal operations are being carried out at the work place.
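Superimposing the point clouds from several runs can be sketched as a merge that keeps one point per small voxel, so that overlapping observations do not pile up. This is an assumed approach, not the disclosed one: `merge_runs` and `voxel_size` are hypothetical, and the runs are assumed to be already registered in a common coordinate frame.

```python
import math


def merge_runs(runs, voxel_size=0.05):
    """Merge point clouds from several runs, keeping one point per voxel.

    runs: iterable of point lists, each point an (x, y, z) tuple expressed
    in the same coordinate frame. The first point seen in a voxel is kept.
    """
    merged = {}
    for run in runs:
        for x, y, z in run:
            key = (math.floor(x / voxel_size),
                   math.floor(y / voxel_size),
                   math.floor(z / voxel_size))
            merged.setdefault(key, (x, y, z))
    return list(merged.values())


# Usage: the second run adds one new point and one near-duplicate.
run_1 = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0)]
run_2 = [(0.01, 0.0, 0.0), (2.0, 0.0, 0.0)]
dense = merge_runs([run_1, run_2])
```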

The learning data generation device 100 (environment model generation unit 120) generates a three-dimensional point cloud environment model based on the acquired real images (step S2). The learning data generation device 100 (attribute assigning unit 150) assigns attributes to at least some of the points of the three-dimensional point cloud of the environment model (step S3). The learning data generation device 100 (attribute assigning unit 150) identifies object surfaces based on these attributes.

Here, the learning data generation device 100 (attribute assigning unit 150) determines whether there is an object having the attribute indicating a person (step S4). If it determines that there is such an object (step S4; Yes), the learning data generation device 100 (processing unit 160) decides what processing (rearrangement, deletion, duplication, or the like) to apply to the object or its point cloud (step S5). Specifically, the learning data generation device 100 (processing unit 160) decides, for example, whether to remove the object having the person attribute or its point cloud from the environment model or to move it to an arbitrary location. If it determines that there is no object having the attribute indicating a person (step S4; No), the learning data generation device 100 skips step S5.

The learning data generation device 100 (data synthesizing unit 130) arranges one or more human models in the three-dimensional space represented by the environment model to generate a three-dimensional synthetic model (step S6). The point clouds or object surfaces constituting the human models to be arranged are given the attribute indicating a person in advance. The learning data generation device 100 (correct answer information setting unit 170) labels the objects having the person attribute, or their point clouds, as correct answer information (step S7). The learning data generation device 100 (data generation unit 140) generates learning data labeled with the correct answer information, based on the three-dimensional synthetic model in which the human models are arranged (step S8). Based on the travel history of the industrial vehicle 300 at the time the real images were acquired, the human models may be placed near routes that are traveled frequently (for example, within 5 m of the travel route). The human models may be placed preferentially in locations with heavy foot traffic (for example, the entrances of buildings and rooms). Such locations may be identified based on the number or frequency of objects having the person attribute detected in step S4.
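Preferring frequently traveled routes when choosing placement locations can be sketched as weighted sampling over a grid of visited cells. This is only one possible scheme under assumptions: `sample_placements` is a hypothetical name, the travel history is taken to be a list of logged (x, y) vehicle positions, and the 5 m limit is measured from the center of the chosen grid cell as an approximation of the distance from the route.

```python
import math
import random
from collections import Counter


def sample_placements(travel_history, count, cell_size=1.0,
                      max_offset=5.0, seed=0):
    """Sample floor positions for human models near frequently visited cells.

    Cells are chosen with probability proportional to their visit count,
    then a random offset of at most max_offset is applied.
    """
    rng = random.Random(seed)
    visits = Counter((math.floor(x / cell_size), math.floor(y / cell_size))
                     for x, y in travel_history)
    cells = list(visits)
    weights = [visits[cell] for cell in cells]
    placements = []
    for cell_x, cell_y in rng.choices(cells, weights=weights, k=count):
        center_x = (cell_x + 0.5) * cell_size
        center_y = (cell_y + 0.5) * cell_size
        radius = max_offset * math.sqrt(rng.random())  # uniform over the disk
        angle = rng.uniform(0.0, 2.0 * math.pi)
        placements.append((center_x + radius * math.cos(angle),
                           center_y + radius * math.sin(angle)))
    return placements


# Usage: one aisle is traveled nine times as often as the other.
history = [(0.5, 0.5)] * 9 + [(20.5, 0.5)]
positions = sample_placements(history, count=50)
```

The same weighting could be driven by counts of person-attributed objects per cell instead of vehicle visits, which corresponds to the foot-traffic criterion described above.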

(Specific example of environment model)
A specific example of the environment model will be described below.

FIG. 4 is a schematic diagram showing an example of an environment model generated by the environment model generation unit 120 of the learning data generation device 100 according to one embodiment. FIG. 5 is a schematic diagram showing an example of an environment model to which the attribute assigning unit 150 of the learning data generation device 100 according to one embodiment has assigned attributes.

The example shown in FIG. 4 is an environment model generated from real images taken in a warehouse. This environment model includes a person 40, cargo 50, a wall 60, and a floor 70. If a new human model is placed in the environment model in this state, the placed human model is given the person attribute and labeled as correct answer information, whereas the person 40 captured in the real images may be treated as part of the background constituting the environment model. It is therefore necessary to assign an attribute to the person 40 and perform processing such as removing the person from the environment model or moving the person to an arbitrary location. If no particular operation on the placement of the attributed person 40 is needed, the point cloud or object surface constituting the person 40 may be labeled as correct answer information as it is.

FIG. 5 schematically shows the state after the attribute assigning unit 150 has assigned attributes to the environment model shown in FIG. 4. The objects included in the environment model, namely the person 40, the cargo 50, the wall 60, and the floor 70, are each given a different attribute. In this figure, the assigned attributes are distinguished by hatching in order to visualize them. However, such image processing is not essential. The attribute assigning unit 150 may assign an attribute to each region of the three-dimensional point cloud as meta information, or may add a code or identifier.

As shown in FIG. 5, attributes are assigned so that the surfaces of objects can be identified. Assigning attributes allows the processing unit 160 to perform edits such as relocation, deletion, and addition on an object-by-object basis. As for the region of the person 40, if its placement is plausible in view of the conditions under which the industrial vehicle 300 actually travels, it may be labeled as correct answer information as it is. If the placement as it stands would deviate from actual operation, the region may be removed from the environment model, or moved to another location after processing such as enlargement, reduction, or rotation.
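Object-by-object editing driven by attributes can be sketched as follows. This is a minimal illustration under stated assumptions, not the data structure of the embodiment: the `LabeledPoint` record, the `object_id` field used to group points into objects, and the three helper functions are hypothetical names introduced here.

```python
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class LabeledPoint:
    """One point of the environment model with its assigned attribute."""
    x: float
    y: float
    z: float
    attribute: str  # e.g. "person", "cargo", "wall", "floor"
    object_id: int  # hypothetical identifier grouping points into objects


def has_attribute(points, attribute):
    """Check whether any point carries the given attribute."""
    return any(p.attribute == attribute for p in points)


def delete_attribute(points, attribute):
    """Return a copy of the cloud without points of the given attribute."""
    return [p for p in points if p.attribute != attribute]


def translate_object(points, object_id, dx, dy, dz=0.0):
    """Return a copy of the cloud with one object moved by (dx, dy, dz)."""
    return [
        replace(p, x=p.x + dx, y=p.y + dy, z=p.z + dz)
        if p.object_id == object_id
        else p
        for p in points
    ]
```

For example, `delete_attribute(cloud, "person")` corresponds to removing a captured person from the environment model, and `translate_object` corresponds to moving one object to another location. Both return new lists, so the original cloud is left unchanged.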

(Specific example of the learning data)
A specific example of the learning data will be described below.

FIG. 6 is a schematic diagram showing learning data (a learning image) generated by the data generation unit 140 of the learning data generation device 100 according to one embodiment. The learning image shown in FIG. 6 was generated by the data generation unit 140 based on the three-dimensional synthetic model obtained when the data synthesizing unit 130 combined a human model with the environment model generated by the environment model generation unit 120. In FIG. 6, the viewpoint position used when generating the learning image from the three-dimensional synthetic model is assumed to be the position of the imaging unit 310 installed above the industrial vehicle 300. Part of the industrial vehicle 300 therefore appears in the image, together with a vehicle 10 parked on the road and a person 20 near the industrial vehicle. The image of the person 20 is obtained by observing, from the predetermined viewpoint position, the human model that the data synthesizing unit 130 placed in the environment model. Because the learning data generation device 100 assigns an attribute to the human model in the three-dimensional synthetic model and labels it as correct answer information, the region containing the person 20 can easily be identified as correct answer information in the generated learning image as well, and the image can be input to the human detection model 210 as learning data.
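How a labeled human model in the three-dimensional synthetic model can yield a labeled region in the learning image may be sketched with a standard pinhole camera projection. The description does not specify a camera model, so the pinhole model, the camera-coordinate convention (x right, y down, z forward), and the names `project_point` and `person_bounding_box` are assumptions made for illustration.

```python
def project_point(point, focal_px, cx, cy):
    """Project a camera-frame 3D point to pixel coordinates (pinhole model).

    Returns None for points at or behind the camera plane.
    """
    x, y, z = point
    if z <= 0.0:
        return None
    return (focal_px * x / z + cx, focal_px * y / z + cy)


def person_bounding_box(person_points, focal_px, cx, cy, width, height):
    """Return (left, top, right, bottom) enclosing the projected person points.

    person_points are the points labeled with the person attribute, already
    expressed in the coordinate frame of the assumed viewpoint. The box is
    clipped to the image, and None is returned if nothing is visible.
    """
    pixels = [
        uv
        for uv in (project_point(p, focal_px, cx, cy) for p in person_points)
        if uv is not None
    ]
    if not pixels:
        return None
    us = [u for u, _ in pixels]
    vs = [v for _, v in pixels]
    left, right = max(0.0, min(us)), min(width - 1.0, max(us))
    top, bottom = max(0.0, min(vs)), min(height - 1.0, max(vs))
    if left > right or top > bottom:
        return None
    return (left, top, right, bottom)
```

Because the person attribute is already attached to the points of the human model, the rectangle follows directly from the projection and no manual annotation of the rendered image is needed. This sketch ignores occlusion by other objects.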

The present disclosure is not limited to the embodiments described above, and includes modifications of those embodiments and appropriate combinations of multiple embodiments. For example, the method of generating learning data is not limited to the example shown in FIG. 3. The order of the steps may be changed, and some steps may be omitted. The learning data generated by the method described above is provided to the human detection model 210, and the human detection model 210 executes machine learning (deep learning) based on the learning data. As the deep learning technique, a method such as SSD (Single Shot MultiBox Detector) can be adopted, for example, but the technique is not limited to this, and any object detection algorithm may be used. The trained human detection model may, for example, take as input a captured image acquired at the work place and output the position (specified by a rectangle) of each object included in the captured image, its class (person, vehicle, etc.), and the probability that it belongs to that class.
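The output of a trained detector as described above (rectangle, class, probability) can be represented with a small record type. The `Detection` type, the field names, and the 0.5 probability threshold below are hypothetical and are not taken from the disclosure or from any particular detection library.

```python
from dataclasses import dataclass
from typing import Tuple


@dataclass(frozen=True)
class Detection:
    """One detected object: rectangle, class, and class probability."""
    box: Tuple[float, float, float, float]  # (left, top, right, bottom) in pixels
    label: str                              # e.g. "person", "vehicle"
    score: float                            # probability of belonging to the class


def people_above_threshold(detections, threshold=0.5):
    """Keep only detections of class "person" with score >= threshold."""
    return [
        d for d in detections
        if d.label == "person" and d.score >= threshold
    ]
```

A downstream component, such as one that issues warnings, could act on the filtered list. The appropriate threshold depends on the acceptable balance between over-detection and missed detections.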

(Summary)
The contents described in each of the above embodiments can be understood, for example, as follows.

(1) A method of generating learning data according to the present disclosure is
a method of generating learning data for a human detection model (210) that performs machine learning, the method including:
a step of generating an environment model of a three-dimensional point cloud based on actual images captured while traveling through a work place;
a step of generating a three-dimensional synthetic model in which one or more human models (20) are arranged in the three-dimensional space represented by the environment model; and
a step of generating the learning data based on the three-dimensional synthetic model.

According to the above method, the learning data is generated with an environment model based on actual images as the background, so the divergence from the background in the work place, that is, the customer's environment, can be reduced. This makes it possible to generate learning data that can improve human detection accuracy. In addition, when human models (20) are placed in the image space of the environment model, varying their positions and number yields diverse learning data that includes correct answer information. Compared with collecting actual images for each placement pattern of people and generating correct answer information for them, the time needed to generate learning data is greatly reduced, and deep learning with diverse learning data that includes correct answer information becomes possible.

(2) In some embodiments, the method described in (1) above includes
a step of assigning attributes to at least some point groups of the three-dimensional point cloud of the environment model and identifying surfaces of objects based on the attributes.

Because a three-dimensional point cloud consists of discrete points, incorporating a human model (20) into it to generate a composite image (three-dimensional synthetic model) may leave parts of the background missing or see-through. Learning data generated from such a three-dimensional synthetic model may degrade the human detection function. With the above method, object surfaces are identified, which reduces the risk that parts of the background are missing or see-through.

(3) In some embodiments, the method described in (2) above includes
a step of determining whether the environment model contains a point group or object surface (40) to which the attribute indicating a person has been assigned.

When the actual images for generating the environment model are acquired, people may be captured in them. If a human model is further combined with an environment model that treats such a captured person as background, and learning data is generated from the resulting three-dimensional synthetic model, the person captured as background is not labeled as correct answer information while the placed human model is. This lowers the learning efficiency of the human detection model 210 and may affect human detection accuracy. With the above method, the point group or object surface representing a person captured in the actual images is not treated as background of the environment model (three-dimensional synthetic model). The actual images may therefore be acquired in an environment where people are present, and the environment model can be generated while normal work continues.

(4) In some embodiments, the method described in (3) above includes
a step of performing processing that relocates, deletes, or duplicates the object surfaces, or the point groups forming the object surfaces, on an object-by-object basis.

According to the above method, the object surfaces or point groups included in the environment model can be identified object by object and edited, for example relocated, deleted, or duplicated, on an object-by-object basis. It is therefore possible to delete an object (including a person) placed in a region where objects are not normally present, or to move or add an object (including a person) to a location where objects are often present.

(5) In some embodiments, in the method described in (3) or (4) above,
in the step of generating the three-dimensional synthetic model, the human model is placed by preferentially selecting a region in which many point groups or object surfaces having the attribute indicating a person are detected.

According to the above method, human models are preferentially placed in high-risk areas (for example, roads where people often pass, the vicinity of building entrances and exits, and places where people work), so learning data for training the human detection model can be generated efficiently. Generation of wasteful learning data can therefore be suppressed.
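One way to realize "preferential selection" is to sample placement regions with probability proportional to how often person-attributed objects were detected there. Proportional weighting is only one possible reading of the text, and the region names, the `choose_regions` function, and the uniform fallback when no person was detected are assumptions made for illustration.

```python
import random


def choose_regions(detection_counts, n_models, rng=random):
    """Pick n_models placement regions, favoring regions with more detections.

    detection_counts maps a region name to the number of person-attributed
    objects detected there. Sampling is done with replacement.
    """
    regions = list(detection_counts)
    weights = [detection_counts[r] for r in regions]
    if sum(weights) == 0:
        # No person was detected anywhere: fall back to uniform sampling.
        weights = [1] * len(regions)
    return rng.choices(regions, weights=weights, k=n_models)
```

For example, with counts `{"entrance": 9, "aisle": 1, "storage": 0}`, roughly nine out of ten human models would be placed near the entrance and none in the storage area.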

(6) In some embodiments, in the method described in any one of (1) to (5) above,
in the step of generating the three-dimensional synthetic model, a route with a high travel frequency is selected in preference to a route with a low travel frequency as the region in which to place the human model (20), based on the travel history at the time the actual images were acquired.

According to the above method, actual images are acquired while traveling in the course of normal work, and human models are placed on routes frequently taken during that travel. Generation of wasteful learning data can therefore be suppressed.

(7) A method of training a human detection model (210) according to the present disclosure includes:
a step of generating learning data by the method of generating learning data described in any one of (1) to (6) above; and
a step of training the human detection model (210) using the generated learning data.

According to the above method, the time needed to generate diverse learning data that includes correct answer information can be shortened. In addition, because the human detection model (210) is trained on diverse learning data that includes correct answer information, human detection accuracy can be improved.

(8) A learning data generation device (100) according to the present disclosure is
a device (100) for generating learning data for a human detection model (210) that performs machine learning, the device including:
an environment model generation unit (120) that generates an environment model of a three-dimensional point cloud based on actual images captured while traveling through a work place;
a data synthesizing unit (130) that generates a three-dimensional synthetic model in which one or more human models (20) are arranged in the three-dimensional space represented by the environment model; and
a data generation unit (140) that generates the learning data based on the three-dimensional model.

According to the above configuration, the learning data is generated with an environment model based on actual images as the background, so the divergence from the background in the work place, that is, the customer's environment, can be reduced. This makes it possible to generate learning data that can improve human detection accuracy. In addition, when human models (20) are placed in the image space of the environment model, varying their positions and number yields diverse learning data that includes correct answer information. Compared with collecting actual images for each placement pattern of people and generating correct answer information for them, the time needed to generate learning data is greatly reduced, and deep learning with diverse learning data that includes correct answer information becomes possible.

10 Vehicle
20 Person (human model)
40 Person
50 Cargo
60 Wall
70 Floor
72 Processor
74 RAM
76 ROM
78 HDD
80 Input I/F
82 Output I/F
84 Bus
100 Learning data generation device
110 Actual image acquisition unit
120 Environment model generation unit
130 Data synthesizing unit
140 Data generation unit
150 Attribute assigning unit
160 Processing unit
170 Correct answer information setting unit
200 Human detection device
210 Human detection model
220 Output unit
300 Industrial vehicle
310 Imaging unit
320 Notification unit
400 Human detection system

Claims

1. A method of generating learning data for a human detection model that performs machine learning, the method comprising:
a step of generating an environment model of a three-dimensional point cloud based on actual images captured, while an industrial vehicle travels through a work place, by an imaging unit installed on the industrial vehicle;
a step of generating a three-dimensional synthetic model in which one or more human models are arranged in the three-dimensional space represented by the environment model; and
a step of generating, as the learning data, a learning image obtained when the three-dimensional synthetic model in which the human models are arranged is observed from a viewpoint position that assumes the installation position of the imaging unit on the industrial vehicle, using viewpoint position information of the imaging unit.

2. The method of generating learning data according to claim 1, further comprising the step of assigning attributes to at least a portion of the three-dimensional point cloud of the environment model, and identifying surfaces of objects based on the attributes.

3. The method of generating learning data according to claim 2, further comprising the step of determining whether or not there is a surface of said point group or said object to which said attribute indicating a person is assigned in said environment model.

4. The method of generating learning data according to claim 3, further comprising a step of relocating, deleting, or duplicating the surface of the object, or the point group forming the surface of the object, on an object-by-object basis.

5. The method of generating learning data according to claim 3 or 4, wherein, in the step of generating the three-dimensional synthetic model, the human model is placed by preferentially selecting a region in which many point groups or object surfaces to which the attribute indicating a person is assigned are detected.

6. The method of generating learning data according to any one of claims 1 to 5, wherein, in the step of generating the three-dimensional synthetic model, a route with a high travel frequency is selected in preference to a route with a low travel frequency as the region in which to place the human model, based on the travel history at the time the actual images were acquired.

7. A method of training a human detection model, the method comprising:
a step of generating learning data by the method of generating learning data according to any one of claims 1 to 6; and
a step of training a human detection model using the generated learning data.

8. A device for generating learning data for a human detection model that performs machine learning, the device comprising:
an environment model generation unit that generates an environment model of a three-dimensional point cloud based on actual images captured, while an industrial vehicle travels through a work place, by an imaging unit installed on the industrial vehicle;
a data synthesizing unit that generates a three-dimensional synthetic model in which one or more human models are arranged in the three-dimensional space represented by the environment model; and
a data generation unit that generates, as the learning data, a learning image obtained when the three-dimensional synthetic model in which the human models are arranged is observed from a viewpoint position that assumes the installation position of the imaging unit on the industrial vehicle, using viewpoint position information of the imaging unit.