JP6945772B1

JP6945772B1 - Learning device, object detection device and learning method

Info

Publication number: JP6945772B1
Application number: JP2021526501A
Authority: JP
Inventors: 百代日野; 秀明前原; 守屋　芳美; 芳美守屋; 眞實佐藤; 善隆豊田
Original assignee: Mitsubishi Electric Corp
Current assignee: Mitsubishi Electric Corp
Priority date: 2019-06-25
Filing date: 2019-06-25
Publication date: 2021-10-06
Anticipated expiration: 2039-06-25
Also published as: WO2020261392A1; JPWO2020261392A1

Abstract

A learning device includes: a first auxiliary information reference unit (101) that acquires a teacher image in which an object is captured and auxiliary information corresponding to the object; a first auxiliary information synthesis unit (102) that generates a composite teacher image by reflecting the auxiliary information acquired by the first auxiliary information reference unit (101) in the teacher image; and a learning unit (106) that generates a machine learning model (40) by learning using the composite teacher image generated by the first auxiliary information synthesis unit (102).

Description

The present invention relates to a learning device and a learning method for generating a machine learning model, and to an object detection device that uses machine learning.

Object detection devices that use machine learning to detect a specific object (hereinafter simply referred to as an "object") from a target image are known. A target image is an image in which the object may have been captured, and is the image from which the object detection device is to detect the object. For example, Patent Document 1 discloses a moving vehicle analysis system that uses a machine learning model to generate an output identifying moving vehicles in an aerial image.

Japanese Translation of PCT International Application Publication No. 2018-536236

There is information that can improve object detection accuracy when it is input to a machine learning model together with the target image (hereinafter referred to as "auxiliary information"). One example of auxiliary information is GIS (Geographic Information System) information.
Conventionally, however, using auxiliary information in machine learning has posed a problem: a machine learning model that takes only the target image as an input parameter cannot be used as is, and a machine learning model that takes both the target image and the auxiliary information as input parameters must be designed anew.

The present invention has been made to solve the above problem, and its object is to provide a learning device capable of generating a machine learning model that takes auxiliary information into account without using a machine learning model that has the auxiliary information as an input parameter.

The learning device according to the present invention includes: a first auxiliary information reference unit that acquires a teacher image in which an object is captured and auxiliary information corresponding to the object; a first auxiliary information synthesis unit that generates a composite teacher image by reflecting the auxiliary information acquired by the first auxiliary information reference unit in the teacher image; a learning unit that generates a machine learning model by learning using the composite teacher image generated by the first auxiliary information synthesis unit; a first image division unit that divides the composite teacher image generated by the first auxiliary information synthesis unit into a plurality of small teacher images; a statistic analysis unit that classifies the plurality of small teacher images produced by the first image division unit into a plurality of classifications; and a teacher data thinning unit that thins out the small teacher images belonging to each classification, after classification by the statistic analysis unit, according to the number of small teacher images belonging to each classification. The learning unit generates the machine learning model by learning the small teacher images that remain after thinning by the teacher data thinning unit.

According to the present invention, a machine learning model that takes auxiliary information into account can be generated without using a machine learning model that has the auxiliary information as an input parameter.

FIG. 1 is a diagram showing a configuration example of the object detection device according to Embodiment 1.
FIG. 2 is a diagram illustrating an example of how the first auxiliary information synthesis unit combines auxiliary information with a teacher image to generate a composite teacher image in Embodiment 1.
FIG. 3 is a diagram illustrating an example of a composite teacher image before and after the first image division unit divides it into small teacher images in Embodiment 1.
FIG. 4 is a diagram illustrating an example of the thinning of small teacher images performed by the teacher data thinning unit in Embodiment 1.
FIG. 5 is a diagram illustrating an example of a composite target image before and after the second image division unit divides it into small target images in Embodiment 1.
FIG. 6 is a flowchart illustrating the operation of the learning device according to Embodiment 1.
FIG. 7 is a flowchart illustrating the operation of the inference device according to Embodiment 1.
FIG. 8 is a diagram illustrating an example of how, in Embodiment 1, the first auxiliary information synthesis unit or the second auxiliary information synthesis unit generates a composite teacher image or a composite target image by reflecting auxiliary information in a teacher image or a target image when, for example, the object is a vehicle moving on a road and the auxiliary information is road information and main road information.
FIGS. 9A and 9B are diagrams showing an example of the hardware configuration of the learning device and the inference device according to Embodiment 1.

Hereinafter, embodiments of the present invention will be described in detail with reference to the drawings.
Embodiment 1.

In Embodiment 1, the object detection device 1 executes learning using teacher data and auxiliary information, and generates a machine learning model. The object detection device 1 then acquires a target image from which an object is to be detected, and detects the object from that target image using the machine learning model. In Embodiment 1, the object is assumed to be a boat. The target image and the teacher image described later are assumed to be satellite images, aerial images taken by drones, aircraft images, or the like.
In Embodiment 1 below, the object detection device 1 detects, from the target image, a boat that is under way on the water or a boat that is preparing to depart on the water.

FIG. 1 is a diagram showing a configuration example of the object detection device 1 according to Embodiment 1.
The object detection device 1 includes a learning device 10, a first auxiliary information DB 20, a machine learning model 40, an inference device 50, and a second auxiliary information DB 60.

The learning device 10 is the device in the object detection device 1 that executes the learning phase of machine learning, and consists of, for example, a high-performance workstation.
The learning device 10 executes learning using the teacher data and the auxiliary information. Through this learning, the learning device 10 generates the machine learning model 40 for detecting an object from a target image.
The learning device 10 includes a teacher data acquisition unit 100, a first auxiliary information reference unit 101, a first auxiliary information synthesis unit 102, a first image division unit 103, a statistic analysis unit 104, a teacher data thinning unit 105, and a learning unit 106.

The teacher data acquisition unit 100 acquires teacher data. The teacher data is prepared in advance and is stored, for example, in a location that the learning device 10 can refer to.
In Embodiment 1, the teacher data includes a plurality of images in which an object is captured (hereinafter referred to as "teacher images"). The teacher data also includes, for each of the teacher images, text information about the objects on that teacher image. The text information about an object is, for example, the position information of the object on the teacher image. The position of the object on the teacher image is expressed, for example, by the rectangle of the object, which is, for example, the smallest rectangle enclosing the object on the teacher image. Since the object is a boat in Embodiment 1, the pixel positions (x, y) of the four vertices of the smallest rectangle enclosing the boat on the teacher image serve as the position information of the object on the teacher image.
A teacher image does not necessarily contain only one object. When a plurality of objects exist on one teacher image, the teacher data includes text information about each of those objects in association with that teacher image. For example, if there are two boats on one teacher image, the teacher data includes, as the boat position information, the pixel positions of eight vertices in total: two sets of four rectangle vertices, one set per boat.
The metadata of a teacher image includes position information indicating the point or region captured in the teacher image (hereinafter referred to as "first imaging position information"). Specifically, the first imaging position information is, for example, the latitude and longitude of the point or region captured in the teacher image.
The teacher data acquisition unit 100 outputs the acquired teacher data to the first auxiliary information reference unit 101 and the learning unit 106.
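The structure of one teacher data entry described above can be pictured with a small sketch. This is only an illustration under assumptions: the class and field names (`TeacherRecord`, `ObjectAnnotation`, `image_path`, and the sample file name and coordinates) are hypothetical and do not come from the specification, which only states that each teacher image carries per-object rectangle vertices and latitude/longitude metadata.

```python
from dataclasses import dataclass, field
from typing import List, Tuple

Point = Tuple[int, int]  # pixel position (x, y) on the teacher image


@dataclass
class ObjectAnnotation:
    # Four vertices of the smallest rectangle enclosing one object.
    vertices: Tuple[Point, Point, Point, Point]


@dataclass
class TeacherRecord:
    image_path: str    # hypothetical reference to the teacher image file
    latitude: float    # first imaging position information (assumed layout)
    longitude: float
    objects: List[ObjectAnnotation] = field(default_factory=list)

    def vertex_count(self) -> int:
        # Total number of vertices stored for this teacher image.
        return sum(len(obj.vertices) for obj in self.objects)


def rect(x0: int, y0: int, x1: int, y1: int) -> ObjectAnnotation:
    # Build the four vertices of an axis-aligned rectangle.
    return ObjectAnnotation(((x0, y0), (x1, y0), (x1, y1), (x0, y1)))


# One teacher image containing two boats: two sets of four vertices.
record = TeacherRecord(
    "scene_001.png", 35.0, 139.0,
    [rect(10, 20, 40, 35), rect(100, 60, 130, 80)],
)
```

With two boats on one image, `record.vertex_count()` gives the eight vertices mentioned in the text.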

The first auxiliary information reference unit 101 refers to the first auxiliary information DB 20 and acquires auxiliary information from it based on the first imaging position information included in the metadata of the teacher image acquired by the teacher data acquisition unit 100.
In Embodiment 1, auxiliary information is information related to the point or region captured in a teacher image or a target image. The auxiliary information is, for example, a water area map, a topographic map, a land area map, a road map, a land cover map, or an image captured in the past of the same point or region as that captured in the teacher image or the target image. Geographic information is attached to each piece of auxiliary information. The geographic information is, for example, latitude, longitude, and altitude, and the auxiliary information covering the geographically corresponding range is identified based on it.
The first auxiliary information DB 20 stores the auxiliary information.
The type of auxiliary information that the first auxiliary information reference unit 101 acquires is determined in advance according to the object. For example, for boats, it is determined in advance that information about water areas, such as a water area map, is used as the auxiliary information. Accordingly, the first auxiliary information reference unit 101 here refers to the first auxiliary information DB 20 and acquires information about water areas.
The first auxiliary information reference unit 101 outputs the teacher data and the acquired auxiliary information to the first auxiliary information synthesis unit 102.
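As a rough illustration of the lookup described above, the sketch below selects an auxiliary entry of a predetermined kind whose geographic extent overlaps the extent of the image. Everything here is an assumption made for illustration: the specification does not describe the schema of the auxiliary information DB, so `GeoExtent`, `AuxiliaryEntry`, `find_auxiliary`, and the in-memory list standing in for the DB are hypothetical. Extents that cross the antimeridian are not handled.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class GeoExtent:
    south: float  # latitude in degrees
    west: float   # longitude in degrees
    north: float
    east: float

    def overlaps(self, other: "GeoExtent") -> bool:
        # True when the two latitude/longitude boxes share some area.
        return (self.south < other.north and other.south < self.north
                and self.west < other.east and other.west < self.east)


@dataclass(frozen=True)
class AuxiliaryEntry:
    kind: str          # e.g. "water_area" or "road" (hypothetical labels)
    extent: GeoExtent  # geographic information attached to the entry
    payload: str       # stand-in for the actual map data


def find_auxiliary(db: List[AuxiliaryEntry], kind: str,
                   image_extent: GeoExtent) -> Optional[AuxiliaryEntry]:
    # Return the first entry of the requested kind covering the image,
    # or None when the DB holds nothing for that region.
    for entry in db:
        if entry.kind == kind and entry.extent.overlaps(image_extent):
            return entry
    return None


db = [
    AuxiliaryEntry("road", GeoExtent(35.0, 139.0, 36.0, 140.0), "roads-A"),
    AuxiliaryEntry("water_area", GeoExtent(35.0, 139.0, 36.0, 140.0), "water-A"),
]
hit = find_auxiliary(db, "water_area", GeoExtent(35.2, 139.3, 35.3, 139.4))
miss = find_auxiliary(db, "water_area", GeoExtent(10.0, 10.0, 10.1, 10.1))
```

Returning `None` corresponds to the case, described later in the text, where no auxiliary information is available and the image is passed on unchanged.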

The first auxiliary information synthesis unit 102 combines the auxiliary information with the teacher image based on the teacher data and the auxiliary information output from the first auxiliary information reference unit 101. Specifically, the first auxiliary information synthesis unit 102 generates, for example, a masked image by filling the pixels of the teacher image outside the water area with a specific color. In Embodiment 1, the image that the first auxiliary information synthesis unit 102 generates by combining the auxiliary information with the teacher image is also called a "composite teacher image". Here, as an example, the first auxiliary information synthesis unit 102 fills the pixels of the teacher image outside the water area with black.
As described above, the auxiliary information is acquired based on the first imaging position information and geographically corresponds to the point or region captured in the teacher image. The first auxiliary information synthesis unit 102 can therefore identify, based on the auxiliary information, which part of the teacher image corresponds to the water area.
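The masking step can be sketched as follows. This is a minimal illustration, assuming the auxiliary information has already been rasterized into a per-pixel boolean water mask aligned with the teacher image; the specification does not say how that alignment is done. The function name `composite_with_mask` and the nested-list image representation are choices made for this sketch only.

```python
from typing import List, Tuple

Pixel = Tuple[int, int, int]  # (R, G, B)
Image = List[List[Pixel]]     # rows of pixels
BLACK: Pixel = (0, 0, 0)


def composite_with_mask(image: Image, water_mask: List[List[bool]],
                        fill: Pixel = BLACK) -> Image:
    # Keep pixels where the mask is True (water area) and paint every
    # other pixel with the fill color. The input image is not modified.
    if len(image) != len(water_mask) or any(
            len(row) != len(mask_row)
            for row, mask_row in zip(image, water_mask)):
        raise ValueError("image and mask must have the same shape")
    return [
        [px if keep else fill for px, keep in zip(row, mask_row)]
        for row, mask_row in zip(image, water_mask)
    ]


teacher = [[(10, 20, 30), (40, 50, 60)],
           [(70, 80, 90), (100, 110, 120)]]
mask = [[True, False],
        [False, True]]
composite = composite_with_mask(teacher, mask)
```

Because the same function can be applied to target images, a sketch like this also makes it easy to keep the training-side and inference-side synthesis methods identical.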

FIG. 2 is a diagram illustrating an example of how the first auxiliary information synthesis unit 102 combines auxiliary information with a teacher image to generate a composite teacher image in Embodiment 1.
In FIG. 2, the teacher image is an aerial image of a waterside where boats are floating, and the auxiliary information is information about the water area. In FIG. 2, the water area is indicated by 201.
When the first auxiliary information synthesis unit 102 combines the teacher image with the auxiliary information, the composite teacher image is an image in which the pixels outside the range of the water area indicated by 201 are filled with black.
When auxiliary information does not exist for part of the region captured in a teacher image, it may not be possible, for example, to identify the part corresponding to the water area within that part of the region.

The first auxiliary information synthesis unit 102 outputs teacher data in which the teacher image has been replaced with the composite teacher image (hereinafter referred to as "composite teacher data") to the first image division unit 103.
When no auxiliary information is output from the first auxiliary information reference unit 101 and there is therefore no auxiliary information to combine with the teacher image, the first auxiliary information synthesis unit 102 outputs the teacher data unchanged to the first image division unit 103 as the composite teacher data.

When a composite teacher image included in the composite teacher data is large, the first image division unit 103 divides it into pieces of a predetermined size. For example, the first image division unit 103 divides the composite teacher image into pieces of size 256 × 256.
Hereinafter, a composite teacher image that has been divided into a smaller size by the first image division unit 103 is referred to as a "small teacher image".

FIG. 3 is a diagram illustrating an example of a composite teacher image before and after the first image division unit 103 divides it into small teacher images in Embodiment 1.
In FIG. 3, the composite teacher image is assumed to be 1024 × 2048 in size, and the first image division unit 103 divides it into small teacher images of size 256 × 256.
As a result, the composite teacher image is divided into 32 small teacher images (4 × 8).
At this time, when an object exists on a small teacher image, the first image division unit 103 attaches the position information of the object on that small teacher image to the small teacher image. The first image division unit 103 can determine the position of the object on the small teacher image from the object position information associated with the composite teacher image.
The first image division unit 103 outputs the composite teacher data, now divided into small teacher images, to the statistic analysis unit 104.
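The division step and the re-mapping of object positions can be sketched as below. Two points are assumptions of this sketch rather than statements from the specification: objects are represented as axis-aligned boxes `(x_min, y_min, x_max, y_max)` with exclusive maxima, and a box that straddles a tile boundary is clipped into each tile it touches. The function names `tile_origins` and `assign_boxes` are hypothetical.

```python
from typing import Dict, List, Tuple

Box = Tuple[int, int, int, int]  # (x_min, y_min, x_max, y_max), max exclusive
Origin = Tuple[int, int]         # top-left pixel of a tile in the full image


def tile_origins(width: int, height: int, tile: int = 256) -> List[Origin]:
    # Top-left corners of all tiles, scanning row by row.
    return [(x, y) for y in range(0, height, tile)
            for x in range(0, width, tile)]


def assign_boxes(boxes: List[Box], width: int, height: int,
                 tile: int = 256) -> Dict[Origin, List[Box]]:
    # Map each tile origin to the object boxes it contains, expressed in
    # tile-local pixel coordinates. Boxes crossing a tile boundary are
    # clipped to each tile they overlap (an assumption of this sketch).
    tiles: Dict[Origin, List[Box]] = {
        origin: [] for origin in tile_origins(width, height, tile)
    }
    for x0, y0, x1, y1 in boxes:
        for (tx, ty), local in tiles.items():
            cx0, cy0 = max(x0, tx), max(y0, ty)
            cx1, cy1 = min(x1, tx + tile), min(y1, ty + tile)
            if cx0 < cx1 and cy0 < cy1:
                local.append((cx0 - tx, cy0 - ty, cx1 - tx, cy1 - ty))
    return tiles


# A 1024 x 2048 composite teacher image with one boat near a tile edge.
tiles = assign_boxes([(250, 10, 270, 30)], width=1024, height=2048)
```

For the 1024 × 2048 example, `len(tiles)` is 32, matching the count in the text; the single box here ends up in two neighboring tiles.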

For the composite teacher data output from the first image division unit 103, the statistic analysis unit 104 classifies the small teacher images according to their characteristics and counts the number of small teacher images in each classification.
In Embodiment 1, as an example, the statistic analysis unit 104 classifies the small teacher images into four patterns: "with auxiliary information synthesis and with object", "with auxiliary information synthesis and without object", "without auxiliary information synthesis and with object", and "without auxiliary information synthesis and without object". This is only an example, and the statistic analysis unit 104 can classify the small teacher images into any appropriate patterns.
The statistic analysis unit 104 outputs information on the result of counting the small teacher images, together with the composite teacher data, to the teacher data thinning unit 105. At this time, the statistic analysis unit 104 attaches to each small teacher image information indicating the classification to which it was assigned.
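The four-way classification and counting can be sketched as follows. The record type `SmallTeacherImage` and the category label strings are hypothetical; the specification only names the four patterns and states that each small teacher image is tagged with its classification.

```python
from collections import Counter
from dataclasses import dataclass
from typing import Iterable


@dataclass(frozen=True)
class SmallTeacherImage:
    name: str
    has_aux: bool     # auxiliary information was synthesized into the tile
    has_object: bool  # at least one object lies on the tile


def category(img: SmallTeacherImage) -> str:
    # One of the four patterns, encoded as a short label.
    aux = "aux" if img.has_aux else "no_aux"
    obj = "object" if img.has_object else "no_object"
    return f"{aux}/{obj}"


def count_categories(images: Iterable[SmallTeacherImage]) -> Counter:
    # Number of small teacher images per classification.
    return Counter(category(img) for img in images)


sample = [
    SmallTeacherImage("t0", True, True),
    SmallTeacherImage("t1", True, False),
    SmallTeacherImage("t2", True, False),
    SmallTeacherImage("t3", False, False),
]
counts = count_categories(sample)
```

A `Counter` returns 0 for a category with no members, which is convenient when a pattern such as "without auxiliary information synthesis and with object" happens to be empty.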

Based on the result of the statistic analysis unit 104 counting the small teacher images, the teacher data thinning unit 105 thins out the small teacher images belonging to over-represented classifications. Specifically, for the small teacher images included in the composite teacher data, the teacher data thinning unit 105 takes out and discards small teacher images belonging to classifications with many images so that the small teacher images in each classification reach an ideal ratio. In this way, the teacher data thinning unit 105 thins out unnecessary small teacher images.
The ideal ratio of small teacher images across the classifications is set as appropriate by a user or the like. Assuming the small teacher images are classified into the four patterns above, examples of the ideal ratio of "with auxiliary information synthesis and with object" : "with auxiliary information synthesis and without object" : "without auxiliary information synthesis and with object" : "without auxiliary information synthesis and without object" include "1:1:1:1", "1:3:1:3", and "2:6:1:3".

FIG. 4 is a diagram illustrating an example of the thinning of small teacher images performed by the teacher data thinning unit 105 in Embodiment 1.
In FIG. 4, as an example, the teacher data thinning unit 105 thins out the small teacher images after the statistic analysis unit 104 has classified them into the four patterns above. Also in FIG. 4, the teacher data thinning unit 105 thins out the small teacher images so that the numbers of small teacher images in the classifications are in the ratio "1:1:1:1".
In FIG. 4, many small teacher images belong to the classifications "with auxiliary information synthesis and without object" and "without auxiliary information synthesis and without object". The teacher data thinning unit 105 therefore thins out the small teacher images in those two classifications until their number equals the number of small teacher images in the classifications "with auxiliary information synthesis and with object" and "without auxiliary information synthesis and with object".
The teacher data thinning unit 105 outputs the composite teacher data after thinning to the learning unit 106.
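One way to realize thinning toward a target ratio is sketched below. The specification does not say how the images to discard are chosen or how the target counts are derived, so two assumptions are made here: the kept counts are the largest integer multiple of the ratio that every classification can supply, and the kept images are drawn at random with a fixed seed. The function name `thin_to_ratio` and the category labels are hypothetical.

```python
import random
from typing import Dict, List


def thin_to_ratio(groups: Dict[str, List[str]], ratio: Dict[str, int],
                  seed: int = 0) -> Dict[str, List[str]]:
    # groups: classification label -> names of small teacher images.
    # ratio:  classification label -> positive integer weight.
    # Find the largest unit such that ratio[c] * unit images can be kept
    # from every classification, then sample that many from each.
    unit = min(len(groups[c]) // ratio[c] for c in ratio)
    rng = random.Random(seed)
    return {c: rng.sample(groups[c], ratio[c] * unit) for c in ratio}


def make(label: str, n: int) -> List[str]:
    # Hypothetical image names for one classification.
    return [f"{label}_{i}" for i in range(n)]


groups = {
    "aux/object": make("ao", 2),
    "aux/no_object": make("an", 10),
    "no_aux/object": make("no", 3),
    "no_aux/no_object": make("nn", 8),
}
equal = thin_to_ratio(groups, {c: 1 for c in groups})
weighted = thin_to_ratio(groups, {
    "aux/object": 1, "aux/no_object": 3,
    "no_aux/object": 1, "no_aux/no_object": 3,
})
```

With the "1:1:1:1" ratio every classification is cut down to the size of the smallest one (2 images here); with "1:3:1:3" the same input keeps 2, 6, 2, and 6 images.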

The learning unit 106 mixes the thinned composite teacher data output from the teacher data thinning unit 105 with the teacher data acquired by the teacher data acquisition unit 100 at a predetermined ratio, executes learning, and generates the machine learning model 40. When mixing the two, the learning unit 106 divides the teacher images included in the teacher data into the same size as the small teacher images. The learning unit 106 executes learning on the mixture of composite teacher data and teacher data so that the inference device 50 can perform inference with a single machine learning model 40 regardless of whether auxiliary information is available. Moreover, even when auxiliary information always exists, the first auxiliary information reference unit 101 cannot necessarily acquire the auxiliary information linked to a given teacher image. By learning from the mixture of composite teacher data and teacher data, the learning unit 106 can improve the robustness of the machine learning model 40.
The machine learning model 40 holds the network structure that the learning unit 106 used during learning and the adjusted parameters.
In Embodiment 1, the machine learning model 40 is assumed to be an object detection neural network such as YOLO (You Only Look Once) or SSD (Single Shot Detection).

The first auxiliary information DB 20 is a database that stores auxiliary information.

The inference device 50 is the device in the object detection device 1 that executes the inference phase of machine learning, and consists of, for example, a high-performance workstation. In Embodiment 1, the inference device 50 infers whether or not an object is captured in the target image. In addition, when it infers that an object exists, the inference device 50 detects the position and size of the object.
The inference device 50 acquires the target image and the auxiliary information stored in the second auxiliary information DB 60, uses the machine learning model 40 generated by the learning device 10 to infer whether an object is present in the target image, and detects the position and size of the object in the target image.
Although FIG. 1 shows the object detection device 1 as including the learning device 10 and the inference device 50 separately, this is only an example. The object detection device 1 may instead include a single device in which the learning device 10 and the inference device 50 are integrated. However, the specifications required of the learning device 10 and the inference device 50 differ, and in general the learning device 10 requires higher processing performance.
The inference device 50 includes an image acquisition unit 500, a second auxiliary information reference unit 501, a second auxiliary information synthesis unit 502, a second image division unit 503, an inference unit 504, a detection result integration unit 505, and a detection result output unit 506.

The image acquisition unit 500 acquires the target image. The target image is prepared in advance and stored, for example, in a location that the inference device 50 can refer to.
The metadata of the target image includes position information (hereinafter referred to as "second imaging position information") indicating the point or region captured in the target image. Specifically, the second imaging position information is, for example, latitude and longitude information indicating the point or region captured in the target image.
The image acquisition unit 500 outputs the acquired target image to the second auxiliary information reference unit 501.
The second auxiliary information reference unit 501 refers to the second auxiliary information DB 60 and acquires auxiliary information from it based on the second imaging position information included in the metadata of the target image acquired by the image acquisition unit 500.
The type of auxiliary information that the second auxiliary information reference unit 501 acquires is determined in advance according to the object. Here, since the object is assumed to be a boat, the second auxiliary information reference unit 501 refers to the second auxiliary information DB 60 and acquires information about water areas.
The second auxiliary information reference unit 501 outputs the acquired auxiliary information, in association with the target image, to the second auxiliary information synthesis unit 502.
When the second auxiliary information reference unit 501 cannot acquire auxiliary information, it outputs the target image as is to the second auxiliary information synthesis unit 502.
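As an illustration only, the lookup and its fallback could be sketched as follows. The dictionary `AUX_DB`, the grid-cell key scheme, and the names `lookup_auxiliary` and `reference_unit` are hypothetical stand-ins; the source does not specify how the second auxiliary information DB 60 is organized or queried.

```python
from typing import Optional

# Hypothetical stand-in for the second auxiliary information DB 60.
# Keys are (lat, lon) in hundredths of a degree; this scheme is an assumption.
AUX_DB = {
    (3545, 13965): {"kind": "water_area", "mask_id": "cell-001"},
}

def lookup_auxiliary(metadata: dict, db: dict = AUX_DB) -> Optional[dict]:
    """Return the auxiliary record for the imaged location, or None."""
    lat = metadata.get("lat")
    lon = metadata.get("lon")
    if lat is None or lon is None:
        return None  # no second imaging position information available
    key = (int(round(lat * 100)), int(round(lon * 100)))
    return db.get(key)

def reference_unit(image, metadata: dict, db: dict = AUX_DB):
    """Mimic unit 501: pair the image with auxiliary info when available.

    The second element is None when nothing could be acquired, in which
    case the image is simply passed along as is.
    """
    return (image, lookup_auxiliary(metadata, db))
```

Returning `None` for the auxiliary part lets the downstream synthesis step decide whether to build a composite target image or forward the target image unchanged.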

When auxiliary information is associated with the target image output from the second auxiliary information reference unit 501, the second auxiliary information synthesis unit 502 combines the auxiliary information with the target image. Specifically, the second auxiliary information synthesis unit 502 generates, for example, an image in which the pixels of the target image outside the water area are masked by filling them with a specific color. In the first embodiment, the image that the second auxiliary information synthesis unit 502 generates by combining the auxiliary information with the target image is also referred to as a "composite target image". Here, as an example, the second auxiliary information synthesis unit 502 fills the pixels outside the water area with black in the composite target image. The second auxiliary information synthesis unit 502 may combine the target image and the auxiliary information by any suitable method, but the method must match the one by which the first auxiliary information synthesis unit 102 combines the auxiliary information with the teacher image.
As described above, the auxiliary information is acquired based on the second imaging position information and corresponds geographically to the point or region captured in the target image. The second auxiliary information synthesis unit 502 can therefore identify, based on the auxiliary information, the portion of the target image that corresponds to the water area.
The second auxiliary information synthesis unit 502 outputs the composite target image to the second image segmentation unit 503.
At this time, the second auxiliary information synthesis unit 502 outputs the target image to the second image segmentation unit 503 together with the composite target image.
When no auxiliary information is associated with the target image, the second auxiliary information synthesis unit 502 outputs only the target image, as is, to the second image segmentation unit 503.
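The masking described above can be sketched in a few lines. This is a minimal illustration, assuming the image is a list of rows of (R, G, B) tuples and that the water-area information has already been rasterized into a boolean mask of the same shape; the function name `composite_with_mask` and both data layouts are assumptions, not part of the source.

```python
BLACK = (0, 0, 0)

def composite_with_mask(image, water_mask, fill=BLACK):
    """Return a copy of `image` with every non-water pixel set to `fill`.

    image:      list of rows, each a list of (R, G, B) tuples
    water_mask: list of rows of booleans, True where the pixel is water
    """
    if len(image) != len(water_mask) or any(
        len(row) != len(mask_row) for row, mask_row in zip(image, water_mask)
    ):
        raise ValueError("image and mask must have the same shape")
    return [
        [pixel if is_water else fill for pixel, is_water in zip(row, mask_row)]
        for row, mask_row in zip(image, water_mask)
    ]
```

Because the same function can be applied to teacher images and target images, using it on both sides is one simple way to keep the learning-time and inference-time synthesis methods aligned, as the text requires.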

The second image segmentation unit 503 divides the target image or the composite target image output from the second auxiliary information synthesis unit 502 into images of a predetermined size.
Specifically, when only the target image is output from the second auxiliary information synthesis unit 502, the second image segmentation unit 503 divides the target image into images of the predetermined size. When the composite target image is output together with the target image, the second image segmentation unit 503 divides the composite target image into images of the predetermined size. Hereinafter, each image obtained by dividing the target image or the composite target image is referred to as a "small target image".

An example of a specific method by which the second image segmentation unit 503 divides the target image or the composite target image will now be described. The description below takes the division of the composite target image as the example, but the target image is divided in the same way.

FIG. 5 is a diagram for explaining, for the first embodiment, an example of the composite target image before the second image segmentation unit 503 divides it into small target images and of the composite target image after the division.
In FIG. 5, the composite target image is assumed to be 1024 × 2048 in size, and the second image segmentation unit 503 divides it into small target images of 256 × 256.
In the first embodiment, the division by the second image segmentation unit 503 yields, for example, 53 small target images. The second image segmentation unit 503 creates the small target images so that adjacent small target images overlap one another. This prevents an object lying on the boundary between small target images from going undetected.
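One way to produce overlapping tiles is a sliding window whose stride is smaller than the tile size. The sketch below is an assumption-laden illustration: the stride of 192 pixels and the rule of placing a final tile flush with the image edge are not taken from the source, so the number of tiles it produces is not meant to reproduce the 53 tiles of the FIG. 5 example.

```python
def tile_origins(length, tile, stride):
    """Start offsets of tiles along one axis, covering the whole axis."""
    if tile > length:
        raise ValueError("tile must not be larger than the image side")
    if not 0 < stride <= tile:
        raise ValueError("stride must be in 1..tile so tiles leave no gaps")
    origins = list(range(0, length - tile + 1, stride))
    if origins[-1] + tile < length:
        origins.append(length - tile)  # final tile flush with the edge
    return origins

def split_into_tiles(width, height, tile=256, stride=192):
    """Return (x, y, w, h) boxes of overlapping tiles in row-major order."""
    return [
        (x, y, tile, tile)
        for y in tile_origins(height, tile, stride)
        for x in tile_origins(width, tile, stride)
    ]
```

With a stride of 192 and a tile size of 256, neighbouring tiles share a 64-pixel band, so an object narrower than that band that straddles a tile boundary appears whole in at least one tile.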

The second image segmentation unit 503 outputs, to the inference unit 504, the target image together with the small target images obtained by dividing either the target image or the composite target image.

The inference unit 504 takes the small target images produced by the second image segmentation unit 503 as input and uses the machine learning model 40 to infer whether an object is present in each small target image. When an object is present, it detects the position and size of the object.
However, if there is a small target image for which inference is clearly unnecessary, the inference unit 504 may exclude that small target image from inference. Here, a small target image for which inference is clearly unnecessary is one in which the entire image can be judged not to be a water area.
For example, when the boats to be detected are those sailing on the water or preparing to depart on the water, boats on land are outside the detection target. Therefore, the inference unit 504 need not perform inference on, for example, a small target image that consists entirely of land. The inference unit 504 can determine that a small target image consists entirely of land from, for example, the fact that the entire small target image is filled with black. When the inference unit 504 skips small target images for which inference is clearly unnecessary, it performs inference, based on the auxiliary information (in other words, the information about water areas), only on the small target images that are not entirely filled with black.
After performing inference on all the small target images, or on all the small target images other than those excluded from inference, the inference unit 504 outputs the target image and the inference result for each small target image to the detection result integration unit 505.
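The skip rule can be illustrated with a short sketch. It assumes tiles are lists of rows of (R, G, B) tuples and that `model` is any callable standing in for the machine learning model 40; the names `is_fully_masked` and `infer_tiles` are hypothetical.

```python
def is_fully_masked(tile, fill=(0, 0, 0)):
    """True when every pixel of the tile equals the mask fill color."""
    return all(pixel == fill for row in tile for pixel in row)

def infer_tiles(tiles, model):
    """Run `model` on each tile except those that are entirely masked.

    Returns a list of (tile_index, result); skipped tiles are omitted.
    """
    results = []
    for index, tile in enumerate(tiles):
        if is_fully_masked(tile):
            continue  # entirely land: inference is clearly unnecessary
        results.append((index, model(tile)))
    return results
```

Keeping the tile index in each result lets a later integration step map detections back to positions in the target image. Note that this check cannot distinguish a masked tile from an unmasked tile that happens to be pure black, which is a limitation of judging by fill color alone.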

Based on the inference results for the individual small target images output from the inference unit 504, the detection result integration unit 505 integrates those results into a single inference result for the target image.
For example, when the small target images overlap and the inference unit 504 performs inference on each of them as an input image, an object that appears in more than one small target image is counted twice. To avoid this double counting, the detection result integration unit 505 integrates the inference results for the small target images, taking the boundary portions of the small target images into account. The result of this integration is the inference result for the target image.
The detection result integration unit 505 outputs the target image and the inference result for the target image to the detection result output unit 506.
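The source does not say how the integration removes duplicates. One common approach, shown here purely as an assumed substitute, is to shift each detection into target-image coordinates and then apply a greedy overlap test (intersection over union, IoU). The confidence scores, the threshold of 0.5, and all function names below are assumptions for illustration.

```python
def to_global(box, tile_origin):
    """Shift an (x, y, w, h) box from tile to target-image coordinates."""
    x, y, w, h = box
    ox, oy = tile_origin
    return (x + ox, y + oy, w, h)

def iou(a, b):
    """Intersection over union of two (x, y, w, h) boxes."""
    ax, ay, aw, ah = a
    bx, by, bw, bh = b
    ix = max(0, min(ax + aw, bx + bw) - max(ax, bx))
    iy = max(0, min(ay + ah, by + bh) - max(ay, by))
    inter = ix * iy
    union = aw * ah + bw * bh - inter
    return inter / union if union else 0.0

def integrate(per_tile, iou_threshold=0.5):
    """per_tile: list of (tile_origin, [(box, score), ...]).

    Keeps the highest-scoring box among any group of heavily
    overlapping boxes, so an object seen in two tiles is counted once.
    """
    candidates = [
        (to_global(box, origin), score)
        for origin, detections in per_tile
        for box, score in detections
    ]
    candidates.sort(key=lambda d: d[1], reverse=True)
    kept = []
    for box, score in candidates:
        if all(iou(box, other) < iou_threshold for other, _ in kept):
            kept.append((box, score))
    return kept
```

In this sketch an object detected in two overlapping tiles maps to nearly the same global box, so the lower-scoring copy is discarded.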

Based on the target image and its inference result output from the detection result integration unit 505, the detection result output unit 506 outputs display data representing a display screen on which the user can view the inference result, for example to a display device (not shown). The display device is connected to the object detection device 1 via a network, for example. Specifically, the detection result output unit 506 generates, for example, display data in which a rectangle surrounding each object is superimposed on the target image, and outputs the display data to the display device. In accordance with the display data output from the detection result output unit 506, the display device displays a screen in which the rectangles surrounding the objects are superimposed on the target image.

The second auxiliary information DB 60 is a database that stores auxiliary information.

In the first embodiment, as shown in FIG. 1, the first auxiliary information DB 20, the second auxiliary information DB 60, and the machine learning model 40 are provided in the object detection device 1, but this is not a limitation. They may instead be provided outside the object detection device 1, in a location that the learning device 10 or the inference device 50 can refer to. The first auxiliary information DB 20 and the second auxiliary information DB 60 may also be configured as a single common auxiliary information DB.

The operation of the object detection device 1 according to the first embodiment will now be described.
FIGS. 6 and 7 are flowcharts for explaining the operation of the object detection device 1 according to the first embodiment. FIG. 6 is a flowchart for explaining the operation of the learning device 10 according to the first embodiment, and FIG. 7 is a flowchart for explaining the operation of the inference device 50 according to the first embodiment.
First, the operation of the learning device 10 will be described with reference to FIG. 6.
The teacher data acquisition unit 100 acquires teacher data and outputs it to the first auxiliary information reference unit 101 and the learning unit 106.
In FIG. 6, "p = 1, number of teacher images, 1" indicates that the learning device 10 performs the processing of steps ST601 to ST603 below sequentially on all of the teacher images included in the teacher data. That is, in the following description of steps ST601 to ST603, "teacher image" means the one teacher image currently being processed, and "teacher data" means that teacher image and the text information associated with it.
The first auxiliary information reference unit 101 refers to the first auxiliary information DB 20 and acquires auxiliary information from it based on the first imaging position information included in the metadata of the teacher image acquired by the teacher data acquisition unit 100 (step ST601). Here, the first auxiliary information reference unit 101 refers to the first auxiliary information DB 20 and acquires information about water areas.
The first auxiliary information reference unit 101 outputs the teacher data and the auxiliary information corresponding to the object in the teacher image to the first auxiliary information synthesis unit 102.
As a specific example, the first auxiliary information reference unit 101 acquires information about water areas as the auxiliary information and outputs it, together with the teacher data, to the first auxiliary information synthesis unit 102.

Based on the teacher data and the auxiliary information output from the first auxiliary information reference unit 101 in step ST601, the first auxiliary information synthesis unit 102 combines the auxiliary information with the teacher image (step ST602). Specifically, the first auxiliary information synthesis unit 102 generates, for example, a composite teacher image in which the pixels of the teacher image outside the water area are filled with a specific color.
The first auxiliary information synthesis unit 102 outputs composite teacher data, in which the teacher image has been replaced with the composite teacher image, to the first image segmentation unit 103.

When the composite teacher image included in the composite teacher data output from the first auxiliary information synthesis unit 102 in step ST602 is large, the first image segmentation unit 103 divides the composite teacher image into small teacher images of a predetermined size (step ST603).
The first image segmentation unit 103 outputs the composite teacher data, after the division into small teacher images, to the statistic analysis unit 104.
The learning device 10 performs the processing of steps ST601 to ST603 above sequentially on all of the teacher images included in the teacher data.
After performing the processing of steps ST601 to ST603 on all of the teacher images included in the teacher data, the learning device 10 proceeds to step ST604.

For the composite teacher data output from the first image segmentation unit 103 in step ST603, the statistic analysis unit 104 classifies the small teacher images according to their characteristics and counts the number of small teacher images in each class (step ST604).
The statistic analysis unit 104 outputs information about the counting result, together with the composite teacher data, to the teacher data thinning unit 105. At this time, the statistic analysis unit 104 attaches to each small teacher image information indicating the class into which it was classified.

Based on the counts of small teacher images obtained by the statistic analysis unit 104 in step ST604, the teacher data thinning unit 105 thins out the small teacher images belonging to disproportionately represented classes (step ST605).
The teacher data thinning unit 105 outputs the thinned composite teacher data to the learning unit 106.
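Steps ST604 and ST605 amount to grouping, counting, and capping. The following sketch assumes a caller-supplied `classify` function and a thinning rule that keeps at most `cap` images per class (defaulting to the size of the smallest class). Both the rule and the names are hypothetical; this passage does not fix the classification criteria or the thinning policy.

```python
import random
from collections import defaultdict

def thin_by_class(samples, classify, cap=None, seed=0):
    """Group samples by classify(sample), count them, and thin each group.

    Returns (counts, kept), where counts maps class label to the number of
    samples before thinning, and kept holds at most `cap` samples per class.
    If cap is None, the size of the smallest class is used.
    """
    groups = defaultdict(list)
    for sample in samples:
        groups[classify(sample)].append(sample)
    counts = {label: len(items) for label, items in groups.items()}
    if not counts:
        return counts, []
    if cap is None:
        cap = min(counts.values())
    rng = random.Random(seed)  # fixed seed keeps the thinning reproducible
    kept = []
    for items in groups.values():
        kept.extend(items if len(items) <= cap else rng.sample(items, cap))
    return counts, kept
```

Random sampling within each class is only one possible choice. Any rule that brings the class sizes closer together would serve the same purpose of reducing bias in the teacher data.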

The learning unit 106 mixes, at a predetermined ratio, the thinned composite teacher data output from the teacher data thinning unit 105 in step ST605 with the teacher data acquired by the teacher data acquisition unit 100 in step ST601, performs learning on the mixture (step ST606), and generates the machine learning model 40 (step ST607).
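Mixing the two kinds of teacher data at a predetermined ratio could look like the sketch below. The function name `mix_training_sets`, the parameters `composite_ratio` and `total`, and sampling without replacement are all assumptions; the source states only that the data are mixed at a predetermined ratio before learning.

```python
import random

def mix_training_sets(composite, original, composite_ratio, total, seed=0):
    """Draw `total` samples, a `composite_ratio` share from `composite`
    and the remainder from `original`, then shuffle them together."""
    if not 0.0 <= composite_ratio <= 1.0:
        raise ValueError("composite_ratio must be between 0 and 1")
    n_composite = round(total * composite_ratio)
    n_original = total - n_composite
    if n_composite > len(composite) or n_original > len(original):
        raise ValueError("not enough samples for the requested mix")
    rng = random.Random(seed)
    mixed = rng.sample(composite, n_composite) + rng.sample(original, n_original)
    rng.shuffle(mixed)  # avoid presenting all composite samples first
    return mixed
```

Training on such a mixture is what allows a single model to accept both masked and unmasked images at inference time.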

Next, the operation of the inference device 50 will be described with reference to FIG. 7.
The image acquisition unit 500 acquires the target image and outputs it to the second auxiliary information reference unit 501.
The second auxiliary information reference unit 501 refers to the second auxiliary information DB 60 and determines, based on the second imaging position information included in the metadata of the target image acquired by the image acquisition unit 500, whether auxiliary information in the second auxiliary information DB 60 can be referred to (step ST701). Here, the second auxiliary information reference unit 501 determines whether information about water areas in the second auxiliary information DB 60 can be referred to.

When the second auxiliary information reference unit 501 determines in step ST701 that information about water areas cannot be referred to ("NO" in step ST701), the operation of the inference device 50 proceeds to step ST704. In this case, the second auxiliary information reference unit 501 outputs the target image to the second image segmentation unit 503 via the second auxiliary information synthesis unit 502.
When the second auxiliary information reference unit 501 determines in step ST701 that information about water areas can be referred to ("YES" in step ST701), it acquires the auxiliary information from the second auxiliary information DB 60 (step ST702). Here, the second auxiliary information reference unit 501 acquires information about water areas from the second auxiliary information DB 60.
The second auxiliary information reference unit 501 outputs the acquired auxiliary information, in association with the target image, to the second auxiliary information synthesis unit 502.

Based on the target image output from the second auxiliary information reference unit 501 in step ST702, the second auxiliary information synthesis unit 502 combines the auxiliary information with the target image (step ST703). Specifically, the second auxiliary information synthesis unit 502 generates, for example, a composite target image in which the pixels of the target image outside the water area are filled with a specific color.
The second auxiliary information synthesis unit 502 outputs the composite target image to the second image segmentation unit 503.
At this time, the second auxiliary information synthesis unit 502 outputs the target image to the second image segmentation unit 503 together with the composite target image.

When only the target image is output from the second auxiliary information synthesis unit 502 ("NO" in step ST701), the second image segmentation unit 503 divides the target image into small target images of a predetermined size.
On the other hand, when the composite target image is output together with the target image from the second auxiliary information synthesis unit 502 ("YES" in step ST701, through step ST703), the second image segmentation unit 503 divides the composite target image into small target images of the predetermined size (step ST704).
The second image segmentation unit 503 outputs, to the inference unit 504, the target image together with the small target images obtained by dividing either the target image or the composite target image.

In FIG. 7, "p = 1, number of divisions, 1" indicates that the inference device 50 performs the processing of steps ST705 and ST706 below sequentially on all the small target images output from the second image segmentation unit 503 in step ST704.
The inference unit 504 determines whether the small target image is an image for which inference is clearly unnecessary (step ST705). Specifically, here, the inference unit 504 determines whether the small target image consists entirely of land.
When it is determined in step ST705 that the small target image does not consist entirely of land, that is, that part or all of the image is a water area ("NO" in step ST705), the inference unit 504 performs inference on the small target image (step ST706).
When it is determined in step ST705 that the small target image consists entirely of land ("YES" in step ST705), the inference unit 504 does not perform the processing of step ST706.

The inference unit 504 performs the processing of steps ST705 and ST706 on all the small target images output from the second image segmentation unit 503 in step ST704.
After performing the processing of steps ST705 and ST706 on all the small target images, the inference unit 504 outputs the target image and the inference result for each small target image to the detection result integration unit 505.

Based on the inference results for the individual small target images output from the inference unit 504 in step ST706, the detection result integration unit 505 integrates those results into a single inference result for the target image (step ST707).
The detection result integration unit 505 outputs the target image and the inference result for the target image to the detection result output unit 506.

Based on the target image and its inference result output from the detection result integration unit 505 in step ST707, the detection result output unit 506 outputs, to the display device, display data representing a display screen on which the user can view the inference result (step ST708).
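The control flow of FIG. 7 up to the integration step can be summarized in a short sketch. Every callable passed to `detect` is a hypothetical placeholder for the corresponding unit (501 to 505); only the branching and looping structure is taken from the description, and the display step ST708 is left out.

```python
def detect(target_image, metadata, *, lookup, composite, split,
           is_skippable, infer, integrate):
    """Sketch of the flow in FIG. 7 with each unit supplied as a callable."""
    aux = lookup(metadata)                      # ST701 / ST702
    if aux is not None:
        work = composite(target_image, aux)     # ST703: composite target image
    else:
        work = target_image                     # no auxiliary info: use as is
    tiles = split(work)                         # ST704
    per_tile = []
    for tile in tiles:                          # loop over all small images
        if is_skippable(tile):                  # ST705: entirely land?
            continue
        per_tile.append(infer(tile))            # ST706
    return integrate(per_tile)                  # ST707
```

Passing the units in as callables keeps the sketch independent of any particular model or image library.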

As described above, when generating the machine learning model 40 for detecting an object from a target image, the learning device 10 of the object detection device 1 generates the model by performing learning based on the teacher data and the auxiliary information. In doing so, the learning device 10 performs learning using composite teacher images, each obtained by combining a teacher image included in the teacher data with the auxiliary information. In the object detection device 1, the inference device 50 detects the object from the target image using the machine learning model 40 generated by the learning device 10. In doing so, the inference device 50 can supply, as the input to the machine learning model 40, a composite target image obtained by combining the target image with the auxiliary information.
Because the object detection device 1 can detect the object from the target image while taking the auxiliary information into account, the object detection accuracy improves.

In general, as in the conventional technique described above, using auxiliary information has required designing a new machine learning model that takes both the target image and the auxiliary information as input parameters. Moreover, when a machine learning module built into general-purpose software or the like is used, it may not be possible to change the machine learning model.
In contrast, the object detection device 1 according to the first embodiment can detect an object while taking the auxiliary information into account, without using a machine learning model that takes the auxiliary information as an input parameter. That is, for example, it can detect an object with high accuracy, taking the auxiliary information into account, while using a machine learning model whose only input parameter is the target image.

Furthermore, in the object detection device 1 according to the first embodiment, the input parameter to the machine learning model 40 can be, for example, an image alone, whether or not auxiliary information is used, and inference can be performed with the same machine learning model 40 in either case. This means that, at learning time as well, learning that covers both the case where auxiliary information is used and the case where it is not can be performed at the same time. For example, in the first embodiment, learning is performed on composite teacher data and teacher data mixed at a predetermined ratio, which generates a machine learning model 40 that can be used both with and without auxiliary information. Therefore, in the object detection device 1 according to the first embodiment, the learning time can be shortened compared with executing separate learning runs to generate separate machine learning models for the case where auxiliary information is used and the case where it is not.

Further, in machine learning it is generally desirable to collect teacher data evenly across various cases, but in many cases the teacher data is biased, which increases false detections of the object under specific conditions.
In contrast, in the object detection device 1 according to the first embodiment, the first image division unit 103 divides the composite teacher image generated by the first auxiliary information synthesis unit 102 into a plurality of small teacher images, and the statistic analysis unit 104 classifies the plurality of small teacher images into a plurality of classifications. Then, the teacher data thinning unit 105 thins out the small teacher images belonging to each classification, as classified by the statistic analysis unit 104, according to the number of small teacher images belonging to each classification. Therefore, the machine learning model 40 can be created from teacher data without bias, and the accuracy of detecting the object from the target image can be improved.
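The divide, classify and thin-out flow can be sketched in Python as follows. This is only an illustration under stated assumptions: the classification statistic (mean brightness quantized into bins) and the thinning rule (cap every classification at the size of the smallest one) are hypothetical choices, as are the names `split_into_tiles`, `classify_by_mean` and `thin_out`. Images are represented as plain lists of rows of grayscale values.

```python
import random
from collections import defaultdict


def split_into_tiles(image, tile):
    """Divide a 2-D list of pixel values into non-overlapping tile x tile blocks."""
    height, width = len(image), len(image[0])
    tiles = []
    for top in range(0, height - tile + 1, tile):
        for left in range(0, width - tile + 1, tile):
            tiles.append([row[left:left + tile] for row in image[top:top + tile]])
    return tiles


def classify_by_mean(tile, n_bins, max_value=255):
    """Assumed statistic: mean brightness of the tile, quantized into n_bins classes."""
    pixels = [p for row in tile for p in row]
    mean = sum(pixels) / len(pixels)
    return min(int(mean * n_bins / (max_value + 1)), n_bins - 1)


def thin_out(tiles, n_bins=4, cap=None, seed=0):
    """Keep at most `cap` tiles per classification (default: size of the smallest one)."""
    groups = defaultdict(list)
    for t in tiles:
        groups[classify_by_mean(t, n_bins)].append(t)
    if cap is None:
        cap = min(len(members) for members in groups.values())
    rng = random.Random(seed)
    kept = []
    for label in sorted(groups):
        members = groups[label]
        # Over-represented classifications are randomly reduced to `cap` tiles.
        kept.extend(members if len(members) <= cap else rng.sample(members, cap))
    return kept


# Hypothetical 4 x 8 image: six dark tiles and two bright tiles when tile = 2.
image = [[0] * 6 + [255] * 2 for _ in range(4)]
tiles = split_into_tiles(image, tile=2)
balanced = thin_out(tiles)
```

In this toy case the dark tiles outnumber the bright ones three to one before thinning and are equal in number afterwards, which is the kind of rebalancing the paragraph above describes.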

In the first embodiment described above, the auxiliary information for the boat is information about the water area and a single type of auxiliary information is used, but the auxiliary information for an object is not limited to one type. There may be a plurality of types of auxiliary information for an object.

Further, in the first embodiment described above, the object is a boat, but this is only an example. For example, the object may be a vehicle, and the object detection device 1 may take information about roads into account as auxiliary information when detecting the vehicle. Taking road information into account when detecting a vehicle makes it possible to suppress erroneous detection of vehicles in places other than roads, where vehicles are not normally present.
Further, in the first embodiment described above, the first auxiliary information synthesis unit 102 and the second auxiliary information synthesis unit 502 have been described as reflecting binary water area information in the image, but this is only an example. For example, the first auxiliary information synthesis unit 102 and the second auxiliary information synthesis unit 502 may reflect the auxiliary information in the teacher image or the target image as 50% gray rather than as a binary value. Further, for example, the first auxiliary information synthesis unit 102 and the second auxiliary information synthesis unit 502 may reflect the auxiliary information in the teacher image or the target image as a specific color rather than in monochrome.
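One possible reading of the binary and 50% gray variants is that pixels outside the region indicated by the auxiliary information are overwritten with a fixed value. The Python sketch below illustrates that reading for a grayscale image held as a list of rows. The function name `mask_pixels`, the `fill` parameter, and the interpretation of 50% gray as the pixel value 128 are assumptions for illustration only.

```python
def mask_pixels(image, region, fill=0):
    """Overwrite every pixel outside `region` with `fill`.

    image  -- rows of grayscale values (0-255)
    region -- rows of booleans, True where the object may plausibly appear
    fill   -- 0 for a binary-style mask, 128 for an assumed "50% gray"
    """
    return [
        [pixel if inside else fill for pixel, inside in zip(img_row, reg_row)]
        for img_row, reg_row in zip(image, region)
    ]


# Hypothetical 2 x 2 example.
teacher_image = [[10, 20], [30, 40]]
water_area = [[True, False], [False, True]]
binary_style = mask_pixels(teacher_image, water_area, fill=0)
gray_style = mask_pixels(teacher_image, water_area, fill=128)
```

The same function would be applied to teacher images during learning and to target images during inference, so that the model sees the auxiliary information encoded in the same way in both phases.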

Further, in the first embodiment described above, the object detection device 1 includes the first image division unit 103 and the second image division unit 503, and the first image division unit 103 and the second image division unit 503 divide the composite teacher image. However, this is only an example, and division of the composite teacher image is not essential in the object detection device 1. For example, when the composite teacher image is small, the object detection device 1 need not divide it. In this case, the object detection device 1 can be configured without the first image division unit 103 and the second image division unit 503.
Further, when the composite teacher image is not divided in the object detection device 1, the learning unit 106 does not need to divide the teacher image included in the teacher data into small images of the same size as the small teacher images when it mixes the composite teacher data and the teacher data and executes learning.

Here, FIG. 8 is a diagram illustrating an example of how, in the first embodiment, the first auxiliary information synthesis unit 102 or the second auxiliary information synthesis unit 502 generates a composite teacher image or a composite target image in which the auxiliary information is reflected in the teacher image or the target image, for a case where, for example, the object is a vehicle moving on a road and the auxiliary information is road information and main road information.
In FIG. 8, the first auxiliary information synthesis unit 102 or the second auxiliary information synthesis unit 502 generates a composite teacher image or a composite target image in which red with 50% transparency is overlaid on the pixels (indicated by 802 in FIG. 8) located outside the roads (indicated by 801 in FIG. 8). In FIG. 8, for convenience, the red with 50% transparency is indicated by horizontal lines.
Further, in FIG. 8, the first auxiliary information synthesis unit 102 or the second auxiliary information synthesis unit 502 generates a composite teacher image or a composite target image in which blue with 50% transparency is overlaid on the pixels (indicated by 804 in FIG. 8) located outside the main roads (indicated by 803 in FIG. 8). In FIG. 8, for convenience, the blue with 50% transparency is indicated by vertical lines.
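The overlay described for FIG. 8 can be approximated by alpha blending, as in the Python sketch below. It is a minimal illustration, not the patented implementation: the function name `blend_outside`, the pure red and blue color values, the rounding rule, and the order in which the two layers are applied (red first, then blue) are all assumptions. Images are lists of rows of `(r, g, b)` tuples and masks are lists of rows of booleans.

```python
RED = (255, 0, 0)
BLUE = (0, 0, 255)


def blend_outside(image, region, color, alpha=0.5):
    """Overlay `color` with opacity `alpha` on every pixel outside `region`.

    region -- True where the pixel belongs to the area of interest (e.g. a road),
              so only the remaining pixels are tinted.
    """
    out = []
    for img_row, reg_row in zip(image, region):
        row = []
        for pixel, inside in zip(img_row, reg_row):
            if inside:
                row.append(pixel)
            else:
                # Per-channel linear blend, rounded half up to an integer.
                row.append(tuple(
                    int((1 - alpha) * p + alpha * c + 0.5)
                    for p, c in zip(pixel, color)
                ))
        out.append(row)
    return out


# Hypothetical 1 x 3 image: main road, ordinary road, off-road.
image = [[(100, 100, 100)] * 3]
road = [[True, True, False]]        # road information
main_road = [[True, False, False]]  # main road information

composite = blend_outside(image, road, RED)            # tint non-road pixels red
composite = blend_outside(composite, main_road, BLUE)  # tint non-main-road pixels blue
```

With two layers, a main road pixel stays untouched, an ordinary road pixel carries only the blue tint, and an off-road pixel carries both tints, so each kind of auxiliary information remains distinguishable in the composite image.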

Further, in the first embodiment described above, the learning device 10 and the inference device 50 are provided in the object detection device 1, but this is only an example. The learning device 10 and the inference device 50 may each be used as a standalone device.

FIGS. 9A and 9B are diagrams showing an example of the hardware configuration of the learning device 10 and the inference device 50 according to the first embodiment.
In the first embodiment, the functions of the teacher data acquisition unit 100, the first auxiliary information reference unit 101, the first auxiliary information synthesis unit 102, the first image division unit 103, the statistic analysis unit 104, the teacher data thinning unit 105, and the learning unit 106 are realized by a processing circuit 901. That is, the learning device 10 includes the processing circuit 901 for controlling the process of generating the machine learning model 40 for detecting an object from a target image by executing learning using the teacher data and the auxiliary information.
Further, in the first embodiment, the functions of the image acquisition unit 500, the second auxiliary information reference unit 501, the second auxiliary information synthesis unit 502, the second image division unit 503, the inference unit 504, the detection result integration unit 505, and the detection result output unit 506 are realized by the processing circuit 901. That is, the inference device 50 includes the processing circuit 901 for controlling the process of acquiring the target image and the auxiliary information and detecting the object from the target image using the machine learning model 40.
The processing circuit 901 may be dedicated hardware as shown in FIG. 9A, or may be a CPU (Central Processing Unit) 905 that executes a program stored in a memory 906 as shown in FIG. 9B.

When the processing circuit 901 is dedicated hardware, the processing circuit 901 corresponds to, for example, a single circuit, a composite circuit, a programmed processor, a parallel-programmed processor, an ASIC (Application Specific Integrated Circuit), an FPGA (Field-Programmable Gate Array), or a combination of these.

When the processing circuit 901 is the CPU 905, the functions of the teacher data acquisition unit 100, the first auxiliary information reference unit 101, the first auxiliary information synthesis unit 102, the first image division unit 103, the statistic analysis unit 104, the teacher data thinning unit 105, the learning unit 106, the image acquisition unit 500, the second auxiliary information reference unit 501, the second auxiliary information synthesis unit 502, the second image division unit 503, the inference unit 504, the detection result integration unit 505, and the detection result output unit 506 are realized by software, firmware, or a combination of software and firmware. That is, each of these units is realized by a processing circuit such as the CPU 905, which executes a program stored in an HDD (Hard Disk Drive) 902, the memory 906, or the like, or a system LSI (Large-Scale Integration). It can also be said that the program stored in the HDD 902, the memory 906, or the like causes a computer to execute the procedures and methods of each of these units. Here, the memory 906 corresponds to, for example, a non-volatile or volatile semiconductor memory such as a RAM (Random Access Memory), a ROM (Read Only Memory), a flash memory, an EPROM (Erasable Programmable Read Only Memory), or an EEPROM (Electrically Erasable Programmable Read-Only Memory), or to a magnetic disk, a flexible disk, an optical disc, a compact disc, a mini disc, a DVD (Digital Versatile Disc), or the like.

In the learning device 10, the functions of the teacher data acquisition unit 100, the first auxiliary information reference unit 101, the first auxiliary information synthesis unit 102, the first image division unit 103, the statistic analysis unit 104, the teacher data thinning unit 105, and the learning unit 106 may be realized partly by dedicated hardware and partly by software or firmware. For example, the functions of the teacher data acquisition unit 100 and the first auxiliary information reference unit 101 can be realized by the processing circuit 901 as dedicated hardware, while the functions of the first auxiliary information synthesis unit 102, the first image division unit 103, the statistic analysis unit 104, the teacher data thinning unit 105, and the learning unit 106 can be realized by a processing circuit reading and executing a program stored in the memory 906.
Further, in the inference device 50, the functions of the image acquisition unit 500, the second auxiliary information reference unit 501, the second auxiliary information synthesis unit 502, the second image division unit 503, the inference unit 504, the detection result integration unit 505, and the detection result output unit 506 may be realized partly by dedicated hardware and partly by software or firmware. For example, the functions of the image acquisition unit 500 and the detection result output unit 506 can be realized by the processing circuit 901 as dedicated hardware, while the functions of the second auxiliary information reference unit 501, the second auxiliary information synthesis unit 502, the second image division unit 503, the inference unit 504, and the detection result integration unit 505 can be realized by a processing circuit reading and executing a program stored in the memory 906.
Further, the learning device 10 and the inference device 50 each have an input interface device 903 and an output interface device 904 that communicate with external devices such as a display device.

As described above, the learning device 10 according to the first embodiment is configured to include the first auxiliary information reference unit 101 that acquires a teacher image in which an object is captured and auxiliary information corresponding to the object, the first auxiliary information synthesis unit 102 that generates a composite teacher image in which the auxiliary information acquired by the first auxiliary information reference unit 101 is reflected in the teacher image, and the learning unit 106 that generates the machine learning model 40 by learning using the composite teacher image generated by the first auxiliary information synthesis unit 102. Therefore, a machine learning model that takes the auxiliary information into account can be generated without using a machine learning model that has the auxiliary information as an input parameter.
Further, the object detection device 1 according to the first embodiment is configured to include the learning device 10 described above, the second auxiliary information reference unit 501 that acquires a target image and auxiliary information corresponding to the target image, the second auxiliary information synthesis unit 502 that generates a composite target image in which the auxiliary information acquired by the second auxiliary information reference unit 501 is reflected in the target image, and the inference unit 504 that detects the object by inputting the composite target image into the machine learning model 40. Therefore, an object can be detected with the auxiliary information taken into account, without using a machine learning model that has the auxiliary information as an input parameter.

Within the scope of the present invention, any component of the embodiment may be modified or omitted.

Since the object detection device according to the present invention is configured so that a machine learning model that takes auxiliary information into account can be generated without using a machine learning model that has the auxiliary information as an input parameter, it can be applied to a learning device that generates a machine learning model for detecting an object.

1 object detection device, 10 learning device, 20 first auxiliary information DB, 40 machine learning model, 50 inference device, 60 second auxiliary information DB, 100 teacher data acquisition unit, 101 first auxiliary information reference unit, 102 first auxiliary information synthesis unit, 103 first image division unit, 104 statistic analysis unit, 105 teacher data thinning unit, 106 learning unit, 500 image acquisition unit, 501 second auxiliary information reference unit, 502 second auxiliary information synthesis unit, 503 second image division unit, 504 inference unit, 505 detection result integration unit, 506 detection result output unit, 901 processing circuit, 902 HDD, 903 input interface device, 904 output interface device, 905 CPU, 906 memory.

Claims

A learning device comprising:
a first auxiliary information reference unit that acquires a teacher image in which an object is captured and auxiliary information corresponding to the object;
a first auxiliary information synthesis unit that generates a composite teacher image in which the auxiliary information acquired by the first auxiliary information reference unit is reflected in the teacher image;
a learning unit that generates a machine learning model by learning using the composite teacher image generated by the first auxiliary information synthesis unit;
a first image division unit that divides the composite teacher image generated by the first auxiliary information synthesis unit into a plurality of small teacher images;
a statistic analysis unit that classifies the plurality of small teacher images divided by the first image division unit into a plurality of classifications; and
a teacher data thinning unit that thins out the small teacher images belonging to each classification, as classified by the statistic analysis unit, according to the number of small teacher images belonging to each classification,
wherein the learning unit generates the machine learning model by learning the teacher images remaining after the thinning by the teacher data thinning unit.

The learning device according to claim 1, wherein the first auxiliary information synthesis unit generates the composite teacher image by masking, based on the auxiliary information, pixels corresponding to the auxiliary information on the teacher image.

An object detection device that detects the object from a target image using machine learning, the object detection device comprising:
the learning device according to claim 1; and
an inference device including a second auxiliary information reference unit that acquires the target image and auxiliary information corresponding to the target image, a second auxiliary information synthesis unit that generates a composite target image in which the auxiliary information acquired by the second auxiliary information reference unit is reflected in the target image, and an inference unit that detects the object by inputting the composite target image into the machine learning model.

The object detection device according to claim 3, wherein the second auxiliary information synthesis unit generates the composite target image by masking, based on the auxiliary information, pixels corresponding to the auxiliary information on the target image.

A learning method comprising:
a step in which a first auxiliary information reference unit acquires a teacher image in which an object is captured and auxiliary information corresponding to the object;
a step in which a first auxiliary information synthesis unit generates a composite teacher image in which the auxiliary information acquired by the first auxiliary information reference unit is reflected in the teacher image;
a step in which a learning unit generates a machine learning model by learning using the composite teacher image generated by the first auxiliary information synthesis unit;
a step in which a first image division unit divides the composite teacher image generated by the first auxiliary information synthesis unit into a plurality of small teacher images;
a step in which a statistic analysis unit classifies the plurality of small teacher images divided by the first image division unit into a plurality of classifications; and
a step in which a teacher data thinning unit thins out the small teacher images belonging to each classification, as classified by the statistic analysis unit, according to the number of small teacher images belonging to each classification,
wherein the learning unit generates the machine learning model by learning the teacher images remaining after the thinning by the teacher data thinning unit.