JP7138158B2

JP7138158B2 - OBJECT CLASSIFIER, OBJECT CLASSIFICATION METHOD, AND PROGRAM

Info

Publication number: JP7138158B2
Application number: JP2020217297A
Authority: JP
Inventors: 剛大槻; 貴史小野; 俊彦原田; 雅稔井藤
Original assignee: エヌ・ティ・ティ・コムウェア株式会社
Priority date: 2020-12-25
Filing date: 2020-12-25
Publication date: 2022-09-15
Anticipated expiration: 2040-12-25
Also published as: JP2022102516A

Description

The present invention relates to an object classification device, an object classification method, and a program.

Conventionally, there has been known a technique capable of accumulating images captured by an in-vehicle camera and identifying metal objects, such as manholes and road markings, included in those images. For example, the object recognition device described in Patent Document 1 generates a bird's-eye view image from accumulated in-vehicle camera images and identifies a metal object on the road surface by using the brightness difference that occurs in the bird's-eye view image. Specifically, the object recognition device sets the conditions for digital image processing, converts the image from the in-vehicle camera into a bird's-eye view image, detects metal objects included in the bird's-eye view image, and identifies the position of the host vehicle relative to the metal objects.

Japanese Patent Application Laid-Open No. 2009-266003

Manholes, public facilities, and the like are detected using an in-vehicle camera, as in the object recognition device described above. Depending on the intended use of the information, however, it is desirable to classify objects with even higher accuracy.

The present invention has been made in view of the above problem, and an object thereof is to provide an object classification device, an object classification method, and a program capable of classifying objects with high accuracy.

(1) One aspect of the present invention is an object classification device including: a detection unit that detects an object region from a captured image; and a classification unit that executes a first process of classifying an object included in the object region based on a first element and a second process of classifying the object classified by the first process based on a second element, wherein the classification unit executes the first process using a first model that has been machine-learned using first classification results based on the first element as teacher data and that, when an image of the object region is input, outputs a first classification result based on the first element included in the image of the object region, the classification unit executes the second process using a second model that has been machine-learned using second classification results based on the second element as teacher data and that, when an image of the object region is input, outputs a second classification result based on the second element included in the image of the object region, the first element is at least one of a pattern and a symbol of the object, and the second element is at least one of the shape, position, and size of a hole in the object.

(2) One aspect of the present invention is the above object classification device, wherein the classification unit may apply keystone correction to the object region detected by the detection unit and execute the first process and the second process based on the keystone-corrected object region.

(3) One aspect of the present invention is an object classification device including: a detection unit that detects an object region from a captured image; a first classification unit that classifies an object included in the object region based on at least one of a pattern and a symbol; a second classification unit that classifies the object classified by the first classification unit based on the shape of a hole; a third classification unit that classifies the object classified by the first classification unit based on the position of the hole; and a fourth classification unit that classifies, based on size, the object classified by the second classification unit and the object classified by the third classification unit, wherein the first classification unit performs processing based on a first model that has been machine-learned using first classification results based on at least one of the pattern and the symbol as teacher data and that outputs a first classification result when at least one of the pattern and the symbol included in the image of the object region is input, the second classification unit performs processing based on a second model that has been machine-learned using second classification results based on the shape of the hole as teacher data and that outputs a second classification result when an image of the object region based on the classification result of the first classification unit is input, the third classification unit performs processing based on a third model that has been machine-learned using third classification results based on the position of the hole as teacher data and that outputs a third classification result when an image of the object region based on the classification result of the first classification unit is input, and the fourth classification unit classifies, based on the size of the object, the object classified by the second classification unit and the object classified by the third classification unit.

(4) One aspect of the present invention is an object classification method including: a step of detecting an object region from a captured image; and a step of executing a first process of classifying an object included in the object region based on a first element and a second process of classifying the object classified by the first process based on a second element, wherein the first process is executed using a first model that has been machine-learned using first classification results based on the first element as teacher data and that, when an image of the object region is input, outputs a first classification result based on the first element included in the image of the object region, the second process is executed using a second model that has been machine-learned using second classification results based on the second element as teacher data and that, when an image of the object region is input, outputs a second classification result based on the second element included in the image of the object region, the first element is at least one of a pattern and a symbol of the object, and the second element is at least one of the shape, position, and size of a hole in the object.

(5) One aspect of the present invention is an object classification method including: a step of detecting an object region from a captured image; a first classification step of classifying an object included in the object region based on at least one of a pattern and a symbol; a second classification step of classifying the object classified in the first classification step based on the shape of a hole; a third classification step of classifying the object classified in the first classification step based on the position of the hole; and a fourth classification step of classifying, based on size, the object classified in the second classification step and the object classified in the third classification step, wherein the first classification step performs processing based on a first model that has been machine-learned using first classification results based on at least one of the pattern and the symbol as teacher data and that outputs a first classification result when at least one of the pattern and the symbol included in the image of the object region is input, the second classification step performs processing based on a second model that has been machine-learned using second classification results based on the shape of the hole as teacher data and that outputs a second classification result when an image of the object region based on the classification result of the first classification step is input, the third classification step performs processing based on a third model that has been machine-learned using third classification results based on the position of the hole as teacher data and that outputs a third classification result when an image of the object region based on the classification result of the first classification step is input, and the fourth classification step classifies, based on the size of the object, the object classified in the second classification step and the object classified in the third classification step.

(6) One aspect of the present invention is a program that causes a computer to execute processing including: a step of detecting an object region from a captured image; and a step of executing a first process of classifying an object included in the object region based on a first element and a second process of classifying the object classified by the first process based on a second element, wherein the program causes the computer to execute the first process using a first model that has been machine-learned using first classification results based on the first element as teacher data and that, when an image of the object region is input, outputs a first classification result based on the first element included in the image of the object region, the program causes the computer to execute the second process using a second model that has been machine-learned using second classification results based on the second element as teacher data and that, when an image of the object region is input, outputs a second classification result based on the second element included in the image of the object region, the first element is at least one of a pattern and a symbol of the object, and the second element is at least one of the shape, position, and size of a hole in the object.

(7) One aspect of the present invention is a program that causes a computer to execute processing including: a step of detecting an object region from a captured image; a first classification step of classifying an object included in the object region based on at least one of a pattern and a symbol; a second classification step of classifying the object classified in the first classification step based on the shape of a hole; a third classification step of classifying the object classified in the first classification step based on the position of the hole; and a fourth classification step of classifying, based on size, the object classified in the second classification step and the object classified in the third classification step, wherein the first classification step performs processing based on a first model that has been machine-learned using first classification results based on at least one of the pattern and the symbol as teacher data and that outputs a first classification result when at least one of the pattern and the symbol included in the image of the object region is input, the second classification step performs processing based on a second model that has been machine-learned using second classification results based on the shape of the hole as teacher data and that outputs a second classification result when an image of the object region based on the classification result of the first classification step is input, the third classification step performs processing based on a third model that has been machine-learned using third classification results based on the position of the hole as teacher data and that outputs a third classification result when an image of the object region based on the classification result of the first classification step is input, and the fourth classification step classifies, based on the size of the object, the object classified in the second classification step and the object classified in the third classification step.

According to one aspect of the present invention, objects can be classified with high accuracy.
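The four-stage arrangement described in aspects (3), (5), and (7) can be illustrated with a minimal Python sketch. It is not the patented implementation: the `Region` record, the function names, the toy stand-ins for the trained models, and the 750 mm size threshold are all hypothetical, chosen only to show how the second and third stages both receive the result of the first stage and how the size stage follows them.

```python
from typing import Callable, Dict, NamedTuple

# Hypothetical stand-in for an object-region image and its measurements.
Region = Dict[str, object]


class Classification(NamedTuple):
    first: str   # class by pattern and/or symbol
    second: str  # class by hole shape
    third: str   # class by hole position
    fourth: str  # class by size


def classify_region(
    region: Region,
    first_model: Callable[[Region], str],
    second_model: Callable[[Region, str], str],
    third_model: Callable[[Region, str], str],
    size_rule: Callable[[Region], str],
) -> Classification:
    """Run the stages in the order described in the text."""
    first = first_model(region)            # pattern / symbol
    second = second_model(region, first)   # hole shape, after the first stage
    third = third_model(region, first)     # hole position, after the first stage
    fourth = size_rule(region)             # size, after the second and third stages
    return Classification(first, second, third, fourth)


# Toy stand-ins for trained models (assumptions, not real models).
def toy_first(region: Region) -> str:
    return "symbol:" + str(region["symbol"])


def toy_second(region: Region, first: str) -> str:
    return "hole_shape:" + str(region["hole_shape"])


def toy_third(region: Region, first: str) -> str:
    return "hole_position:" + str(region["hole_position"])


def toy_size(region: Region) -> str:
    # 750 mm is an arbitrary illustrative threshold.
    return "large" if float(region["diameter_mm"]) >= 750 else "small"
```

In practice each toy function would be replaced by a model trained on the corresponding teacher data; how the first classification result conditions the later models is left open here.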

FIG. 1 is a block diagram showing a configuration example of the object detection system of the embodiment.
FIG. 2 is a flowchart showing an example of a processing procedure in the object detection device of the embodiment.
FIG. 3 is a diagram showing an example of a captured image in the embodiment.
FIG. 4 is a block diagram showing an example of the detection unit of the embodiment.
FIG. 5 is a diagram showing an example of the hidden determination process in the embodiment.
FIG. 6 is a diagram for explaining keystone correction in the embodiment, where (a) shows the region before correction and (b) shows the region after correction.
FIG. 7 is a diagram showing that an object region is corrected (converted) into a region by a function in the embodiment.
FIG. 8 is a diagram for explaining the process of converting a keystone-corrected region into a perfect circular region in the embodiment, where (a) shows the object region A, (b) shows the region A', and (c) shows the perfect circular region A''.
FIG. 9 is a diagram showing an example of a captured image in the embodiment.
FIG. 10 is a diagram showing an example of a captured image in the embodiment, where (a) shows an object region containing the whole object and (b) shows an object region that is cut off at the edge of the image.
FIG. 11 is a diagram for explaining a problem in obtaining the center point of an object, where (a) is an example of a captured image and (b) is a plan view of an object region.
FIG. 12 is a diagram for explaining conversion from a rectangle to a trapezoid in the embodiment, where (a) shows the rectangle and (b) shows the trapezoid.
FIG. 13 is a diagram for explaining keystone correction in the embodiment, where (a) shows the rectangle before correction and (b) shows the trapezoid after correction.
FIG. 14 is a diagram showing conversion of the center point of a rectangle into the center point of a trapezoid in the embodiment.
FIG. 15 is a diagram showing an example of a right image and a left image.
FIG. 16 is a diagram for explaining the process of determining whether an object region in the right image matches an object region in the left image in the embodiment.
FIG. 17 is a plan view showing an example of a manhole in the embodiment.
FIG. 18 is a block diagram showing a configuration example of the classification unit in the embodiment.

An object classification device, an object classification method, and a program to which the present invention is applied are described below with reference to the drawings.

[Overview of embodiment]
The object detection system 1 of the embodiment collects captured images transmitted from one or more in-vehicle devices 10 and detects objects included in the collected captured images. The object detection system 1 executes a process of estimating the position of a detected object and a process of classifying the detected object. The object detection system 1 creates information including object position information and object classification information, and provides the created information to other systems (not shown). In the following embodiment, the captured images include, for example, a right image captured by a camera installed at the front right of the vehicle and a left image captured by a camera installed at the front left of the vehicle. The captured images are not limited to a right image and a left image: they may be images captured by a single camera mounted on the vehicle, images of the area behind the vehicle, or images captured by various cameras that are not mounted on a vehicle. The following embodiment describes the case where the object is a manhole, but the object detection system 1 is not limited to this and can be applied to various objects that can be targets of position estimation and classification. For example, the object detection system 1 can also be applied to utility poles and the like installed along roads.

[Configuration of object detection system 1]
FIG. 1 is a block diagram showing a configuration example of the object detection system 1 of the embodiment. The object detection system 1 includes, for example, one or more in-vehicle devices 10, a data collection device 20, and an object detection device 100. The in-vehicle device 10, the data collection device 20, and the object detection device 100 are connected to, for example, a communication network. Each device connected to the communication network has a communication interface such as a NIC (Network Interface Card) or a wireless communication module (not shown in FIG. 1). The communication network includes, for example, the Internet, a WAN (Wide Area Network), a LAN (Local Area Network), and a cellular network.

The in-vehicle device 10 includes a camera, a data transmission device, and the like, and transmits left and right images, consisting of a right image and a left image, to the data collection device 20. The data collection device 20 collects the right image and the left image and stores them in association with information on the position at which they were captured. In the following description, the right image and the left image are collectively referred to as captured images.

The object detection device 100 includes, for example, a detection unit 110, a position estimation unit 120, a classification unit 130, and an information providing unit 140. Functional units such as the detection unit 110, the position estimation unit 120, the classification unit 130, and the information providing unit 140 are realized by a processor such as a CPU (Central Processing Unit) executing a program stored in a program memory. Some or all of these functional units may instead be realized by hardware such as an LSI (Large Scale Integration), an ASIC (Application Specific Integrated Circuit), or an FPGA (Field-Programmable Gate Array), or by software and hardware working together. The program is stored in advance in a storage device of the object detection device 100, such as an HDD or flash memory.

The detection unit 110 detects an object region included in a captured image. The object region is information indicating the region of the captured image that represents an object. The position estimation unit 120 estimates the position of the object. The classification unit 130 classifies the object. The information providing unit 140 provides information including the position of the object estimated by the position estimation unit 120 and the classification of the object determined by the classification unit 130. Note that the information providing unit 140 provides information including the position and classification estimated for each of the right image and the left image.

[Processing of object detection system 1]
FIG. 2 is a flowchart showing an example of a processing procedure in the object detection device 100 of the embodiment. The object detection device 100 first acquires a captured image to be processed from the data collection device 20 (step S100). Next, the object detection device 100 determines whether part of an object detected from the captured image is hidden by another object (step S102). If part of the detected object is hidden by another object, the object detection device 100 excludes the detected object from subsequent processing. Next, the object detection device 100 estimates the position of the object to be processed (step S104). The position estimation process of the embodiment estimates the position of the object within the captured image. Next, the object detection device 100 determines whether the object whose position was estimated using the right image matches the object whose position was estimated using the left image (step S106). Next, the object detection device 100 classifies the object detected from the captured image (step S108). Next, the object detection device 100 provides information including the position and classification of the object (step S110). For example, the object detection device 100 provides information including the position and classification for each object for which a match has been determined. The position information providing process may provide position information that includes the position within the captured image estimated by the position estimation process and the latitude and longitude corresponding to that captured image, or it may provide latitude and longitude information for the object calculated from the latitude and longitude corresponding to the captured image and the position within the captured image. In the embodiment, the hidden determination process (step S102), the position estimation process (step S104), and the match determination process (step S106) are executed in that order, but the order is not limited to this, and the processes may be executed in any order.
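The flow of steps S100 to S110 can be sketched as follows. This is only an illustration under stated assumptions: the function name `run_pipeline`, the dictionary-based object records, and the callables passed in for each step are hypothetical placeholders, and step S100 (image acquisition) is represented simply by the two input lists.

```python
from typing import Any, Callable, Dict, List

Obj = Dict[str, Any]  # hypothetical record for one detected object region


def run_pipeline(
    right_objects: List[Obj],
    left_objects: List[Obj],
    is_hidden: Callable[[Obj], bool],
    estimate_position: Callable[[Obj], Any],
    is_same_object: Callable[[Any, Any], bool],
    classify: Callable[[Obj], str],
) -> List[Dict[str, Any]]:
    # Step S102: exclude objects that are partly hidden by another object.
    right_visible = [o for o in right_objects if not is_hidden(o)]
    left_visible = [o for o in left_objects if not is_hidden(o)]

    # Step S104: estimate the position of each remaining object in its image.
    right_located = [(o, estimate_position(o)) for o in right_visible]
    left_located = [(o, estimate_position(o)) for o in left_visible]

    results: List[Dict[str, Any]] = []
    for r_obj, r_pos in right_located:
        for l_obj, l_pos in left_located:
            # Step S106: keep only pairs judged to be the same object.
            if is_same_object(r_pos, l_pos):
                # Steps S108 and S110: classify, then report position and
                # classification for each of the right and left images.
                results.append({
                    "right": {"position": r_pos, "classification": classify(r_obj)},
                    "left": {"position": l_pos, "classification": classify(l_obj)},
                })
    return results
```

Because the text allows steps S102, S104, and S106 to run in any order, the ordering above is one possible arrangement rather than a requirement.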

Note that the object detection device 100 may estimate the position of an object without classifying it. In this case, the object detection device 100 provides information that includes the position of the object but not its classification. Conversely, the object detection device 100 may classify an object without estimating its position. In this case, the object detection device 100 may provide information that includes the classification of the object but not its position.

[Hidden determination process (step S102)]
The hidden determination process (step S102) is described below. FIG. 3 is a diagram showing an example of a captured image in the embodiment. The captured image shown in FIG. 3 was captured by a camera provided on the right side of the vehicle. The captured image includes two object regions A1 and A2 corresponding to two manholes.

FIG. 4 is a block diagram showing an example of the detection unit 110 of the embodiment. The detection unit 110 includes a detection processing unit 112 and a model construction unit 114. The detection processing unit 112 inputs a captured image to the detection model 112A and outputs the object region estimation result produced by the detection model 112A. The object region estimation result includes information indicating the shape and position of the object region. The detection model is a machine learning model, for example a convolutional neural network (CNN).

The model construction unit 114 constructs the detection model 112A. The model construction unit 114 acquires captured images and information indicating object regions as teacher data. During learning, the model construction unit 114 inputs a captured image to the detection model 112A. When a captured image is input, the detection model 112A outputs an estimation result indicating the object region A included in the captured image. The model construction unit 114 recursively updates the processing parameters of the detection model 112A so that the information indicating the object region output from the detection model 112A matches the information indicating the object region given as teacher data. The processing parameters are, for example, at least one of the following in the convolutional neural network: the number of layers, the number of nodes in each layer, the connection scheme between nodes of adjacent layers, the activation function, the error function, the gradient descent algorithm, the pooling region, the kernel, the weight coefficients, and the weight matrix. To obtain the processing parameters, the model construction unit 114 performs, for example, deep learning. Deep learning is machine learning that uses a multi-layered structure, in particular a neural network with three or more layers. Note that the model construction unit 114 need not be included in the detection unit 110, as long as the detection model 112A can be installed in the detection processing unit 112 at the time of initial setup or maintenance of the object detection device 100. At estimation time, the detection model 112A receives a captured image and outputs an estimation result indicating the shape and position of the object region. The detection processing unit 112 outputs, for example, a captured image to which a tag corresponding to the estimation result has been attached.

図５は、実施形態における隠れ判定処理の一例を示す図である。検出部１１０は、検出処理部１１２により取得された物体領域を用いて、隠れ判定処理（ステップＳ１０２）を行う。例えば図５（ａ）に示すように、マンホールに対応した物体領域Ａの一部が、障害物に対応した障害物領域Ｂによって隠れている場合がある。検出部１１０は、障害物領域Ｂによって一部が隠れている物体領域Ａを処理対象から除外するために、以下の処理を行う。 FIG. 5 is a diagram illustrating an example of hidden determination processing in the embodiment. The detection unit 110 uses the object area acquired by the detection processing unit 112 to perform the hidden determination process (step S102). For example, as shown in FIG. 5(a), a part of the object area A corresponding to the manhole may be hidden by the obstacle area B corresponding to the obstacle. The detection unit 110 performs the following processing in order to exclude the object region A partially hidden by the obstacle region B from the processing target.

まず、検出部１１０は、図５（ｂ）に示す物体領域Ａを検出する。次に検出部１１０は、物体領域Ａに対してマンホールを真上からみた領域になるように台形補正を行うことで、図５（ｃ）に示すような領域Ａ’に変換する。次に検出部１１０は、領域Ａ’を、図５（ｄ）に示す真円領域Ａ’’に変換する。次に検出部１１０は、領域Ａ’と真円領域Ａ’’との面積比率を算出する。次に検出部１１０は、算出した面積比率と所定値との比較に基づいて物体領域Ａが障害物領域Ｂによって隠れているか否かを判定する。以下、各処理について説明する。 First, the detection unit 110 detects the object area A shown in FIG. 5(b). Next, the detection unit 110 applies trapezoidal (keystone) correction to the object area A so that it becomes the area of the manhole as seen from directly above, thereby converting it into an area A' as shown in FIG. 5(c). Next, the detection unit 110 converts the area A' into the perfect circular area A'' shown in FIG. 5(d). Next, the detection unit 110 calculates the area ratio between the area A' and the perfect circular area A''. Next, the detection unit 110 determines whether or not the object area A is hidden by the obstacle area B based on a comparison between the calculated area ratio and a predetermined value. Each process is described below.

（物体領域Ａから領域Ａ’への変換処理）
図６は、実施形態における台形補正について説明するための図であり、（ａ）は補正前の領域Ａを示し、（ｂ）は補正後の領域Ａ’を示す。図６（ａ）において、（ｘ０，ｙ０）は撮像画像の収束点の座標である。（ｘ１，ｙ１）、（ｘ２，ｙ２）、（ｘ３，ｙ３）、および（ｘ４，ｙ４）は物体領域Ａを囲む矩形の端部座標である。ｗは２つの縦方向中間点(ｘｃ１，ｙｃ１)および(ｘｃ２，ｙｃ２)を結んだ物体領域Ａの幅である。（ｘ１’，ｙ１’）、（ｘ２’，ｙ２’）、（ｘ３’，ｙ３’）、および（ｘ４’，ｙ４’）は物体領域Ａと接する台形領域の端部座標である。台形領域は、Ｘ方向に平行な２本の底辺と、縦方向中間点(ｘｃ１，ｙｃ１)と収束点とを結ぶ直線と、縦方向中間点(ｘｃ２，ｙｃ２)と収束点とを結ぶ直線とにより定義される。図６（ｂ）において、補正後の領域Ａ’は、ｗを直径とした円領域である。補正後の領域Ａ’は、正方形領域に接する。正方形領域は、（ｘ１’’，ｙ１’’）、（ｘ２’’，ｙ２’’）、（ｘ３’’，ｙ３’’）、および（ｘ４’’，ｙ４’’）を持つ。 (Conversion processing from object area A to area A')
FIG. 6 is a diagram for explaining trapezoidal correction according to the embodiment, in which (a) shows the area A before correction and (b) shows the area A' after correction. In FIG. 6(a), (x0, y0) are the coordinates of the convergence point of the captured image. (x1, y1), (x2, y2), (x3, y3), and (x4, y4) are the edge coordinates of the rectangle enclosing the object region A. w is the width of the object area A, measured between the two vertical midpoints (xc1, yc1) and (xc2, yc2). (x1', y1'), (x2', y2'), (x3', y3'), and (x4', y4') are the edge coordinates of the trapezoidal region in contact with the object region A. The trapezoidal region is defined by two bases parallel to the X direction, the straight line connecting the vertical midpoint (xc1, yc1) and the convergence point, and the straight line connecting the vertical midpoint (xc2, yc2) and the convergence point. In FIG. 6(b), the corrected area A' is a circular area with diameter w. The corrected area A' touches a square area. The square area has the coordinates (x1'', y1''), (x2'', y2''), (x3'', y3''), and (x4'', y4'').

検出部１１０は、物体領域Ａの端部座標（ｘ１，ｙ１）、（ｘ２，ｙ２）、（ｘ３，ｙ３）、および（ｘ４，ｙ４）と、収束点（ｘ０，ｙ０）とを用いて、台形領域の端部座標（ｘ１’，ｙ１’）、（ｘ２’，ｙ２’）、（ｘ３’，ｙ３’）、および（ｘ４’，ｙ４’）を求める。検出部１１０は、例えば、下記の式１を用いる。

Using the edge coordinates (x1, y1), (x2, y2), (x3, y3), and (x4, y4) of the object region A and the convergence point (x0, y0), the detection unit 110 obtains the edge coordinates (x1', y1'), (x2', y2'), (x3', y3'), and (x4', y4') of the trapezoidal region. The detection unit 110 uses, for example, Equation 1 below.
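
The trapezoid construction described above (two bases parallel to the X direction, two legs passing through the convergence point and the vertical midpoints) can be sketched as follows. This is a geometric reading of the prose and is not a reproduction of Equation 1. The function names, the corner ordering, and the assumption that the bases lie on the top and bottom edges of the enclosing rectangle are illustrative assumptions.

```python
from typing import List, Tuple

Point = Tuple[float, float]


def x_on_line_at_y(p: Point, q: Point, y: float) -> float:
    """x coordinate where the line through p and q crosses the horizontal line at y."""
    (px, py), (qx, qy) = p, q
    if py == qy:
        raise ValueError("line is horizontal and never reaches another y level")
    t = (y - py) / (qy - py)
    return px + t * (qx - px)


def trapezoid_corners(convergence: Point, mid_left: Point, mid_right: Point,
                      y_top: float, y_bottom: float) -> List[Point]:
    """Corners in the order top-left, top-right, bottom-right, bottom-left.

    Assumption: the two bases lie on y = y_top and y = y_bottom, the top and
    bottom edges of the rectangle enclosing the object region.
    """
    return [
        (x_on_line_at_y(convergence, mid_left, y_top), y_top),
        (x_on_line_at_y(convergence, mid_right, y_top), y_top),
        (x_on_line_at_y(convergence, mid_right, y_bottom), y_bottom),
        (x_on_line_at_y(convergence, mid_left, y_bottom), y_bottom),
    ]
```

With a convergence point above the object, the top base comes out shorter than the bottom base, which matches the perspective foreshortening that the correction is meant to undo.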

次に検出部１１０は、正方形領域の座標（ｘ１’’，ｙ１’’）、（ｘ２’’，ｙ２’’）、（ｘ３’’，ｙ３’’）、および（ｘ４’’，ｙ４’’）を求める。検出部１１０は、例えば、下記の式２を用いる。

Next, the detection unit 110 obtains the coordinates (x1'', y1''), (x2'', y2''), (x3'', y3''), and (x4'', y4'') of the square area. The detection unit 110 uses, for example, Equation 2 below.

次に検出部１１０は、台形領域を正方形領域に射影変換する公式により、台形補正の関数Ｐ１を求める。検出部１１０は、例えば、下記の式３および式４のように定義される関数Ｐ１を求める。式３は、関数Ｐ１により台形領域の座標（ｘｉ’,ｙｉ’）を変換することで、正方形領域の座標（ｘｉ’’,ｙｉ’’）を求める式である。式４は、Ｐ１を定義する行列である。

Next, the detection unit 110 obtains a trapezoidal correction function P1 from a formula for projectively transforming a trapezoidal area into a square area. The detection unit 110 obtains a function P1 defined, for example, by Equations 3 and 4 below. Equation 3 is a formula for obtaining the coordinates (xi'', yi'') of the square area by transforming the coordinates (xi', yi') of the trapezoidal area using the function P1. Equation 4 is the matrix that defines P1.
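
A projective transform that maps four points onto four other points can be found by solving an 8×8 linear system. The sketch below shows that generic construction, assuming P1 is an ordinary 3×3 homography whose bottom-right entry is fixed to 1. It does not reproduce Equations 3 and 4, and the names `solve_linear`, `homography`, and `apply_homography` are illustrative.

```python
from typing import List, Sequence, Tuple

Point = Tuple[float, float]
Matrix = List[List[float]]


def solve_linear(a: Matrix, b: List[float]) -> List[float]:
    """Solve a @ x = b by Gauss-Jordan elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        if abs(m[pivot][col]) < 1e-12:
            raise ValueError("degenerate point configuration")
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(n):
            if r != col:
                f = m[r][col] / m[col][col]
                m[r] = [v - f * p for v, p in zip(m[r], m[col])]
    return [m[i][n] / m[i][i] for i in range(n)]


def homography(src: Sequence[Point], dst: Sequence[Point]) -> Matrix:
    """3x3 matrix H with H[2][2] = 1 that maps each src[i] to dst[i].

    Assumption: fixing H[2][2] = 1 requires that the source origin (0, 0)
    is not mapped to infinity.
    """
    a: Matrix = []
    b: List[float] = []
    for (x, y), (u, v) in zip(src, dst):
        # u = (h0*x + h1*y + h2) / (h6*x + h7*y + 1), rearranged to be linear in h.
        a.append([x, y, 1.0, 0.0, 0.0, 0.0, -u * x, -u * y])
        b.append(u)
        # v = (h3*x + h4*y + h5) / (h6*x + h7*y + 1)
        a.append([0.0, 0.0, 0.0, x, y, 1.0, -v * x, -v * y])
        b.append(v)
    h = solve_linear(a, b)
    return [h[0:3], h[3:6], [h[6], h[7], 1.0]]


def apply_homography(h: Matrix, p: Point) -> Point:
    """Apply H to a point using homogeneous coordinates."""
    x, y = p
    w = h[2][0] * x + h[2][1] * y + h[2][2]
    return ((h[0][0] * x + h[0][1] * y + h[0][2]) / w,
            (h[1][0] * x + h[1][1] * y + h[1][2]) / w)
```

In this reading, `homography(trapezoid, square)` plays the role of P1, and `apply_homography` evaluates it for any point of the object region.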

次に検出部１１０は、物体領域Ａを関数Ｐ１に代入することで、台形補正した領域Ａ’を求めることができる。図７は、実施形態において物体領域Ａを関数Ｐ１により領域Ａ‘に補正（変換）することを示す図である。 Next, the detection unit 110 can obtain a keystone-corrected area A' by substituting the object area A into the function P1. FIG. 7 is a diagram showing correction (conversion) of an object area A to an area A' by a function P1 in the embodiment.

（領域Ａ’から真円領域Ａ’’への変換処理）
図８は、実施形態における台形補正された領域Ａ’から真円領域Ａ’’へ変換する処理を説明するための図であり、（ａ）は物体領域Ａを示し、（ｂ）は領域Ａ’を示し、（ｃ）は真円領域Ａ’’を示す。
先ず検出部１１０は、領域ＡのＸ方向最小点ｚ１、Ｘ方向最大点ｚ３、Ｙ方向最小点ｚ４、およびＹ方向最大点ｚ２を検出する。これらの検出した点ｚ１～ｚ４は、領域Ａ’の外周点の座標である。次に検出部１１０は、外周点を用いてＸ方向中間点（幅方向の中心）およびＹ方向中間点（高さ方向の中心）を求め、幅方向の中心および高さ方向の中心である領域Ａの中心点ｚ０を求める。次に検出部１１０は、Ｘ方向最小点ｚ１、Ｘ方向最大点ｚ３、Ｙ方向最小点ｚ２、Ｙ方向最大点ｚ４、および中心点ｚ０を、関数Ｐ１を用いて台形補正する。すなわち、検出部１１０は、ｚｎ’＝Ｐ１（ｚｎ）を用いて、台形補正後のｚｎ’を求める。次に検出部１１０は、中心点ｚ０’と外周点ｚ１’との距離ｄ１、中心点ｚ０’と外周点ｚ２’との距離ｄ２、中心点ｚ０’と外周点ｚ３’との距離ｄ３、中心点ｚ０’と外周点ｚ４’との距離ｄ４を求める（ｄｎ＝ｄ（ｚ０’－ｚｎ’））。検出部１１０は、距離ｄ１，ｄ２，ｄ３，ｄ４のうち最大値を真円領域Ａ’’の半径ｒとして設定する（ｒ＝ｍａｘ（ｄ１，ｄ２，ｄ３，ｄ４））。 (Conversion processing from area A' to perfect circle area A'')
FIG. 8 is a diagram for explaining the process of converting the trapezoid-corrected area A' into the perfect circular area A'' in the embodiment, where (a) shows the object area A, (b) shows the area A', and (c) shows the perfect circular area A''.
First, the detection unit 110 detects the X-direction minimum point z1, the X-direction maximum point z3, the Y-direction minimum point z4, and the Y-direction maximum point z2 of the area A. These detected points z1 to z4 are the coordinates of the outer peripheral points of the area A'. Next, the detection unit 110 uses the outer peripheral points to obtain the X-direction midpoint (the center in the width direction) and the Y-direction midpoint (the center in the height direction), and from these obtains the center point z0 of the area A. Next, the detection unit 110 applies trapezoidal correction to the X-direction minimum point z1, the X-direction maximum point z3, the Y-direction minimum point z2, the Y-direction maximum point z4, and the center point z0 using the function P1. That is, the detection unit 110 obtains the corrected points zn' as zn' = P1(zn). Next, the detection unit 110 obtains the distance d1 between the center point z0' and the outer peripheral point z1', the distance d2 between the center point z0' and the outer peripheral point z2', the distance d3 between the center point z0' and the outer peripheral point z3', and the distance d4 between the center point z0' and the outer peripheral point z4' (dn = d(z0' - zn')). The detection unit 110 sets the largest of the distances d1, d2, d3, and d4 as the radius r of the perfect circular area A'' (r = max(d1, d2, d3, d4)).
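
The radius rule r = max(d1, d2, d3, d4) can be written in a few lines. This is a minimal sketch in which `circle_radius` is a hypothetical name and the inputs are assumed to be the already corrected points z0' and z1' to z4'.

```python
import math
from typing import Sequence, Tuple

Point = Tuple[float, float]


def circle_radius(center: Point, rim_points: Sequence[Point]) -> float:
    """Largest Euclidean distance from the corrected center to a corrected rim point."""
    return max(math.dist(center, p) for p in rim_points)
```

Taking the maximum rather than the mean presumably keeps the estimate close to the true radius when an obstacle pulls one or more rim points inward, although the source does not state this rationale.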

（隠れ判定処理）
検出部１１０は、領域Ａ’に含まれる画素数Ｎ１をカウントする。検出部１１０は、真円領域Ａ’’に含まれる画素数Ｎ２をカウントする。検出部１１０は、Ｎ２を、π×ｒ^２を用いてカウントしてもよい。検出部１１０は、画素数Ｎ１と画素数Ｎ２の割合が所定数以下であるか否かを判定し、割合が所定数以下である場合、物体領域Ａが障害物により隠された物体に対応すると判定する。検出部１１０は、割合が所定数以下ではない場合、物体領域Ａが障害物により隠されていない物体に対応すると判定する。検出部１１０は、障害物により隠された物体に対応した物体領域Ａを以降の処理対象から除外し、障害物により隠されていない物体に対応した物体領域Ａを以降の処理対象として選別する。所定値は、予め設定された値であり、例えば、後段の位置推定が高い精度で実行できるような物体領域Ａを選別するように設定された値であり、例えば、物体領域Ａの８０％が障害物で隠されていない物体を選別するように設定される。 (Hidden judgment processing)
The detection unit 110 counts the number of pixels N1 included in the area A'. The detection unit 110 counts the number of pixels N2 included in the perfect circular area A''. The detection unit 110 may instead calculate N2 as π×r^2. The detection unit 110 determines whether or not the ratio of the number of pixels N1 to the number of pixels N2 is equal to or less than a predetermined value, and if the ratio is equal to or less than the predetermined value, determines that the object area A corresponds to an object hidden by an obstacle. If the ratio is not equal to or less than the predetermined value, the detection unit 110 determines that the object area A corresponds to an object that is not hidden by an obstacle. The detection unit 110 excludes object areas A corresponding to objects hidden by obstacles from subsequent processing, and selects object areas A corresponding to objects not hidden by obstacles as targets of subsequent processing. The predetermined value is set in advance, for example so as to select object areas A for which the subsequent position estimation can be performed with high accuracy; for example, it is set so as to select objects for which 80% of the object area A is not hidden by an obstacle.
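
The occlusion test reduces to one ratio comparison. The sketch below assumes that N2 is computed as π×r^2 and that the predetermined value is 0.8, following the 80% example in the text; `is_hidden` and its parameter names are hypothetical.

```python
import math


def is_hidden(n1: int, r: float, threshold: float = 0.8) -> bool:
    """True when N1 / N2 <= threshold, with N2 approximated by pi * r^2."""
    if r <= 0:
        raise ValueError("radius must be positive")
    n2 = math.pi * r * r
    return n1 / n2 <= threshold
```

For r = 10 the full circle covers about 314 pixels, so a region of 300 visible pixels is kept and a region of 200 visible pixels is excluded as hidden.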

（その他の物体領域Ａの選別処理）
図９は、実施形態における撮像画像の一例を示す図である。検出部１１０は、物体領域Ａの画像内位置に基づく撮像位置から物体までの距離に基づいて、物体領域Ａに対応した物体を選別してよい。検出部１１０は、撮像画像のうち所定の範囲を物体領域Ａの検出領域として設定する。検出部１１０は、例えば、撮像画像のＹ方向における位置Ｒｍｉｎから位置Ｒｍａｘまでに含まれる領域を、検出領域として設定する。位置Ｒｍｉｎおよび位置Ｒｍａｘは、後段の位置推定が高い精度で実行できるように、例えば、カメラの撮像可能範囲のうち鮮明且つ適切な大きさの画像を撮像することができる範囲に基づいて設定される。検出部１１０は、検出領域に含まれる物体領域Ａである場合、当該物体領域Ａを後段の処理対象に設定し、検出領域に含まれる物体領域Ａではない場合、当該物体領域Ａを後段の処理対象から除外する。 (Selection processing of other object regions A)
FIG. 9 is a diagram showing an example of a captured image in the embodiment. The detection unit 110 may select the object corresponding to the object area A based on the distance from the imaging position to the object, which is derived from the position of the object area A in the image. The detection unit 110 sets a predetermined range of the captured image as the detection area for object areas A. The detection unit 110 sets, for example, the area from the position Rmin to the position Rmax in the Y direction of the captured image as the detection area. The position Rmin and the position Rmax are set based on, for example, the part of the camera's imaging range in which a clear and appropriately sized image can be captured, so that the subsequent position estimation can be performed with high accuracy. If an object area A is included in the detection area, the detection unit 110 sets that object area A as a target of subsequent processing; if it is not included in the detection area, the detection unit 110 excludes that object area A from subsequent processing.

図１０は、実施形態における撮像画像の一例を示す図であり、（ａ）は物体全体を含む物体領域Ａ１を示し、（ｂ）は見切れた物体領域Ａ２を示す。検出部１１０は、物体領域の幅と所定の物体領域幅との比較に基づいて、物体領域に対応した物体を選別してよい。検出部１１０は、例えば、左側画像から検出した物体領域Ａ１の幅Ｗ１と、右側画像から検出した物体領域Ａ２の幅Ｗ２との差が所定値以上であるか否かを判定する。物体領域の幅は、撮像画像のＸ方向（横方向）における最小値（最右座標）と最大値（最左座標）との差である。検出部１１０は、幅Ｗ２が幅Ｗ１よりも所定値以上小さいと判定することにより、物体領域Ａ２を後段の処理対象から除外する。所定値は、例えば、撮像画像にマンホールの全体が含まれていない、撮像画像から見切れたマンホール（物体）を除外するような値が設定される。 FIG. 10 is a diagram showing an example of a captured image according to the embodiment, (a) showing an object area A1 including the entire object, and (b) showing a cut-out object area A2. The detection unit 110 may select an object corresponding to the object area based on comparison between the width of the object area and a predetermined object area width. For example, the detection unit 110 determines whether or not the difference between the width W1 of the object area A1 detected from the left image and the width W2 of the object area A2 detected from the right image is equal to or greater than a predetermined value. The width of the object region is the difference between the minimum value (rightmost coordinate) and the maximum value (leftmost coordinate) in the X direction (horizontal direction) of the captured image. By determining that the width W2 is smaller than the width W1 by a predetermined value or more, the detection unit 110 excludes the object region A2 from subsequent processing targets. The predetermined value is, for example, set to a value that excludes manholes (objects) that are not entirely included in the captured image and that are cut off from the captured image.
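
The two additional selection rules (the Rmin to Rmax detection band and the width comparison) are simple predicates. This sketch assumes that "included in the detection area" means the region's full vertical extent lies inside the band, and that Rmin and Rmax may be given in either order; both function names are hypothetical.

```python
def in_detection_area(y_top: float, y_bottom: float,
                      r_min: float, r_max: float) -> bool:
    """True when the region's vertical extent [y_top, y_bottom] lies inside the band."""
    lo, hi = sorted((r_min, r_max))
    return lo <= y_top and y_bottom <= hi


def is_cut_off(reference_width: float, width: float, min_difference: float) -> bool:
    """True when `width` is smaller than `reference_width` by at least `min_difference`."""
    return reference_width - width >= min_difference
```

A region would be passed on to position estimation only when `in_detection_area` holds and `is_cut_off` does not.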

［位置推定処理（ステップＳ１０４）］
以下、位置推定処理（ステップＳ１０４）について説明する。図１１は、物体の中心点を求めるときの課題を説明するための図であり、（ａ）は撮像画像の一例であり、（ｂ）は物体領域の平面図である。検出部１１０により検出された物体領域Ａを上から見た平面に変換すると、物体領域Ａから計算した中心点Ｃが、中心点Ｃ＃にずれてしまう。すなわち、図１１（ａ）に示す物体領域Ａの横方向における最左座標ａと最右座標ｂの中間点と、物体領域Ａの縦方向における最上座標ｃと最下座標ｄの中間点とから中心点Ｃを計算しても、中心点Ｃ＃は、物体領域Ａ＃の横方向における最左座標ａ＃と最右座標ｂ＃の中間点と、物体領域Ａ＃の縦方向における最上座標ｃ＃と最下座標ｄ＃の中間点とから計算した点からずれてしまう。 [Position estimation process (step S104)]
The position estimation process (step S104) will be described below. FIG. 11 is a diagram for explaining a problem that arises when obtaining the center point of an object, in which (a) is an example of a captured image and (b) is a plan view of the object area. When the object area A detected by the detection unit 110 is transformed into a plane viewed from above, the center point C calculated from the object area A is displaced from the center point C#. That is, even if the center point C is calculated from the midpoint between the leftmost coordinate a and the rightmost coordinate b of the object area A in the horizontal direction and the midpoint between the uppermost coordinate c and the lowermost coordinate d of the object area A in the vertical direction, as shown in FIG. 11(a), the center point C# deviates from the point calculated from the midpoint between the leftmost coordinate a# and the rightmost coordinate b# of the object area A# in the horizontal direction and the midpoint between the uppermost coordinate c# and the lowermost coordinate d# of the object area A# in the vertical direction.

（矩形２００から台形３００への変換処理）
図１２は、実施形態における矩形から台形への変換を説明するための図であり、（ａ）は矩形を示し、（ｂ）は台形を示す。位置推定部１２０は、物体領域Ａの画像内位置を基準とした形状を有する矩形２００から物体領域Ａの画像内位置を基準とした形状を有する台形３００への変換関数に基づいて、矩形２００の中心点Ｃを変換する。矩形２００は、例えば、物体領域Ａの中心点Ｃを重心とする正方形であって、一辺が物体領域ＡのＸ方向の幅と同じ長さを持つ正方形である。物体領域Ａの幅は、物体領域Ａの最左座標ａから最右座標までの距離である。台形３００は、矩形２００により変換された図形であり、各辺が最左座標ａ、最右座標ｂ、最上座標ｃ、および最下座標ｄに接する図形である。 (Conversion processing from rectangle 200 to trapezoid 300)
FIG. 12 is a diagram for explaining conversion from a rectangle to a trapezoid in the embodiment, where (a) shows the rectangle and (b) shows the trapezoid. The position estimation unit 120 transforms the center point C of the rectangle 200 based on a conversion function from the rectangle 200, whose shape is based on the position of the object region A in the image, to the trapezoid 300, whose shape is likewise based on the position of the object region A in the image. The rectangle 200 is, for example, a square whose center of gravity is the center point C of the object region A and whose sides have the same length as the width of the object region A in the X direction. The width of the object area A is the distance from the leftmost coordinate a to the rightmost coordinate b of the object area A. The trapezoid 300 is the figure obtained by transforming the rectangle 200, and its sides touch the leftmost coordinate a, the rightmost coordinate b, the uppermost coordinate c, and the lowermost coordinate d.

図１３は、実施形態における台形補正について説明するための図であり、（ａ）は補正前の矩形２００を示し、（ｂ）は補正後の台形３００を示す。図１３（ａ）において、矩形２００は、物体領域Ａの幅ｗの長さである辺を持つ正方形である。幅ｗは、物体領域Ａの縦方向中間点(ｘｃ１，ｙｃ１)と物体領域Ａの縦方向中間点(ｘｃ２，ｙｃ２)との距離である。矩形２００は、（ｘ１’’，ｙ１’’）、（ｘ２’’，ｙ２’’）、（ｘ３’’，ｙ３’’）、および（ｘ４’’，ｙ４’’）を端部座標として持つ。図１３（ｂ）において、（ｘ０，ｙ０）は撮像画像の収束点の座標である。（ｘ１，ｙ１）、（ｘ２，ｙ２）、（ｘ３，ｙ３）、および（ｘ４，ｙ４）は物体領域Ａを囲む矩形の端部座標である。（ｘ１’，ｙ１’）、（ｘ２’，ｙ２’）、（ｘ３’，ｙ３’）、および（ｘ４’，ｙ４’）は台形３００の端部座標である。台形３００は、Ｘ方向に平行な２本の底辺と、縦方向中間点(ｘｃ１，ｙｃ１)と収束点とを結ぶ直線と、縦方向中間点(ｘｃ２，ｙｃ２)と収束点とを結ぶ直線とにより定義される。 FIG. 13 is a diagram for explaining trapezoidal correction in the embodiment, in which (a) shows the rectangle 200 before correction and (b) shows the trapezoid 300 after correction. In FIG. 13(a), the rectangle 200 is a square whose sides have the length of the width w of the object region A. The width w is the distance between the vertical midpoint (xc1, yc1) of the object region A and the vertical midpoint (xc2, yc2) of the object region A. The rectangle 200 has (x1'', y1''), (x2'', y2''), (x3'', y3''), and (x4'', y4'') as its edge coordinates. In FIG. 13(b), (x0, y0) are the coordinates of the convergence point of the captured image. (x1, y1), (x2, y2), (x3, y3), and (x4, y4) are the edge coordinates of the rectangle enclosing the object region A. (x1', y1'), (x2', y2'), (x3', y3'), and (x4', y4') are the edge coordinates of the trapezoid 300. The trapezoid 300 is defined by two bases parallel to the X direction, the straight line connecting the vertical midpoint (xc1, yc1) and the convergence point, and the straight line connecting the vertical midpoint (xc2, yc2) and the convergence point.

位置推定部１２０は、平面を仮定した矩形（正方形）２００を求める。位置推定部１２０は、例えば、下記の式５に、ｘｃ１、ｙｃ１、およびｗを代入することで、矩形２００の端部座標（ｘ１’’，ｙ１’’）、（ｘ２’’，ｙ２’’）、（ｘ３’’，ｙ３’’）、および（ｘ４’’，ｙ４’’）を計算する。

The position estimation unit 120 obtains a rectangle (square) 200 assuming a plane. For example, the position estimation unit 120 substitutes xc1, yc1, and w into Equation 5 below to obtain end coordinates (x1'', y1''), (x2'', y2'') of the rectangle 200. ), (x3'', y3''), and (x4'', y4'').

位置推定部１２０は、物体領域Ａの端部座標（ｘ１，ｙ１）、（ｘ２，ｙ２）、（ｘ３，ｙ３）、および（ｘ４，ｙ４）と、収束点（ｘ０，ｙ０）とを用いて、台形３００の端部座標（ｘ１’，ｙ１’）、（ｘ２’，ｙ２’）、（ｘ３’，ｙ３’）、および（ｘ４’，ｙ４’）を求める。位置推定部１２０は、例えば、下記の式６を用いて、台形３００の端部座標（ｘ１’，ｙ１’）、（ｘ２’，ｙ２’）、（ｘ３’，ｙ３’）、および（ｘ４’，ｙ４’）を計算する。

Using the edge coordinates (x1, y1), (x2, y2), (x3, y3), and (x4, y4) of the object region A and the convergence point (x0, y0), the position estimation unit 120 , the edge coordinates (x1′, y1′), (x2′, y2′), (x3′, y3′), and (x4′, y4′) of the trapezoid 300 are obtained. The position estimating unit 120 uses, for example, Equation 6 below to determine the end coordinates (x1', y1'), (x2', y2'), (x3', y3'), and (x4') of the trapezoid 300. , y4′).

次に位置推定部１２０は、矩形２００の座標（ｘ１’’，ｙ１’’）、（ｘ２’’，ｙ２’’）、（ｘ３’’，ｙ３’’）、および（ｘ４’’，ｙ４’’）と、台形３００の端部座標（ｘ１’，ｙ１’）、（ｘ２’，ｙ２’）、（ｘ３’，ｙ３’）、および（ｘ４’，ｙ４’）とを用いて、台形補正の関数Ｐ２を求める。位置推定部１２０は、例えば、下記の式７を用いる。式８は、Ｐ２を定義する行列である。

Next, the position estimation unit 120 calculates the coordinates (x1'', y1''), (x2'', y2''), (x3'', y3''), and (x4'', y4') of the rectangle 200. ') and the edge coordinates (x1', y1'), (x2', y2'), (x3', y3'), and (x4', y4') of the trapezoid 300, the keystone correction Find the function P2. The position estimation unit 120 uses, for example, Equation 7 below. Equation 8 is the matrix that defines P2.

次に位置推定部１２０は、矩形２００の中心点Ｃを関数Ｐ２を用いて変換することにより、台形３００の中心点Ｃ＃を求めることができる。図１４は、実施形態において矩形２００の中心点Ｃを台形３００の中心点Ｃ＃に変換することを示す図である。位置推定部１２０は、中心点Ｃ＃を、物体領域Ａの位置として推定する。 Next, the position estimation unit 120 can obtain the center point C# of the trapezoid 300 by transforming the center point C of the rectangle 200 using the function P2. FIG. 14 is a diagram showing the transformation of the center point C of the rectangle 200 into the center point C# of the trapezoid 300 in the embodiment. The position estimation unit 120 estimates the center point C# as the position of the object region A.
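
A projective transform maps straight lines to straight lines, and the center of a square is the intersection of its diagonals. The image of that center under a square-to-trapezoid transform is therefore the intersection of the trapezoid's diagonals. The sketch below uses this shortcut instead of building P2 explicitly. It is an equivalent geometric route under the assumption that P2 is an ordinary projective transform, not the patent's Equation 7, and `diagonal_intersection` is a hypothetical name.

```python
from typing import Sequence, Tuple

Point = Tuple[float, float]


def diagonal_intersection(quad: Sequence[Point]) -> Point:
    """Intersection of the diagonals of a convex quadrilateral given in cyclic order."""
    (x1, y1), (x2, y2), (x3, y3), (x4, y4) = quad
    # Diagonal 1 runs quad[0] -> quad[2]; diagonal 2 runs quad[1] -> quad[3].
    dx1, dy1 = x3 - x1, y3 - y1
    dx2, dy2 = x4 - x2, y4 - y2
    denom = dx1 * dy2 - dy1 * dx2
    if denom == 0:
        raise ValueError("diagonals are parallel; quadrilateral is degenerate")
    t = ((x2 - x1) * dy2 - (y2 - y1) * dx2) / denom
    return (x1 + t * dx1, y1 + t * dy1)
```

For the trapezoid (-1, 5), (1, 5), (3, 15), (-3, 15) this gives (0, 7.5), whereas the midpoint of the top and bottom extents would be y = 10. That gap is the displacement between C and C# that the text describes.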

［一致性の判定処理（ステップＳ１０６）］
図１５は、右側画像および左側画像の一例を示す図である。例えば、撮像画像に複数のマンホールが含まれる場合、右側画像に含まれるマンホールと、左側画像に含まれるマンホールとが同一であると誤判定してしまう可能性がある。これを抑制するため、物体検出システム１は、右側画像に含まれる物体領域と左側画像に含まれる物体領域とが同一であるか否かを判定する一致性の判定処理を行う。 [Consistency determination processing (step S106)]
FIG. 15 is a diagram showing an example of a right image and a left image. For example, when a plurality of manholes are included in the captured image, there is a possibility of erroneously determining that the manhole included in the right image and the manhole included in the left image are the same. In order to suppress this, the object detection system 1 performs matching determination processing for determining whether or not the object area included in the right image and the object area included in the left image are the same.

図１６は、実施形態において右側画像の物体領域と左側画像の物体領域との一致性を判定する処理を説明するための図である。位置推定部１２０は、例えば、右側画像と左側画像とを重ね合わせた状態において、右側画像の物体領域ＡＲの中心点ＣＲと左側画像の物体領域ＡＬの中心点ＣＬとの距離を算出する。位置推定部１２０は、算出した中心点の距離が第１の所定値以内かつ第２の所定値以上である場合に、中心点ＣＲを持つ物体領域ＡＲに対応した物体と中心点ＣＬを持つ物体領域ＡＬに対応した物体とが同じであると判定する。 FIG. 16 is a diagram for explaining the process of determining whether the object area of the right image and the object area of the left image match in the embodiment. For example, with the right image and the left image superimposed, the position estimation unit 120 calculates the distance between the center point CR of the object area AR of the right image and the center point CL of the object area AL of the left image. If the calculated distance between the center points is equal to or less than a first predetermined value and equal to or greater than a second predetermined value, the position estimation unit 120 determines that the object corresponding to the object area AR having the center point CR and the object corresponding to the object area AL having the center point CL are the same.

第１の所定値は、例えば、上述した右側画像と左側画像との位置ずれの最大値である。第２の所定値は、例えば、上述した右側画像と左側画像との位置ずれの最小値である。位置ずれの最大値は、例えば、左側画像の位置Ｒｍｉｎ上の最端点ｘ１０に対応した右側画像内の位置ｘ１０＃と、左側画像の位置Ｒｍｉｎ上の最端点ｘ２０との距離Ｄａである。位置ズレの最小点は、例えば、左側画像の位置Ｒｍａｘ上の中央点ｘ１１に対応した右側画像内の位置ｘ１１＃と、左側画像の位置Ｒｍａｘ上の中央ｘ２１との距離Ｄｂである。 The first predetermined value is, for example, the maximum value of positional deviation between the right image and the left image described above. The second predetermined value is, for example, the minimum value of positional deviation between the right image and the left image described above. The maximum positional deviation is, for example, the distance Da between the position x10# in the right image corresponding to the extreme point x10 on the position Rmin of the left image and the extreme point x20 on the position Rmin of the left image. The minimum positional deviation point is, for example, the distance Db between the position x11# in the right image corresponding to the central point x11 on the position Rmax of the left image and the center x21 on the position Rmax of the left image.
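
The matching rule is a range check on the distance between the two center points. In this minimal sketch, `max_shift` and `min_shift` stand for the first and second predetermined values (the distances Da and Db); the function and parameter names are hypothetical.

```python
import math
from typing import Tuple

Point = Tuple[float, float]


def same_object(center_right: Point, center_left: Point,
                max_shift: float, min_shift: float) -> bool:
    """True when the center distance lies in the closed range [min_shift, max_shift]."""
    distance = math.dist(center_right, center_left)
    return min_shift <= distance <= max_shift
```

The lower bound matters as much as the upper bound: two detections that nearly coincide after superimposition are closer than the smallest shift expected between the right and left images, so they are not treated as the same object.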

なお、位置推定部１２０は、一致性の判定において、右側画像の物体領域ＡＲの中心点ＣＲと左側画像の物体領域ＡＬの中心点ＣＬとの距離を判定したが、これに限定されず、撮像画像の遠近方向（Ｙ方向）における距離および撮像画像の横方向（Ｘ方向）の距離のそれぞれについて一致性判定のための所定値を設定しておき、右側画像および左側画像のＹ方向の距離と所定値との比較、および右側画像および左側画像のＸ方向の距離と所定値との比較に基づいて一致性を判定してもよい。この場合の所定値は、Ｙ方向における右側画像および左側画像の位置ズレに基づいて設定し、Ｘ方向における右側画像および左側画像の位置ズレに基づいて設定してよい。 Note that the position estimating unit 120 determines the distance between the center point CR of the object area AR of the right image and the center point CL of the object area AL of the left image in the determination of matching, but is not limited to this. Predetermined values for matching determination are set for each of the distance in the perspective direction (Y direction) of the image and the distance in the horizontal direction (X direction) of the captured image. Matching may be determined based on a comparison with a predetermined value, and a comparison between the X-direction distance of the right and left images and a predetermined value. The predetermined value in this case may be set based on the positional deviation of the right and left images in the Y direction, and may be set based on the positional deviation of the right and left images in the X direction.

［分類処理（ステップＳ１０８）］
図１７は、実施形態におけるマンホールの一例を示す平面図である。図１７（ａ）、（ｂ）、（ｃ）、および（ｄ）に示すように、マンホールは、模様、記号またはマーク、鍵穴の形状、鍵穴の位置、大きさが異なり、これらの要素によって特徴付けられている。これらの要素は、例えば、マンホールに適用された規格、マンホールの所有者や管理者を表す模様や記号、マンホールを開けるための鍵穴の形状や位置、マンホールの種類やタイプや用途、マンホールの大きさや形状などが挙げられるが、これに限定されず、他のマンホールと区別するための要素であればよい。 [Classification process (step S108)]
FIG. 17 is a plan view showing examples of manholes in the embodiment. As shown in FIGS. 17(a), (b), (c), and (d), manholes differ in pattern, symbol or mark, keyhole shape, keyhole position, and size, and are characterized by these elements. These elements include, for example, the standard applied to the manhole, patterns and symbols that represent the owner or administrator of the manhole, the shape and position of the keyhole used to open the manhole, the kind, type, and use of the manhole, and the size and shape of the manhole. The elements are not limited to these, and may be any element that distinguishes a manhole from other manholes.

図１８は、実施形態における分類部１３０の一構成例を示すブロック図である。分類部１３０は、例えば、変換部１３２と、複数の分類処理部１３４Ａ、１３４Ｂおよび１３４Ｃと、直径分類部１３６と、モデル構築部１３８とを備える。変換部１３２、複数の分類処理部１３４Ａ、１３４Ｂおよび１３４Ｃ、および直径分類部１３６といった機能部は、例えば、ＣＰＵ等によりプログラムを実行することにより実現される。変換部１３２は、物体領域Ａを表す画像を平面画像に変換する。変換部１３２は、位置推定部１２０により変換された平面画像を取得してもよい。複数の分類処理部１３４Ａ、１３４Ｂおよび１３４Ｃは、互いに異なる要素に基づいて物体を分類する。直径分類部１３６は、直径に基づいて物体を分類する。 FIG. 18 is a block diagram showing a configuration example of the classification section 130 in the embodiment. The classification unit 130 includes, for example, a conversion unit 132, a plurality of classification processing units 134A, 134B and 134C, a diameter classification unit 136, and a model construction unit 138. Functional units such as the conversion unit 132, the plurality of classification processing units 134A, 134B and 134C, and the diameter classification unit 136 are implemented by, for example, executing a program using a CPU or the like. The conversion unit 132 converts the image representing the object region A into a plane image. The transformation unit 132 may acquire the planar image transformed by the position estimation unit 120 . A plurality of classification processors 134A, 134B and 134C classify objects based on mutually different factors. A diameter classifier 136 classifies objects based on diameter.

分類処理部１３４Ａは、物体領域Ａに含まれる物体を第１の要素に基づいて分類する第１の処理を行う。分類処理部１３４Ｂ、分類処理部１３４Ｃおよび直径分類部１３６は、第１の処理により分類された物体を第２の要素に基づいて分類する第２の処理を行う。第１の要素は、物体の模様および記号のうちの少なくとも一つであり、第２の要素は、物体の穴部の形状、位置、および大きさのうちの少なくとも一つである。分類部１３０は、分類処理部１３４Ａと、分類処理部１３４Ｂ、分類処理部１３４Ｃおよび直径分類部１３６のうち何れか一つを備えていればよい。 The classification processing unit 134A performs a first process of classifying the object included in the object area A based on a first element. The classification processing unit 134B, the classification processing unit 134C, and the diameter classification unit 136 perform a second process of classifying the object classified by the first process based on a second element. The first element is at least one of the pattern and the symbol of the object, and the second element is at least one of the shape, position, and size of the hole in the object. The classification unit 130 only needs to include the classification processing unit 134A together with any one of the classification processing unit 134B, the classification processing unit 134C, and the diameter classification unit 136.

分類処理部１３４Ａは、模様・マーク用モデル１３４１に従った分類処理を行う。模様・マーク用モデル１３４１は、例えば、物体の模様を示すタグが付与された撮像画像を教師データとして機械学習されたモデルである。実施形態における機械学習モデルは、例えば畳み込みニューラルネットワーク（ＣＮＮ：Convolutional Neural Network）であるが、これに限定されず、パーセプトロンのニューラルネットワーク、再帰型ニューラルネットワーク（ＲＮＮ：Recurrent Neural Network）、残差ネットワーク（ＲｅｓＮｅｔ：Residual Network）等の他のニューラルネットワークを設定してもよい。また、模様・マーク用モデル１３４１は、決定木、回帰木、ランダムフォレスト、勾配ブースティング木、線形回帰、ロジスティック回帰、又は、ＳＶＭ（サポートベクターマシン）等の教師あり学習の推定モデルを一部或いは全部に設定してもよい。 The classification processing unit 134A performs classification processing according to the pattern/mark model 1341. The pattern/mark model 1341 is, for example, a model trained by machine learning using, as teacher data, captured images to which a tag indicating the pattern of the object is attached. The machine learning model in the embodiment is, for example, a convolutional neural network (CNN), but is not limited to this, and another neural network such as a perceptron neural network, a recurrent neural network (RNN), or a residual network (ResNet) may be used. Alternatively, part or all of the pattern/mark model 1341 may be a supervised-learning estimation model such as a decision tree, regression tree, random forest, gradient boosting tree, linear regression, logistic regression, or SVM (support vector machine).

模様・マーク用モデル１３４１は、学習時において、撮像画像が入力された場合に、撮像画像に含まれる物体領域Ａの模様やマークを示す推定結果を出力する。分類処理部１３４Ａは、推定結果が、教師データとしてのタグと一致するように模様・マーク用モデル１３４１の処理パラメータを更新する。処理パラメータは、例えば、畳み込みニューラルネットワークにおける、層数、各層のノード数、各層間のノードの結合方式、活性化関数、誤差関数、及び勾配降下アルゴリズム、プーリングの領域、カーネル、重み係数、および重み行列の少なくとも一つである。分類処理部１３４Ａは、推定時において、変換部１３２から供給された物体領域Ａの画像を入力し、分類結果を出力する。分類処理部１３４Ａは、例えば分類結果に対応したタグを付与した物体領域Ａの画像を分類処理部１３４Ｂおよび分類処理部１３４Ｃに出力する。 During learning, when a captured image is input, the pattern/mark model 1341 outputs an estimation result indicating the pattern or mark of the object region A included in the captured image. The classification processing unit 134A updates the processing parameters of the pattern/mark model 1341 so that the estimation result matches the tag given as teacher data. The processing parameters are, for example, at least one of the following for a convolutional neural network: the number of layers, the number of nodes in each layer, the connection scheme between the nodes of adjacent layers, the activation function, the error function, the gradient descent algorithm, the pooling region, the kernels, the weight coefficients, and the weight matrices. At the time of estimation, the classification processing unit 134A receives the image of the object region A supplied from the conversion unit 132 and outputs a classification result. The classification processing unit 134A outputs, for example, the image of the object region A with a tag corresponding to the classification result attached to the classification processing unit 134B and the classification processing unit 134C.

分類処理部１３４Ｂは、形状用モデル１３４２に従った分類処理を行う。形状用モデル１３４２は、例えば、鍵穴の形状を示すタグが付与された撮像画像を教師データとして機械学習されたモデルである。実施形態における機械学習モデルは、例えば畳み込みニューラルネットワーク（ＣＮＮ）であるが、模様・マーク用モデル１３４１と同様にこれに限定されず、パーセプトロンのニューラルネットワーク、再帰型ニューラルネットワーク（ＲＮＮ）、残差ネットワーク（ＲｅｓＮｅｔ）等の他のニューラルネットワークを設定してもよい。また、形状用モデル１３４２は、決定木、回帰木、ランダムフォレスト、勾配ブースティング木、線形回帰、ロジスティック回帰、又は、ＳＶＭ（サポートベクターマシン）等の教師あり学習の推定モデルを一部或いは全部に設定してもよい。形状用モデル１３４２は、学習時において、撮像画像が入力された場合に、撮像画像に含まれる鍵穴の形状を示す推定結果を出力する。分類処理部１３４Ｂは、推定結果が、教師データとしてのタグと一致するように形状用モデル１３４２の処理パラメータを更新する。処理パラメータは、例えば、畳み込みニューラルネットワークにおける、層数、各層のノード数、各層間のノードの結合方式、活性化関数、誤差関数、及び勾配降下アルゴリズム、プーリングの領域、カーネル、重み係数、および重み行列の少なくとも一つである。分類処理部１３４Ｂは、推定時において、分類処理部１３４Ａから供給された物体領域Ａの画像を入力し、分類結果を出力する。分類処理部１３４Ｂは、例えば分類結果に対応したタグを付与した物体領域Ａの画像を直径分類部１３６に出力する。分類処理部１３４Ｂは、例えば図１７（ａ）および（ｂ）に示した楕円形の鍵穴の形状と、図１７（ｃ）に示した円形の鍵穴の形状とで、２種類に分類する。 The classification processing unit 134B performs classification processing according to the shape model 1342. The shape model 1342 is, for example, a model trained by machine learning using, as teacher data, captured images to which a tag indicating the shape of the keyhole is attached. The machine learning model in the embodiment is, for example, a convolutional neural network (CNN), but as with the pattern/mark model 1341 it is not limited to this, and another neural network such as a perceptron neural network, a recurrent neural network (RNN), or a residual network (ResNet) may be used. Alternatively, part or all of the shape model 1342 may be a supervised-learning estimation model such as a decision tree, regression tree, random forest, gradient boosting tree, linear regression, logistic regression, or SVM (support vector machine). During learning, when a captured image is input, the shape model 1342 outputs an estimation result indicating the shape of the keyhole included in the captured image. The classification processing unit 134B updates the processing parameters of the shape model 1342 so that the estimation result matches the tag given as teacher data. The processing parameters are, for example, at least one of the following for a convolutional neural network: the number of layers, the number of nodes in each layer, the connection scheme between the nodes of adjacent layers, the activation function, the error function, the gradient descent algorithm, the pooling region, the kernels, the weight coefficients, and the weight matrices. At the time of estimation, the classification processing unit 134B receives the image of the object region A supplied from the classification processing unit 134A and outputs a classification result. The classification processing unit 134B outputs, for example, the image of the object region A with a tag corresponding to the classification result attached to the diameter classification unit 136. The classification processing unit 134B classifies objects into two types, for example the elliptical keyhole shape shown in FIGS. 17(a) and 17(b) and the circular keyhole shape shown in FIG. 17(c).

分類処理部１３４Ｃは、位置用モデル１３４３に従った分類処理を行う。位置用モデル１３４３は、例えば、鍵穴の位置を示すタグが付与された撮像画像を教師データとして機械学習されたモデルである。実施形態における機械学習モデルは、例えば畳み込みニューラルネットワークであるが、模様・マーク用モデル１３４１と同様にこれに限定されず、パーセプトロンのニューラルネットワーク、再帰型ニューラルネットワーク（ＲＮＮ）、残差ネットワーク（ＲｅｓＮｅｔ）等の他のニューラルネットワークを設定してもよい。また、位置用モデル１３４３は、決定木、回帰木、ランダムフォレスト、勾配ブースティング木、線形回帰、ロジスティック回帰、又は、ＳＶＭ（サポートベクターマシン）等の教師あり学習の推定モデルを一部或いは全部に設定してもよい。位置用モデル１３４３は、学習時において、撮像画像が入力された場合に、撮像画像に含まれる鍵穴の位置を示す推定結果を出力する。分類処理部１３４Ｃは、推定結果が、教師データとしてのタグと一致するように位置用モデル１３４３の処理パラメータを更新する。処理パラメータは、例えば、畳み込みニューラルネットワークにおける、層数、各層のノード数、各層間のノードの結合方式、活性化関数、誤差関数、及び勾配降下アルゴリズム、プーリングの領域、カーネル、重み係数、および重み行列の少なくとも一つである。分類処理部１３４Ｃは、推定時において、分類処理部１３４Ａから供給された物体領域Ａの画像を入力し、分類結果を出力する。分類処理部１３４Ｃは、例えば分類結果に対応したタグを付与した物体領域Ａの画像を直径分類部１３６に出力する。分類処理部１３４Ｃは、例えば図１７（ａ）および（ｂ）に示した鍵穴の位置と、図１７（ｃ）に示した鍵穴の位置とで、２種類に分類する。 The classification processing unit 134C performs classification processing according to the position model 1343. The position model 1343 is, for example, a model trained by machine learning using, as teacher data, captured images to which a tag indicating the position of the keyhole is attached. The machine learning model in the embodiment is, for example, a convolutional neural network, but as with the pattern/mark model 1341 it is not limited to this, and another neural network such as a perceptron neural network, a recurrent neural network (RNN), or a residual network (ResNet) may be used. Alternatively, part or all of the position model 1343 may be a supervised-learning estimation model such as a decision tree, regression tree, random forest, gradient boosting tree, linear regression, logistic regression, or SVM (support vector machine). During learning, when a captured image is input, the position model 1343 outputs an estimation result indicating the position of the keyhole included in the captured image. The classification processing unit 134C updates the processing parameters of the position model 1343 so that the estimation result matches the tag given as teacher data. The processing parameters are, for example, at least one of the following for a convolutional neural network: the number of layers, the number of nodes in each layer, the connection scheme between the nodes of adjacent layers, the activation function, the error function, the gradient descent algorithm, the pooling region, the kernels, the weight coefficients, and the weight matrices. At the time of estimation, the classification processing unit 134C receives the image of the object region A supplied from the classification processing unit 134A and outputs a classification result. The classification processing unit 134C outputs, for example, the image of the object region A with a tag corresponding to the classification result attached to the diameter classification unit 136. The classification processing unit 134C classifies objects into two types, for example the keyhole position shown in FIGS. 17(a) and 17(b) and the keyhole position shown in FIG. 17(c).

The diameter classification unit 136 performs classification processing according to the diameter of the object indicated by the object region A. The diameter classification unit 136 acquires information indicating the width w of the object region A obtained by the position estimation unit 120. The diameter classification unit 136 determines whether the width w of the object region A is equal to or greater than a predetermined value, and attaches a tag corresponding to the classification result according to the determination. The predetermined value is, for example, a diameter set by a manhole standard. The diameter classification unit 136 outputs the tagged image of the object region A as the classification result. Note that information on the width w of the object region A acquired by the object detection system 1 may be provided to another system, and the diameter classification unit 136 may perform classification using the width w of the object region A obtained by that other system. This allows the object detection system 1 to classify with high accuracy using an accurate width w of the object region A calculated by the other system.
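The threshold test performed by the diameter classification unit 136 can be sketched as follows. This is only an illustrative sketch: the `ObjectRegion` type, the tag strings, and the 0.6 m threshold are assumptions introduced here, since the text states only that the predetermined value is, for example, a diameter set by a manhole standard.

```python
from dataclasses import dataclass

# Hypothetical threshold in metres; the text only says the predetermined
# value is, for example, a diameter set by a manhole standard.
DIAMETER_THRESHOLD_M = 0.6


@dataclass
class ObjectRegion:
    width_m: float      # width w of object region A, from position estimation
    tags: tuple = ()    # tags attached by earlier classification stages


def classify_by_diameter(region, threshold_m=DIAMETER_THRESHOLD_M):
    """Return a copy of the region with a size tag appended."""
    if region.width_m >= threshold_m:
        size_tag = "diameter>=threshold"
    else:
        size_tag = "diameter<threshold"
    return ObjectRegion(width_m=region.width_m, tags=region.tags + (size_tag,))
```

The function keeps the tags from earlier stages and appends one more, mirroring the way the tagged image of the object region A is passed along and output as the classification result.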

The model construction unit 138 constructs the pattern/mark model 1341, the shape model 1342, and the position model 1343. The model construction unit 138 inputs teacher data to the pattern/mark model 1341 and iteratively updates the processing parameters of the pattern/mark model 1341 so that the classification result output from the pattern/mark model 1341 matches the tag given as teacher data. The model construction unit 138 inputs teacher data to the shape model 1342 and iteratively updates the processing parameters of the shape model 1342 so that the classification result output from the shape model 1342 matches the tag given as teacher data. The model construction unit 138 inputs teacher data to the position model 1343 and iteratively updates the processing parameters of the position model 1343 so that the classification result output from the position model 1343 matches the tag given as teacher data. Note that the model construction unit 138 need not be included in the classification unit 130, as long as each model can be installed in the classification unit 130 at the time of initial setup or maintenance of the object detection device 100.
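The repeated parameter update described for the model construction unit 138 can be illustrated with a much smaller model than the convolutional neural network of the embodiment. The sketch below assumes a two-class logistic regression (one of the alternative supervised models the description lists) trained by plain stochastic gradient descent; the function names, learning rate, and epoch count are hypothetical and not taken from the source.

```python
import math


def train_binary_model(features, tags, lr=0.5, epochs=2000):
    """Update weights repeatedly so that outputs approach the teacher tags (0 or 1)."""
    weights = [0.0] * len(features[0])
    bias = 0.0
    for _ in range(epochs):
        for x, t in zip(features, tags):
            z = bias + sum(w * xi for w, xi in zip(weights, x))
            y = 1.0 / (1.0 + math.exp(-z))  # estimated probability of tag 1
            err = y - t                     # gradient of cross-entropy loss w.r.t. z
            weights = [w - lr * err * xi for w, xi in zip(weights, x)]
            bias -= lr * err
    return weights, bias


def predict(model, x):
    """Return the estimated tag (0 or 1) for one feature vector."""
    weights, bias = model
    z = bias + sum(w * xi for w, xi in zip(weights, x))
    return 1 if z >= 0.0 else 0
```

In this sketch the "processing parameters" are just the weights and the bias; a convolutional network would have many more, but the loop has the same shape: feed teacher data, compare the output with the tag, and adjust the parameters.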

<Effects of Embodiment>
As described above, the object detection system 1 of the embodiment can realize an object detection device 100 that includes a detection unit 110 that detects an object region A from a captured image and a position estimation unit 120 that estimates the position of the object corresponding to the object region A. The position estimation unit 120 transforms the center point C of a rectangle 200, whose shape is referenced to the in-image position of the object region A, based on a conversion function from the rectangle 200 to a trapezoid 300, whose shape is also referenced to the in-image position of the object region A, and estimates the transformed center point C# as the position of the object. The conversion function is, for example, a function that converts the rectangle 200, which has sides of a length corresponding to the width w of the object region A, into the trapezoid 300 based on the end coordinates of the object region A and the convergence point of the captured image. The object detection device 100 of the embodiment can detect the position of an object with high accuracy. For example, the object detection device 100 of the embodiment can suppress the positional deviation that arises during conversion, such as the deviation that occurs when the object region A in the captured image is converted to plane coordinates to estimate the object position.

According to the object detection system 1 of the embodiment, the position estimation unit 120 applies keystone correction to the object region A, converts the keystone-corrected object region A' into a circular region A'', and can screen the object corresponding to the object region A based on a comparison between the keystone-corrected object region A' and the circular region A''. Thus, when an object is hidden by an obstacle, the object detection system 1 can exclude the object region A corresponding to that object from position estimation. As a result, the object detection system 1 can increase the accuracy of position estimation.

According to the object detection system 1 of the embodiment, the position estimation unit 120 can screen the object corresponding to the object region A based on the distance (Rmin to Rmax) from the imaging position to the object, which is derived from the in-image position of the object region A. Thus, the object detection system 1 of the embodiment can avoid estimating the position of an object region A that corresponds to an object far from the in-vehicle device 10. As a result, the object detection system 1 can increase the accuracy of position estimation.

According to the object detection system 1 of the embodiment, the position estimation unit 120 can screen the object corresponding to the object region A based on a comparison between the width of the object region A detected by the detection unit 110 and a predetermined object region width. Thus, the object detection system 1 can, for example, exclude an object region A that is cut off at the edge of the captured image from position estimation. As a result, the object detection system 1 can increase the accuracy of position estimation.

According to the object detection system 1 of the embodiment, when the distance between a first center point CL detected from the left image of the captured image and a second center point CR detected from the right image of the captured image is no more than a first predetermined value Da and no less than a second predetermined value Db, the position estimation unit can determine that the object corresponding to the object region A having the first center point CL and the object corresponding to the object region A having the second center point CR are the same. Thus, the object detection system 1 can suppress errors in judging object identity that are caused by the displacement of the object between the right image and the left image. As a result, for objects determined to be the same, the object detection system 1 can provide information that associates the position estimated from the left image with the position estimated from the right image.
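The identity test on the two center points can be written as a simple range check on their distance. The sketch below is a minimal illustration under stated assumptions: both center points are taken to be already expressed in the same coordinate system, both bounds are treated as inclusive, and the function name `is_same_object` is hypothetical.

```python
import math


def is_same_object(center_left, center_right, d_a, d_b):
    """True when d_b <= distance(CL, CR) <= d_a.

    center_left / center_right: (x, y) center points CL and CR.
    d_a: first predetermined value Da (upper bound on the distance).
    d_b: second predetermined value Db (lower bound on the distance).
    """
    distance = math.dist(center_left, center_right)
    return d_b <= distance <= d_a
```

For example, with CL = (0, 0) and CR = (3, 4) the distance is 5, so the pair is accepted for Da = 6 and Db = 1 but rejected for Da = 4.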

According to the object detection system 1 of the embodiment, it is possible to realize a classification unit 130 (object classification device) that includes a detection unit 110 that detects an object region A from a captured image, a classification processing unit 134A that executes a first process of classifying the object included in the object region A based on a first element, and classification processing units 134B and 134C that execute a second process of classifying the object classified by the first process based on a second element. The first element is at least one of a pattern and a symbol of the object, and the second element is at least one of the shape, position, and size of a hole in the object. Because the object detection system 1 of the embodiment classifies objects in multiple stages, classification accuracy can be increased. Classifying in multiple stages also allows the classification to be made more fine-grained.

According to the object detection system 1 of the embodiment, the classification unit 130 includes a first model (1341) that is machine-learned using the first element and a first classification result (tag) as teacher data and that outputs the first classification result when the first element included in the object region A is input, and a second model (1342 or 1343) that is machine-learned using the second element and a second classification result as teacher data and that outputs the second classification result when the second element included in the object region A is input. Because the machine learning models are built separately for each stage, the classification accuracy of each machine learning model can be increased.

According to the object detection system 1 of the embodiment, the classification unit 130 applies keystone correction to the object region A detected by the detection unit 110 and executes the first process and the second process based on the keystone-corrected object region A, so classification accuracy can be increased.

According to the object detection system 1 of the embodiment, it is possible to realize a classification unit 130 that includes: a detection unit 110 that detects an object region from a captured image; a classification processing unit 134A (first classification unit) that classifies the object included in the object region A based on at least one of a pattern and a symbol; a classification processing unit 134B (second classification unit) that classifies the object classified by the classification processing unit 134A based on the shape of the hole; a classification processing unit 134C (third classification unit) that classifies the object classified by the classification processing unit 134B based on the position of the hole; and a diameter classification unit 136 that classifies, based on size, the object classified by the classification processing unit 134B and the object classified by the classification processing unit 134C. Because the object detection system 1 of the embodiment can classify in multiple stages, handling the pattern and symbol of the object, the shape of the hole, the position of the hole, and the size of the object separately, classification accuracy can be increased.
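The multi-stage structure can be sketched as an ordered list of stage functions, each contributing one tag. This is an illustrative sketch only: the stage functions below are hypothetical rule-based stand-ins for the learned models 1341 to 1343 and for the diameter classification unit 136, the dictionary keys and thresholds are invented for the example, and a simple sequential chain is assumed.

```python
# Hypothetical stand-ins for models 1341, 1342, 1343 and diameter unit 136.
def classify_mark(region):
    return region["mark"]


def classify_keyhole_shape(region):
    return "elliptical" if region["keyhole_aspect"] > 1.2 else "circular"


def classify_keyhole_position(region):
    return "edge" if region["keyhole_offset"] > 0.5 else "center"


def classify_diameter(region):
    return "large" if region["width_m"] >= 0.6 else "small"


# Order follows the description: pattern/mark, hole shape, hole position, size.
STAGES = [
    ("pattern_mark", classify_mark),
    ("keyhole_shape", classify_keyhole_shape),
    ("keyhole_position", classify_keyhole_position),
    ("diameter", classify_diameter),
]


def classify_in_stages(region, stages=STAGES):
    """Run every stage in order and collect one tag per stage."""
    return {name: stage(region) for name, stage in stages}
```

Keeping each stage as its own function reflects the point made in the text: each stage answers one narrow question, so each can be trained and evaluated separately.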

According to the object detection system 1 of the embodiment, the classification processing unit 134A performs processing based on the pattern/mark model 1341 (first model), which is machine-learned using at least one of the pattern and the symbol together with the first classification result as teacher data and which outputs the first classification result when at least one of the pattern and the symbol included in the object region A is input. The classification processing unit 134B performs processing based on the shape model 1342 (second model), which is machine-learned using the hole shape and the second classification result as teacher data and which outputs the second classification result when the hole shape corresponding to the object classified by the classification processing unit 134A is input. The classification processing unit 134C performs processing based on the position model 1343 (third model), which is machine-learned using the hole position and the third classification result as teacher data and which outputs the third classification result when the hole position of the object classified by the classification processing unit 134A is input. The diameter classification unit 136 classifies, based on object size, the object classified by the classification processing unit 134B and the object classified by the classification processing unit 134C. Because the object detection system 1 of the embodiment builds separate machine learning models for the pattern and symbol of the object, the shape of the hole, and the position of the hole, the classification accuracy of each machine learning model can be increased.

Although the embodiments and modifications have been described, they are merely examples and the invention is not limited to them. For example, any of the embodiments or modifications, or part of an embodiment or modification, may be combined with one or more other embodiments or modifications to realize an aspect of the present invention.

Note that a program for executing each process of the object detection device 100 of this embodiment may be recorded on a computer-readable recording medium, and the various processes of the object detection device 100 described above may be performed by having a computer system read and execute the program recorded on that medium.

The "computer system" referred to here may include an OS and hardware such as peripheral devices. If a WWW system is used, the "computer system" also includes the home page providing environment (or display environment). A "computer-readable recording medium" refers to a writable non-volatile memory such as a flexible disk, magneto-optical disk, ROM, or flash memory, a portable medium such as a CD-ROM, or a storage device such as a hard disk built into a computer system.

Furthermore, a "computer-readable recording medium" also includes a medium that holds the program for a certain period of time, such as volatile memory (for example, DRAM (Dynamic Random Access Memory)) inside a computer system that serves as a server or client when the program is transmitted via a network such as the Internet or a communication line such as a telephone line. The program may also be transmitted from a computer system that stores it in a storage device or the like to another computer system via a transmission medium or by a transmission wave in the transmission medium.

Here, the "transmission medium" that transmits the program refers to a medium that has the function of transmitting information, such as a network (communication network) like the Internet or a communication line such as a telephone line. The program may implement only part of the functions described above. It may also be a so-called difference file (difference program), which implements the functions described above in combination with a program already recorded in the computer system.

1 Object detection system
10 In-vehicle device
100 Object detection device
110 Detection unit
120 Position estimation unit
130 Classification unit
132 Conversion unit
134A, 134B, 134C Classification processing units
1341 Pattern/mark model
1342 Shape model
1343 Position model
136 Diameter classification unit
138 Model construction unit
140 Information provision unit

Claims

An object classification device comprising:
a detection unit that detects an object region from a captured image; and
a classification unit that executes a first process of classifying an object included in the object region based on a first element, and a second process of classifying the object classified by the first process based on a second element,
wherein the classification unit executes the first process using a first model that is machine-learned using a first classification result based on the first element as teacher data and that, when an image of the object region is input, outputs a first classification result based on the first element included in the image of the object region, and executes the second process using a second model that is machine-learned using a second classification result based on the second element as teacher data and that, when an image of the object region is input, outputs a second classification result based on the second element included in the image of the object region,
the first element is at least one of a pattern and a symbol of the object, and
the second element is at least one of a shape, a position, and a size of a hole in the object.

The object classification device according to claim 1, wherein the classification unit
applies keystone correction to the object region detected by the detection unit, and
executes the first process and the second process based on the keystone-corrected object region.

An object classification device comprising:
a detection unit that detects an object region from a captured image;
a first classification unit that classifies an object included in the object region based on at least one of a pattern and a symbol;
a second classification unit that classifies the object classified by the first classification unit based on the shape of a hole;
a third classification unit that classifies the object classified by the first classification unit based on the position of the hole; and
a fourth classification unit that classifies the object classified by the second classification unit based on size, and classifies the object classified by the third classification unit based on size,
wherein the first classification unit performs processing based on a first model that is machine-learned using a first classification result based on at least one of the pattern and the symbol as teacher data and that outputs a first classification result when at least one of the pattern and the symbol included in an image of the object region is input,
the second classification unit performs processing based on a second model that is machine-learned using a second classification result based on the shape of the hole as teacher data and that outputs a second classification result when an image of the object region based on the classification result of the first classification unit is input,
the third classification unit performs processing based on a third model that is machine-learned using a third classification result based on the position of the hole as teacher data and that outputs a third classification result when an image of the object region based on the classification result of the first classification unit is input, and
the fourth classification unit classifies, based on the size of the object, the object classified by the second classification unit and the object classified by the third classification unit.

An object classification method comprising:
a step of detecting an object region from a captured image; and
a step of executing a first process of classifying an object included in the object region based on a first element, and a second process of classifying the object classified by the first process based on a second element,
wherein the first process is executed using a first model that is machine-learned using a first classification result based on the first element as teacher data and that, when an image of the object region is input, outputs a first classification result based on the first element included in the image of the object region, and the second process is executed using a second model that is machine-learned using a second classification result based on the second element as teacher data and that, when an image of the object region is input, outputs a second classification result based on the second element included in the image of the object region,
the first element is at least one of a pattern and a symbol of the object, and
the second element is at least one of a shape, a position, and a size of a hole in the object.

An object classification method comprising:
a step of detecting an object region from a captured image;
a first classification step of classifying an object included in the object region based on at least one of a pattern and a symbol;
a second classification step of classifying the object classified in the first classification step based on the shape of a hole;
a third classification step of classifying the object classified in the first classification step based on the position of the hole; and
a fourth classification step of classifying the object classified in the second classification step based on size, and classifying the object classified in the third classification step based on size,
wherein the first classification step performs processing based on a first model that is machine-learned using a first classification result based on at least one of the pattern and the symbol as teacher data and that outputs a first classification result when at least one of the pattern and the symbol included in an image of the object region is input,
the second classification step performs processing based on a second model that is machine-learned using a second classification result based on the shape of the hole as teacher data and that outputs a second classification result when an image of the object region based on the classification result of the first classification step is input,
the third classification step performs processing based on a third model that is machine-learned using a third classification result based on the position of the hole as teacher data and that outputs a third classification result when an image of the object region based on the classification result of the first classification step is input, and
the fourth classification step classifies, based on the size of the object, the object classified in the second classification step and the object classified in the third classification step.

A program that causes a computer to execute processing comprising:
a step of detecting an object region from a captured image; and
a step of executing a first process of classifying an object included in the object region based on a first element, and a second process of classifying the object classified by the first process based on a second element,
wherein the first process is executed using a first model that is machine-learned using a first classification result based on the first element as teacher data and that, when an image of the object region is input, outputs a first classification result based on the first element included in the image of the object region, and the second process is executed using a second model that is machine-learned using a second classification result based on the second element as teacher data and that, when an image of the object region is input, outputs a second classification result based on the second element included in the image of the object region,
the first element is at least one of a pattern and a symbol of the object, and
the second element is at least one of a shape, a position, and a size of a hole in the object.

A program that causes a computer to execute processing comprising:
a step of detecting an object region from a captured image;
a first classification step of classifying an object included in the object region based on at least one of a pattern and a symbol;
a second classification step of classifying the object classified in the first classification step based on the shape of a hole;
a third classification step of classifying the object classified in the first classification step based on the position of the hole; and
a fourth classification step of classifying the object classified in the second classification step based on size, and classifying the object classified in the third classification step based on size,
wherein the first classification step performs processing based on a first model that is machine-learned using a first classification result based on at least one of the pattern and the symbol as teacher data and that outputs a first classification result when at least one of the pattern and the symbol included in an image of the object region is input,
the second classification step performs processing based on a second model that is machine-learned using a second classification result based on the shape of the hole as teacher data and that outputs a second classification result when an image of the object region based on the classification result of the first classification step is input,
the third classification step performs processing based on a third model that is machine-learned using a third classification result based on the position of the hole as teacher data and that outputs a third classification result when an image of the object region based on the classification result of the first classification step is input, and
the fourth classification step classifies, based on the size of the object, the object classified in the second classification step and the object classified in the third classification step.