JP6891984B1

JP6891984B1 - Object detection device, object detection method, program and recording medium

Info

Publication number: JP6891984B1
Application number: JP2020008374A
Authority: JP
Inventors: 西村匡史; 山本一真; 増田誠
Original assignee: Oki Electric Industry Co Ltd
Current assignee: Oki Electric Industry Co Ltd
Priority date: 2020-01-22
Filing date: 2020-01-22
Publication date: 2021-06-18
Anticipated expiration: 2040-01-22
Also published as: JP2021117530A

Abstract

PROBLEM TO BE SOLVED: To provide an object detection device capable of highly accurate object detection regardless of whether objects overlap one another. SOLUTION: The object detection device has an image acquisition unit that acquires an image; an inference unit that, based on a learning model, infers and outputs detection candidate information including a detection candidate frame surrounding the area occupied by an object in the image, a visible area including the region of the object that can be seen within the detection candidate frame, and the identification class of the object; a detection candidate frame consolidation unit that, when multiple detection candidate frames are output and overlap one another, performs an integration process that merges one into the other; and an output unit that outputs the detection candidate information remaining after the integration process. In the integration process, the detection candidate frame consolidation unit identifies, among the overlapping detection candidate frames, the frame on the near side as the front detection candidate frame and the frame on the far side as the rear detection candidate frame, calculates the existence probability of the object in the rear detection candidate frame based on the size of the visible area within it, and determines whether to merge the front and rear detection candidate frames. SELECTED DRAWING: FIG. 1

Description

The present invention relates to an object detection device, an object detection method, a program, and a recording medium.

Objects such as people and cars are detected and counted in images captured by surveillance cameras and similar devices. In recent years, many object detection methods that use deep learning to detect and classify objects in an image simultaneously have been proposed, improving detection accuracy.

For example, Non-Patent Document 1 prepares feature maps of various sizes together with default boxes of varying rectangular shapes (hereinafter, detection candidate frames) and computes a loss function in the final layer, so that the positions and sizes of detection frames of various sizes and the identification class of the object (for example, car, person, or bicycle) are inferred simultaneously. Because detection candidate frames are prepared in proportion to the number of feature maps, inference produces multiple detection candidate frames corresponding to the multiple feature maps.
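The patent does not say how the default boxes are generated. As a rough illustration of why the number of candidate frames grows with the number of feature maps, the Python sketch below follows the scale rule from the SSD paper (scales spaced linearly between s_min = 0.2 and s_max = 0.9; width s·√a and height s/√a for aspect ratio a). It is a simplification that omits SSD's extra aspect-ratio-1 box per location, and the function name `default_boxes` and its parameters are hypothetical.

```python
import itertools
import math


def default_boxes(feature_map_sizes, aspect_ratios=(1.0, 2.0, 0.5),
                  s_min=0.2, s_max=0.9):
    """Return SSD-style default boxes as (cx, cy, w, h) in normalized [0, 1] units.

    Simplified sketch: one box per aspect ratio per grid cell, no extra
    aspect-ratio-1 box, no clipping to the image border.
    """
    m = len(feature_map_sizes)
    boxes = []
    for k, f in enumerate(feature_map_sizes):
        # Scale for the k-th feature map, spaced linearly from s_min to s_max.
        s_k = s_min + (s_max - s_min) * k / (m - 1) if m > 1 else s_min
        # One default box per aspect ratio, centred on every cell of the f x f grid.
        for i, j in itertools.product(range(f), repeat=2):
            cx, cy = (j + 0.5) / f, (i + 0.5) / f
            for a in aspect_ratios:
                boxes.append((cx, cy, s_k * math.sqrt(a), s_k / math.sqrt(a)))
    return boxes


# Three feature maps (4x4, 2x2, 1x1) with three aspect ratios each.
boxes = default_boxes([4, 2, 1])
print(len(boxes))  # 63, i.e. (16 + 4 + 1) * 3
```

Every extra feature map adds f × f × (number of aspect ratios) candidates, which is why a post-processing step that merges or discards overlapping candidates is needed.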

To handle objects at multiple scales in general object detection, multiple detection candidate frames matching those sizes must be prepared. When multiple detection candidate frames are produced, a method such as non-maximum suppression (NMS) is typically used to select, among the candidates, the detection candidate frame whose reliability (score) is a local maximum as the final detection frame.

However, with NMS, when the rectangles representing two detection frames overlap heavily, one of the two is selected and the other is deleted. Consequently, when several objects appear overlapping in an image, the rectangles representing their detection frames overlap heavily, and one of them is deleted even if each rectangle is a correct detection frame for its own object. NMS therefore struggles when objects overlap heavily. The NMS threshold could be adjusted so that heavily overlapping detection candidate frames are output rather than deleted, but this comes at the cost of outputting a large number of heavily overlapping false detections, so adjusting NMS alone cannot solve the problem.
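To make this trade-off concrete, here is a minimal greedy NMS in Python. It is a textbook formulation, not code from the patent or from Non-Patent Document 1, applied to two hypothetical person boxes whose intersection over union (IoU) is about 0.54.

```python
def iou(a, b):
    """Intersection over union of two boxes given as (x1, y1, x2, y2)."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union if union > 0 else 0.0


def nms(boxes, scores, iou_threshold=0.5):
    """Greedy non-maximum suppression; returns the indices of the kept boxes."""
    order = sorted(range(len(boxes)), key=lambda i: scores[i], reverse=True)
    keep = []
    while order:
        best = order.pop(0)  # highest-scoring remaining box
        keep.append(best)
        # Drop every remaining box that overlaps the kept one too much.
        order = [i for i in order if iou(boxes[best], boxes[i]) <= iou_threshold]
    return keep


# Two people standing side by side, partly overlapping; both boxes are correct.
people = [(0, 0, 10, 20), (3, 0, 13, 20)]
scores = [0.9, 0.8]
print(round(iou(*people), 3))                  # 0.538
print(nms(people, scores, iou_threshold=0.5))  # [0]  -> the second person is lost
print(nms(people, scores, iou_threshold=0.6))  # [0, 1]
```

At a threshold of 0.5 the second, correct box is suppressed. Raising the threshold to 0.6 keeps both here, but in other scenes the same setting also keeps heavily overlapping duplicate boxes for a single object.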

To address this, Non-Patent Document 2 proposes learning, in addition to the detection frame positions and object identification classes, object-count information that is useful for detecting overlapping targets. At inference time, the learned object-count information is output as an object count for each detection candidate frame. The method of Non-Patent Document 2 integrates the detection candidate frames so that the sum of their object counts is 1.

Non-Patent Document 1: Wei Liu, et al., "SSD: Single Shot MultiBox Detector", ECCV 2016.
Non-Patent Document 2: Tetsuaki Suzuki, Hiromitsu Fuchigami, Hiroyoshi Miyano, "Overlap-Robust Object Detection Based on Local Object Count Estimation", 21st Meeting on Image Recognition and Understanding (MIRU), June 2018.

As described above, combining the method of Non-Patent Document 1 with NMS cannot achieve high-accuracy detection when multiple objects overlap one another.

On the other hand, although the method of Non-Patent Document 2 is useful when objects overlap, its detection accuracy may drop when objects do not overlap. For example, when three detection candidate frames are generated for one object, each frame is assigned an object count of 1/3 so that the count after frame integration is 1. If the threshold for judging that one object is present is 0.8 and one of the three frames is generated at a position off the object, the sum of the counts of the remaining two frames, 2/3, does not reach the threshold of 0.8, and the object is treated as undetected. Lowering the threshold to avoid this, in turn, causes false detections.
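The arithmetic of this example can be checked in a few lines of Python. This is only a simplified model of the thresholding described above (per-frame object counts are summed and compared with a threshold), not the actual integration algorithm of Non-Patent Document 2. The function name `is_detected` is hypothetical, and exact fractions are used to avoid floating-point rounding.

```python
from fractions import Fraction


def is_detected(counts, threshold):
    """Judge one object present when the summed per-frame object counts reach the threshold."""
    return sum(counts) >= threshold


counts = [Fraction(1, 3)] * 3   # one object shared by three candidate frames
threshold = Fraction(8, 10)     # 0.8

print(is_detected(counts, threshold))            # True: 1 >= 0.8
print(is_detected(counts[:2], threshold))        # False: 2/3 < 0.8, object missed
print(is_detected(counts[:2], Fraction(6, 10)))  # True, but a lower threshold also admits false positives
```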

The present invention has been made in view of the above problems, and its object is to provide an object detection device capable of highly accurate object detection regardless of whether objects overlap one another.

An object detection device according to the present invention has an image acquisition unit that acquires an image; an inference unit that, based on a learning model, infers and outputs for each object detection candidate information including a detection candidate frame surrounding the area occupied by one object in the image, a visible area including the region in which that object can be seen within the detection candidate frame, and the identification class of that object; a detection candidate frame consolidation unit that, when multiple detection candidate frames are output and some of them overlap one another, performs an integration process that merges one of the overlapping detection candidate frames into the other; and an output unit that outputs the detection candidate information for each detection candidate frame remaining after the integration process. In the integration process, the detection candidate frame consolidation unit identifies one of the overlapping detection candidate frames as a front detection candidate frame, that is, the frame corresponding to the object located relatively nearer in the image; identifies the other as a rear detection candidate frame, that is, the frame corresponding to the object located relatively farther back in the image; calculates the existence probability of the object in the rear detection candidate frame based on the size of the visible area within the rear detection candidate frame; and determines, based on the calculated existence probability, whether to merge the front detection candidate frame and the rear detection candidate frame.

An object detection method according to the present invention is executed by a detection device that detects objects in an image based on a learning model, and includes a step of acquiring an image; a step of inferring and outputting for each object, based on the learning model, detection candidate information including a detection candidate frame surrounding the area occupied by one object in the image, a visible area including the region in which that object can be seen within the detection candidate frame, and the identification class of that object; a step of performing, when multiple detection candidate frames are output and some of them overlap one another, an integration process that merges one of the overlapping detection candidate frames into the other; and a step of outputting the detection candidate information for each detection candidate frame remaining after the integration process. The step of performing the integration process includes a step of identifying one of the overlapping detection candidate frames as a front detection candidate frame corresponding to the object located relatively nearer in the image; a step of identifying the other as a rear detection candidate frame corresponding to the object located relatively farther back in the image; a step of calculating the existence probability of the object in the rear detection candidate frame based on the size of the visible area within the rear detection candidate frame; and a step of determining, based on the calculated existence probability, whether to merge the front detection candidate frame and the rear detection candidate frame.

A program according to the present invention causes a computer to execute a step of acquiring an image; a step of inferring and outputting for each object, based on a learning model, detection candidate information including a detection candidate frame surrounding the area occupied by one object in the image, a visible area including the region in which that object can be seen within the detection candidate frame, and the identification class of that object; a step of identifying, when multiple detection candidate frames are output and some of them overlap one another, one of the overlapping detection candidate frames as a front detection candidate frame corresponding to the object located relatively nearer in the image; a step of identifying the other as a rear detection candidate frame corresponding to the object located relatively farther back in the image; a step of calculating the existence probability of the object in the rear detection candidate frame based on the size of the visible area within the rear detection candidate frame; a step of determining, based on the calculated existence probability, whether to merge the front detection candidate frame and the rear detection candidate frame; and a step of outputting the detection candidate information for each detection candidate frame remaining after the integration process performed according to that determination.

According to the present invention, highly accurate object detection is possible regardless of whether objects overlap one another.

FIG. 1 is a block diagram showing the configuration of the object detection device according to Example 1 of the present invention.
FIG. 2 schematically shows a detection candidate frame and a visible area frame when an occluding object is present in front of the target.
FIG. 3 shows an example of the front detection candidate frame selected when targets appear overlapping.
FIG. 4 shows an example of the rear region calculated when targets appear overlapping.
FIG. 5 shows an example of the rear visible area calculated when targets appear overlapping.
FIG. 6 schematically shows the flow of frame integration when a false detection occurs.
FIG. 7 schematically shows a case where an occluding object is present in front of multiple targets.

Examples of the present invention are described below with reference to the drawings. In the following description and the accompanying drawings, substantially identical or equivalent parts are denoted by the same reference signs.

FIG. 1 is a block diagram showing the configuration of the object detection device 100 of this embodiment. The object detection device 100 detects objects in an image using a convolutional neural network, and consists of an object learning unit 10 and an object estimation unit 20.

The object detection device 100 detects objects in an image based on the SSD (Single Shot MultiBox Detector) method. In SSD, detection candidate frames (default boxes), which are rectangular patterns matching the sizes and aspect ratios of the objects to be detected, are superimposed on the image, and the position, size, and identification class (for example, car, person, or bicycle) of each object are inferred. In other words, a detection candidate frame is a rectangular frame surrounding the area occupied by one object in the image.

In addition, in this embodiment, separately from the detection candidate frame, the region within the detection candidate frame that contains the part of the object that can be seen is represented as the "visible area". Specifically, a rectangular region enclosing the part of the target object that appears in the image without being occluded is represented as the visible area. In the following description, the rectangular frame indicating the visible area is called the visible area frame.

FIG. 2 schematically shows the detection candidate frame DF and the visible area frame VF when the object to be detected is a human (person) and an occluding object is present in front of the person. The detection candidate frame DF indicates the position and size of the whole person, including the region hidden by the occluding object SL. The visible area frame VF, on the other hand, indicates the position and extent of the part of the person in the image that is not hidden by the occluding object SL.

Referring again to FIG. 1, the object learning unit 10 is a learning device that performs deep learning for object detection, and has a data storage unit 11, a learning unit 12, and a learning model storage unit 13.

The data storage unit 11 consists of a storage device such as a flash memory or an HDD (Hard Disk Drive). It stores the data that the learning unit 12 uses for learning (hereinafter, learning data). For example, in this embodiment, training images, object detection ground-truth frames indicating the correct detection candidate frame for each object in an image, object identification classes, and visible area frames are stored in the data storage unit 11 as learning data.

The learning unit 12 receives the learning data stored in the data storage unit 11, performs deep learning for object detection based on the SSD method, and generates a learning model (that is, a mathematical model built using deep learning). For example, the learning unit 12 learns from three kinds of data, namely identification classes, object detection ground-truth frames, and visible area frames, and based on this learning, parameters such as the weighting coefficients of the convolutional neural network are generated as the learning model.

The learning model storage unit 13 consists of a storage device such as a flash memory or an HDD, and stores the learning model generated by the learning unit 12.

The object estimation unit 20 consists of a processing controller such as a CPU (Central Processing Unit), and estimates objects in an image (that is, performs object detection) based on the learning model generated through the learning performed by the object learning unit 10.

When detection candidate frames overlap in the image and multiple objects may therefore be overlapping, the object estimation unit 20 of this embodiment can output appropriate detection candidate frames by integrating the detection candidate frames.

The object estimation unit 20 has an image acquisition unit 21, an inference unit 22, a detection candidate frame consolidation unit 23, and an output unit 24. These functional blocks are implemented, for example, by the object estimation unit 20 executing a predetermined program.

The image acquisition unit 21 acquires an image containing the objects to be detected. It may consist of an imaging device such as a camera, or it may acquire, from outside, images captured by another imaging device.

Based on the learning model stored in the learning model storage unit 13, the inference unit 22 infers the detection candidate frame, the identification class, and the visible area frame of each object in the image acquired by the image acquisition unit 21. In the following description, the detection candidate frame, visible area frame, and identification class of one object are collectively referred to as "detection candidate information".
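One way to hold this per-object detection candidate information in code is a small record type, sketched below in Python. The field names, the (x1, y1, x2, y2) pixel-coordinate convention, and the check that the visible area frame lies inside the detection candidate frame are assumptions made for illustration; the patent does not specify a data format.

```python
from dataclasses import dataclass
from typing import Tuple

Box = Tuple[float, float, float, float]  # (x1, y1, x2, y2), assumed pixel coordinates


@dataclass(frozen=True)
class DetectionCandidate:
    """Hypothetical container for one object's detection candidate information."""
    frame: Box     # detection candidate frame: whole object, occluded parts included
    visible: Box   # visible area frame: the part of the object that can be seen
    label: str     # identification class, e.g. "person"

    def __post_init__(self):
        # Assumed invariant: the visible area frame lies inside the candidate frame.
        fx1, fy1, fx2, fy2 = self.frame
        vx1, vy1, vx2, vy2 = self.visible
        if not (fx1 <= vx1 <= vx2 <= fx2 and fy1 <= vy1 <= vy2 <= fy2):
            raise ValueError("visible area frame must lie inside the detection candidate frame")


# A person whose lower body is hidden behind an occluding object (cf. FIG. 2).
p = DetectionCandidate(frame=(10, 0, 30, 60), visible=(10, 0, 30, 35), label="person")
```

In practice a network's raw visible-area prediction might spill slightly outside its frame, so clipping it to the frame could be preferable to raising an error; that choice is left open here.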

The detection candidate frame consolidation unit 23 integrates the multiple detection candidate frames inferred by the inference unit 22. In this embodiment, discarding one of a pair of overlapping detection candidate frames and keeping only the other as a detection candidate frame is called "integrating" them. The "integration process" performed by the detection candidate frame consolidation unit 23 covers not only actually integrating detection candidate frames but also deciding not to integrate them and keeping both as they are.

The detection candidate frame consolidation unit 23 has an overall overlap selection unit 231, a front candidate frame selection unit 232, a rear candidate frame overlap selection unit 233, a rear region calculation unit 234, a rear visible area calculation unit 235, an object existence probability calculation unit 236, a threshold processing unit 237, and a frame integration determination unit 238.

The overall overlap selection unit 231 selects the overlapping detection candidate frames from all the detection candidate frames inferred by the inference unit 22.

Among the multiple overlapping detection candidate frames, the front candidate frame selection unit 232 selects the one corresponding to the object located relatively nearer in the image as the "front detection candidate frame".

Among the detection candidate frames selected by the overall overlap selection unit 231 (that is, the overlapping detection candidate frames), the rear candidate frame overlap selection unit 233 selects those other than the front detection candidate frame selected by the front candidate frame selection unit 232 as "rear detection candidate frames". As a result, the detection candidate frames corresponding to objects located relatively farther back in the image are selected as rear detection candidate frames.

The rear region calculation unit 234 calculates the region of the rear detection candidate frame that does not overlap the front detection candidate frame as the "rear region".

The rear visible area calculation unit 235 calculates the visible area of the object within the rear detection candidate frame as the "rear visible area".

The object existence probability calculation unit 236 calculates the existence probability of the object in the rear detection candidate frame based on the rear region calculated by the rear region calculation unit 234 and the rear visible area calculated by the rear visible area calculation unit 235.

The threshold processing unit 237 compares the object existence probability calculated by the object existence probability calculation unit 236 with a threshold.

Based on the result from the threshold processing unit 237, the frame integration determination unit 238 determines whether to integrate the overlapping detection candidate frames.

The output unit 24 outputs, as the object estimation result, the detection candidate information (detection candidate frame, visible area, and identification class) for each detection candidate frame remaining after the integration process by the detection candidate frame consolidation unit 23. For example, the output unit 24 outputs the image acquired by the image acquisition unit 21 with rectangular frames drawn on it to indicate each detection candidate frame and visible area frame. The output unit 24 also superimposes the identification class information on the image, for example by drawing the detection candidate frames in different colors.

Next, the object detection operation performed by the object detection device 100 of this embodiment is described, using as an example a case in which the objects to be detected are people and two people appear overlapping in the image.

The inference unit 22 of the object estimation unit 20 acquires an image from the image acquisition unit 21 and performs inference based on the learning model stored in the learning model storage unit 13. The inference unit 22 outputs the many detection candidate frames, visible area frames, and identification classes obtained by inference to the detection candidate frame consolidation unit 23.

The overall overlap selection unit 231 of the detection candidate frame consolidation unit 23 receives the many detection candidate frames from the inference unit 22 and selects from them the detection candidate frames that overlap one another.
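The patent does not define when two frames count as overlapping or how several overlapping frames are grouped. A minimal Python sketch, assuming that any intersection of positive area counts as overlap and that frames linked by a chain of overlaps form one group (connected components of the pairwise-overlap graph); the names `overlaps` and `overlap_groups` are hypothetical.

```python
def overlaps(a, b):
    """True if two (x1, y1, x2, y2) boxes share a region of positive area."""
    return min(a[2], b[2]) > max(a[0], b[0]) and min(a[3], b[3]) > max(a[1], b[1])


def overlap_groups(frames):
    """Group frame indices into connected components of the pairwise-overlap graph.

    Frames that overlap nothing are left out, since they need no integration.
    """
    n = len(frames)
    adj = {i: [j for j in range(n) if j != i and overlaps(frames[i], frames[j])]
           for i in range(n)}
    seen, groups = set(), []
    for start in range(n):
        if start in seen or not adj[start]:
            continue  # already grouped, or isolated
        stack, group = [start], []
        seen.add(start)
        while stack:  # depth-first traversal of one component
            i = stack.pop()
            group.append(i)
            for j in adj[i]:
                if j not in seen:
                    seen.add(j)
                    stack.append(j)
        groups.append(sorted(group))
    return groups


frames = [(0, 0, 10, 20), (3, 0, 13, 20), (50, 50, 60, 70), (12, 0, 20, 20)]
print(overlap_groups(frames))  # [[0, 1, 3]]: frame 2 is isolated
```

The pairwise check is O(n²) in the number of frames, which is acceptable for the few hundred candidates a detector typically leaves after score filtering; a spatial index could replace it for larger inputs.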

For each detection candidate frame selected by the overall overlap selection unit 231, the front candidate frame selection unit 232 calculates the ratio of the visible area frame's area to the detection candidate frame's area (that is, visible area frame area / detection candidate frame area) as the visibility rate. The front candidate frame selection unit 232 then selects the detection candidate frame with the highest visibility rate as the front detection candidate frame.

The rear candidate frame overlap selection unit 233 selects, as rear detection candidate frames, the detection candidate frames selected by the overall overlap selection unit 231 other than the front detection candidate frame.
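The visibility-rate rule above can be sketched in Python as follows. Boxes are assumed to be (x1, y1, x2, y2) tuples and each candidate a (frame, visible_area_frame) pair; `split_front_rear` is a hypothetical name, and ties are broken in favour of the first candidate, a case the patent does not address.

```python
def area(box):
    """Area of an (x1, y1, x2, y2) box; zero if the box is degenerate."""
    return max(0.0, box[2] - box[0]) * max(0.0, box[3] - box[1])


def split_front_rear(candidates):
    """Split one group of overlapping candidates into (front_index, rear_indices).

    candidates: list of (frame, visible_frame) pairs.
    """
    # Visibility rate = visible area frame area / detection candidate frame area.
    rates = [area(visible) / area(frame) for frame, visible in candidates]
    front = max(range(len(candidates)), key=lambda i: rates[i])
    rear = [i for i in range(len(candidates)) if i != front]
    return front, rear


# P1 stands in front and is fully visible; only a 3-pixel-wide strip of P2 shows.
p1 = ((0, 0, 10, 20), (0, 0, 10, 20))    # visibility rate 200 / 200 = 1.0
p2 = ((3, 0, 13, 20), (10, 0, 13, 20))   # visibility rate  60 / 200 = 0.3
print(split_front_rear([p2, p1]))        # (1, [0])
```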

FIG. 3 shows an example of the front detection candidate frame FF selected by the front candidate frame selection unit 232 and the rear detection candidate frame BF selected by the rear candidate frame overlap selection unit 233. Only the region of the image containing the objects is extracted and shown.

Because the person P1 is positioned in front of the person P2 in the image (that is, on the near side), P1 has a higher visibility rate than P2. The detection candidate frame of P1 is therefore selected as the front detection candidate frame FF, and that of P2 as the rear detection candidate frame BF.

Next, the rear region calculation unit 234 calculates the rear region, which is the frame region of the rear object, from the front detection candidate frame FF and the rear detection candidate frame BF. For example, if the region of the rear detection candidate frame BF is a first region A and the region of the front detection candidate frame FF is a second region B, the rear region RA is expressed by formula (1):

$$RA = A \setminus B \qquad (1)$$

FIG. 4 shows an example of the rear region RA calculated by the rear region calculation unit 234. As indicated by the hatching, the portion of the rear detection candidate frame BF that does not overlap the front detection candidate frame FF is calculated as the rear region RA.
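RA is generally not a rectangle, but its area follows directly from the frame coordinates. Write each frame as $A = [x_1^A, x_2^A] \times [y_1^A, y_2^A]$ and likewise for $B$; this notation is introduced here and is not taken from the patent. The area of RA is then:

$$
\begin{aligned}
|RA| &= |A \setminus B| = |A| - |A \cap B|, \\
|A \cap B| &= \max\bigl(0,\ \min(x_2^A, x_2^B) - \max(x_1^A, x_1^B)\bigr)
  \cdot \max\bigl(0,\ \min(y_2^A, y_2^B) - \max(y_1^A, y_1^B)\bigr).
\end{aligned}
$$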

Meanwhile, the rear visible area calculation unit 235 calculates the visible area of the rear object as the rear visible area, based on the visible area of the rear detection candidate frame BF. For example, if the visible area of the rear detection candidate frame BF is V_A, the rear visible area is expressed by formula (2).

FIG. 5 shows an example of the rear visible area BVA calculated by the rear visible area calculation unit 235. As indicated by the hatching, the visible area of the rear person P2 is calculated as the rear visible area BVA.

The object existence probability calculation unit 236 takes the rear region RA calculated by the rear region calculation unit 234 and the rear visible area BVA calculated by the rear visible area calculation unit 235. It calculates the proportion of the rear region RA occupied by the rear visible area BVA as the object existence probability of the rear object. The object existence probability of the rear object is expressed by formula (3), where |·| denotes area:

$$P = \frac{|BVA|}{|RA|} \qquad (3)$$
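Formula (2) is not reproduced in this text, so the following Python sketch assumes that the rear visible area BVA is the visible area V_A clipped to the rear region RA. With that assumption, and with axis-aligned frames, the object existence probability for one front/rear pair could be computed as shown below. All names are hypothetical:

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Box:
    """Axis-aligned frame: top-left corner (x1, y1), bottom-right corner (x2, y2)."""
    x1: float
    y1: float
    x2: float
    y2: float

    def area(self) -> float:
        return max(0.0, self.x2 - self.x1) * max(0.0, self.y2 - self.y1)

    def intersect(self, other: "Box") -> "Box":
        """Overlap of two frames; a zero-area box if they are disjoint."""
        return Box(max(self.x1, other.x1), max(self.y1, other.y1),
                   min(self.x2, other.x2), min(self.y2, other.y2))


def existence_probability(rear: Box, rear_visible: Box, front: Box) -> float:
    """P = |BVA| / |RA| for one front/rear pair of frames."""
    # RA = A \ B, so |RA| = |A| - |A ∩ B| (A: rear frame BF, B: front frame FF).
    ra = rear.area() - rear.intersect(front).area()
    # ASSUMPTION: BVA = the visible area V_A clipped to A with its overlap with B removed.
    va = rear_visible.intersect(rear)
    bva = va.area() - va.intersect(front).area()
    # Hypothetical convention: a rear frame entirely covered by the front frame gives P = 0.
    return bva / ra if ra > 0 else 0.0
```

Consider a rear frame spanning x = 6 to 16 and a front frame spanning x = 0 to 10, both 30 high, with a visible frame covering x = 10 to 16, y = 0 to 12. These give |RA| = 180, |BVA| = 72, and P = 0.4.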

For example, when BVA occupies a large proportion of RA, an object is likely to be present behind, so the object existence probability is high. Conversely, when BVA occupies a small proportion of RA, an object is unlikely to be present behind, so the object existence probability is low.

The threshold processing unit 237 compares the object existence probability calculated by the object existence probability calculation unit 236 with a predetermined threshold. It then determines whether the existence probability of the rear object is at or above the threshold or below it.

Based on the determination result of the threshold processing unit 237, the frame integration determination unit 238 determines whether to integrate the rear detection candidate frame. For example, when the object existence probability of the rear object is equal to or greater than the threshold, the unit judges that an object is present behind and decides not to integrate the detection candidate frames. In FIGS. 3 to 5, the rear person P2 is at a position overlapping the front person P1, so the detection candidate frames are not integrated. The front detection candidate frame FF is output as the detection candidate frame of P1, and the rear detection candidate frame BF as that of P2.

Conversely, when the object existence probability of the rear object is below the threshold, the frame integration determination unit 238 judges that no object is present behind and decides to integrate the detection candidate frames. This happens, for example, when a detection candidate frame has been falsely detected as though a rear object were present, even though the objects do not actually overlap.
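The decision made by units 237 and 238 comes down to one comparison per rear frame. Below is a minimal sketch of that decision, assuming each rear frame has already been paired with its existence probability; the function name `consolidate` is hypothetical. It also covers the case of several rear frames behind a single front frame:

```python
from collections.abc import Sequence
from typing import TypeVar

T = TypeVar("T")


def consolidate(front: T, rears: Sequence[tuple[T, float]], threshold: float) -> list[T]:
    """Return the frames that survive the integration process.

    P >= threshold: an object is judged present behind, so the rear frame is kept.
    P <  threshold: the rear frame is integrated into (absorbed by) the front frame.
    """
    kept = [front]
    for rear, p in rears:
        if p >= threshold:
            kept.append(rear)
    return kept
```

A probability exactly equal to the threshold keeps the rear frame, which matches the "equal to or greater than" wording above.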

FIG. 6 schematically shows how detection candidate frames are integrated when such a false detection occurs. Here, the area shown in black has been falsely detected as the visible area BVA of a rear object. The detection candidate frame drawn with a solid line has been falsely detected as the rear detection candidate frame BF.

The front candidate frame selection unit 232 selects, from the overlapping detection candidate frames, the one with the highest visibility rate as the front detection candidate frame. Here the front detection candidate frame FF has a high visibility rate, so the person P1 is selected as the person in front. The detection candidate frame drawn with a broken line is therefore selected as the front detection candidate frame FF. Meanwhile, the rear candidate frame overlap selection unit 233 selects the detection candidate frame drawn with a solid line as the rear detection candidate frame BF (STEP 1).

The rear region calculation unit 234 calculates the part of the rear detection candidate frame that does not overlap the front detection candidate frame, shown hatched in the figure, as the rear region RA. The rear visible area calculation unit 235 calculates the area shown in black as the rear visible area BVA (STEP 2).

The object existence probability calculation unit 236 calculates the proportion of the rear region RA occupied by the rear visible area BVA as the object existence probability of the rear object. The threshold processing unit 237 then compares this probability with the threshold. For example, if the threshold is 0.6, the area of BVA in FIG. 6 is less than 0.6 of the area of RA, so the object existence probability is judged to be below the threshold (STEP 3).

Based on the result from the threshold processing unit 237, the frame integration determination unit 238 determines whether to integrate the detection candidate frames. Here the object existence probability is below the threshold, so the frames are integrated. Specifically, the rear detection candidate frame BF is discarded, and only the front detection candidate frame FF, drawn with a broken line in FIG. 6, is output as the detection candidate frame (STEP 4).
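STEP 1 to STEP 4 can be combined into a single routine that processes one group of overlapping candidates. The sketch below makes three assumptions. It uses made-up coordinates loosely modeled on FIG. 3 and FIG. 6, it assumes BVA is the visible frame clipped to RA, and it uses the example threshold of 0.6. None of the names come from the patent:

```python
from dataclasses import dataclass

Box = tuple[float, float, float, float]  # (x1, y1, x2, y2)


def area(b: Box) -> float:
    return max(0.0, b[2] - b[0]) * max(0.0, b[3] - b[1])


def inter(a: Box, b: Box) -> Box:
    """Intersection of two frames; zero area if they are disjoint."""
    return (max(a[0], b[0]), max(a[1], b[1]), min(a[2], b[2]), min(a[3], b[3]))


@dataclass
class Candidate:
    frame: Box    # detection candidate frame
    visible: Box  # visible area frame
    label: str


def consolidate_group(group: list[Candidate], threshold: float = 0.6) -> list[Candidate]:
    """Run STEP 1 to STEP 4 on one group of mutually overlapping candidates."""
    # STEP 1: highest visibility rate -> front frame FF; the others are rear frames BF.
    front = max(group, key=lambda c: area(c.visible) / area(c.frame))
    kept = [front]
    for rear in (c for c in group if c is not front):
        # STEP 2: rear region RA = BF minus FF; rear visible area BVA
        # (ASSUMPTION: the visible frame clipped to RA).
        ra = area(rear.frame) - area(inter(rear.frame, front.frame))
        va = inter(rear.visible, rear.frame)
        bva = area(va) - area(inter(va, front.frame))
        # STEP 3: existence probability P = |BVA| / |RA|, compared with the threshold.
        p = bva / ra if ra > 0 else 0.0
        # STEP 4: keep BF only if an object is judged present; otherwise integrate it into FF.
        if p >= threshold:
            kept.append(rear)
    return kept


p1 = Candidate((0, 0, 10, 30), (0, 0, 10, 30), "P1")       # front person
ghost = Candidate((4, 0, 14, 30), (10, 0, 14, 6), "ghost")  # spurious rear frame (FIG. 6 case)
p2 = Candidate((6, 0, 16, 30), (10, 0, 16, 30), "P2")      # genuinely occluded person (FIG. 3 case)
```

With these illustrative numbers, the spurious frame has P = 24 / 120 = 0.2 and is absorbed into the front frame. The occluded person P2 has P = 180 / 180 = 1.0 and is kept.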

As described above, the object detection device 100 of this embodiment calculates the object existence probability of the rear object based on the visibility rate and uses it to decide whether to integrate the detection candidate frames. When the object existence probability is high, the objects very likely overlap in the image, so the detection candidate frames are not integrated. When it is low, there is very likely no rear object and no overlap in the image. In that case the rear detection candidate frame is discarded and only the front detection candidate frame is kept.

Multi-scale object detection can produce many detection candidate frames. Even then, the object detection device 100 of this embodiment integrates those frames as appropriate according to the object existence probability. This prevents detection accuracy from degrading when objects overlap, while preserving accuracy when they do not. In other words, the device can detect objects with high accuracy regardless of whether they overlap.

Embodiments of the present invention are not limited to the example described above. For instance, the example has the threshold processing unit 237 use a predetermined threshold, but the threshold value can also be changed dynamically.

For example, in FIG. 7 an occluding object SL hides both the person P1 and the person P2, so the calculated object existence probability is small. If the threshold is set high (for example, 0.8), the object existence probability is judged to be below the threshold. The detection candidate frames are then integrated, and the rear detection candidate frame BF is not output even though the person P2 is present behind. This erroneous integration can be avoided by changing the threshold according to, for example, the visibility rate of the front detection candidate frame FF, and lowering it when that visibility rate is low.
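The patent states only that the threshold may be lowered when the visibility rate of the front detection candidate frame FF is low. One hypothetical rule scales a base threshold of 0.8 linearly by that visibility rate and applies an assumed lower bound:

```python
def adaptive_threshold(front_visibility: float, base: float = 0.8, floor: float = 0.2) -> float:
    """Scale the threshold down in proportion to the front frame's visibility rate.

    HYPOTHETICAL rule: the linear scaling, the base of 0.8, and the floor of 0.2
    are assumptions for illustration, not values given in the patent.
    """
    v = min(max(front_visibility, 0.0), 1.0)  # clamp the visibility rate to [0, 1]
    return max(floor, base * v)
```

Suppose the occluder SL in FIG. 7 yields P = 0.45 while FF has a visibility rate of 0.5. A fixed threshold of 0.8 would integrate the frames. This rule instead gives a threshold of 0.4 and keeps BF. The numbers are illustrative only.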

The example above also takes people as the detection target and describes two people overlapping in the image. The detection target is not limited to people, however. It may be a non-human animal such as a dog or cat, or an inanimate object such as a car or bicycle. The method can also be applied when three or more objects overlap.

In the example above, the inference unit 22 also infers a visible area frame based on the learning model. Alternatively, the visible area can be obtained at pixel level using a segmentation method that assigns an identification class to each pixel of the image.
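With a segmentation output, the same ratio can be computed on pixel sets instead of rectangles. The minimal sketch below assumes the frames are rasterized to pixels. It also assumes the segmentation result for the rear object is available as a set of pixel coordinates. All names are hypothetical:

```python
Pixel = tuple[int, int]  # (x, y)


def box_pixels(x1: int, y1: int, x2: int, y2: int) -> set[Pixel]:
    """All pixels of a frame, with x2 and y2 exclusive."""
    return {(x, y) for x in range(x1, x2) for y in range(y1, y2)}


def existence_probability_from_mask(rear_frame: set[Pixel], front_frame: set[Pixel],
                                    rear_mask: set[Pixel]) -> float:
    """P = |BVA| / |RA| computed on pixel sets instead of rectangles.

    rear_mask stands in for the per-pixel visible area that a segmentation model
    would give the rear object (ASSUMPTION: BVA = mask pixels that fall inside RA).
    """
    ra = rear_frame - front_frame  # RA: rear-frame pixels not covered by the front frame
    bva = rear_mask & ra
    return len(bva) / len(ra) if ra else 0.0
```

For example, take a rear frame `box_pixels(6, 0, 16, 30)` and a front frame `box_pixels(0, 0, 10, 30)`. A checkerboard mask over the uncovered strip x = 10 to 16 stands in for an irregular silhouette. It covers 90 of the 180 RA pixels, giving P = 0.5.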

The series of processes described in the example above can be performed by a computer executing a program stored on a recording medium such as a ROM.

100 Object detection device
10 Object learning unit
11 Data storage unit
12 Learning unit
13 Learning model storage unit
20 Object estimation unit
21 Image acquisition unit
22 Inference unit
23 Detection candidate frame consolidation unit
231 Overall overlap selection unit
232 Front candidate frame selection unit
233 Rear candidate frame overlap selection unit
234 Rear region calculation unit
235 Rear visible area calculation unit
236 Object existence probability calculation unit
237 Threshold processing unit
238 Frame integration determination unit
24 Output unit

Claims

1. An object detection device comprising:
an image acquisition unit that acquires an image;
an inference unit that infers, for each object and based on a learning model, detection candidate information including a detection candidate frame surrounding the area occupied by the object in the image, a visible area including the area in which the object is visible within the detection candidate frame, and an identification class of the object, and outputs the detection candidate information;
a detection candidate frame consolidation unit that, when a plurality of detection candidate frames are output and some of them overlap one another, performs an integration process that integrates one of the overlapping detection candidate frames into the other; and
an output unit that outputs the detection candidate information for each detection candidate frame remaining after the integration process,
wherein, in the integration process, the detection candidate frame consolidation unit:
identifies one of the overlapping detection candidate frames as a front detection candidate frame, which is the detection candidate frame corresponding to an object located relatively toward the front of the image;
identifies the other of the overlapping detection candidate frames as a rear detection candidate frame, which is the detection candidate frame corresponding to an object located relatively toward the back of the image;
calculates the existence probability of the object in the rear detection candidate frame based on the size of the visible area in the rear detection candidate frame; and
determines, based on the calculated existence probability, whether to integrate the front detection candidate frame and the rear detection candidate frame.

2. The object detection device according to claim 1, wherein the detection candidate frame consolidation unit identifies the front detection candidate frame and the rear detection candidate frame based on the area of the visible area of each of the overlapping detection candidate frames.

3. The object detection device according to claim 1 or 2, wherein the detection candidate frame consolidation unit compares the existence probability calculated for the rear detection candidate frame with a threshold and, based on the comparison result, determines whether to integrate the rear detection candidate frame into the front detection candidate frame.

4. The object detection device according to claim 3, wherein the detection candidate frame consolidation unit integrates the rear detection candidate frame into the front detection candidate frame when the existence probability calculated for the rear detection candidate frame is lower than the threshold, and
does not integrate the rear detection candidate frame and the front detection candidate frame when the existence probability calculated for the rear detection candidate frame is higher than the threshold.

5. An object detection method executed by a detection device that detects an object in an image based on a learning model, the method comprising:
a step of acquiring the image;
a step of inferring, for each object and based on the learning model, detection candidate information including a detection candidate frame surrounding the area occupied by the object in the image, a visible area including the area in which the object is visible within the detection candidate frame, and an identification class of the object, and outputting the detection candidate information;
a step of performing, when a plurality of detection candidate frames are output and some of them overlap one another, an integration process that integrates one of the overlapping detection candidate frames into the other; and
a step of outputting the detection candidate information for each detection candidate frame remaining after the integration process,
wherein the step of performing the integration process comprises:
a step of identifying one of the overlapping detection candidate frames as a front detection candidate frame, which is the detection candidate frame corresponding to an object located relatively toward the front of the image;
a step of identifying the other of the overlapping detection candidate frames as a rear detection candidate frame, which is the detection candidate frame corresponding to an object located relatively toward the back of the image;
a step of calculating the existence probability of the object in the rear detection candidate frame based on the size of the visible area in the rear detection candidate frame; and
a step of determining, based on the calculated existence probability, whether to integrate the front detection candidate frame and the rear detection candidate frame.

6. The object detection method according to claim 5, wherein, in the step of identifying the front detection candidate frame and the step of identifying the rear detection candidate frame, the front detection candidate frame and the rear detection candidate frame are identified based on the area of the visible area of each of the overlapping detection candidate frames.

7. The object detection method according to claim 5 or 6, wherein the step of performing the integration process includes a step of comparing the existence probability calculated for the rear detection candidate frame with a threshold, and
in the step of determining whether to integrate the front detection candidate frame and the rear detection candidate frame, whether to integrate the rear detection candidate frame into the front detection candidate frame is determined based on the result of comparing the existence probability with the threshold.

8. The object detection method according to claim 7, wherein, in the step of determining whether to integrate the front detection candidate frame and the rear detection candidate frame, it is determined that the rear detection candidate frame is to be integrated into the front detection candidate frame when the existence probability calculated for the rear detection candidate frame is lower than the threshold, and
it is determined that the rear detection candidate frame and the front detection candidate frame are not to be integrated when the existence probability calculated for the rear detection candidate frame is higher than the threshold.

9. A program causing a computer to execute:
a step of acquiring an image;
a step of inferring, for each object and based on a learning model, detection candidate information including a detection candidate frame surrounding the area occupied by the object in the image, a visible area including the area in which the object is visible within the detection candidate frame, and an identification class of the object, and outputting the detection candidate information;
a step of, when a plurality of detection candidate frames are output and some of them overlap one another, identifying one of the overlapping detection candidate frames as a front detection candidate frame, which is the detection candidate frame corresponding to an object located relatively toward the front of the image;
a step of identifying the other of the overlapping detection candidate frames as a rear detection candidate frame, which is the detection candidate frame corresponding to an object located relatively toward the back of the image;
a step of calculating the existence probability of the object in the rear detection candidate frame based on the size of the visible area in the rear detection candidate frame;
a step of determining, based on the calculated existence probability, whether to integrate the front detection candidate frame and the rear detection candidate frame; and
a step of outputting the detection candidate information for each detection candidate frame remaining after the integration process performed according to the determination.

10. The program according to claim 9, wherein, in the step of identifying the front detection candidate frame and the step of identifying the rear detection candidate frame, the front detection candidate frame and the rear detection candidate frame are identified based on the area of the visible area of each of the overlapping detection candidate frames.

11. The program according to claim 9 or 10, further causing the computer to execute a step of comparing the existence probability calculated in the step of calculating the existence probability of the object in the rear detection candidate frame with a threshold,
wherein, in the step of determining whether to integrate the front detection candidate frame and the rear detection candidate frame, whether to integrate the rear detection candidate frame into the front detection candidate frame is determined based on the result of comparing the existence probability with the threshold.

12. The program according to claim 11, wherein, in the step of determining whether to integrate the front detection candidate frame and the rear detection candidate frame, it is determined that the rear detection candidate frame is to be integrated into the front detection candidate frame when the existence probability calculated for the rear detection candidate frame is lower than the threshold, and
it is determined that the rear detection candidate frame and the front detection candidate frame are not to be integrated when the existence probability calculated for the rear detection candidate frame is higher than the threshold.