JP2022175103A

JP2022175103A - Object detection device

Info

Publication number: JP2022175103A
Application number: JP2021081250A
Authority: JP
Inventors: 隆瀧本; Takashi Takimoto
Original assignee: Chubu Electric Power Grid Co Inc; Chubu Electric Power Co Inc
Current assignee: Chubu Electric Power Grid Co Inc; Chubu Electric Power Co Inc
Priority date: 2021-05-12
Filing date: 2021-05-12
Publication date: 2022-11-25

Abstract

To provide a technique capable of improving detection accuracy of an object included in a photographed image.

SOLUTION: Subtraction image generation means 31, constituting first detection means 30, sequentially generates subtraction image information showing a difference of a photographed image captured by imaging means 50. Moving object detection means 32 detects a moving speed and a movement locus of a moving object based on a subtraction image. When the detected moving speed and movement locus of the moving object satisfy a setting condition set correspondingly to the object being the detection target, first object detection means 33 detects the moving object as an object being a detection target. Image dividing means 42, constituting second detection means 40, divides a photographed image captured by the imaging means 50 into a plurality of divided images. Second object detection means 41 detects an object included in each divided image using deep learning. Detection result composition means 43 combines an object detection result for each divided image, and outputs it as an object detection result for the photographed image.

SELECTED DRAWING: Figure 1

Description

The present invention relates to an object detection device that detects an object based on image information.

As a technique for detecting objects, object detection using deep learning (a machine learning method based on multilayer neural networks), that is, AI-based object detection, is being researched. For example, many end-to-end methods have been proposed, such as SSD (Single Shot MultiBox Detector) and YOLO (You Only Look Once), which simultaneously detect the position and category of an object based on image information representing an image captured by an imaging means such as a CCD camera. These methods are based on multi-task learning, in which a multilayer neural network for object position detection and a multilayer neural network for object category classification are trained at the same time.
An object detection technique by SSD is disclosed in Non-Patent Document 1, for example, and an object detection technique by YOLO is disclosed in Non-Patent Document 2, for example.

In recent years, the performance of imaging means has improved, and the number of pixels (resolution) of image information representing a captured image tends to increase. For example, most surveillance cameras in use today support 2K (2 million pixels) or less, but cameras supporting 4K (8 million pixels) and 8K (33 million pixels) are expected to become widespread. If an imaging means with a large number of pixels (high resolution) can be used, high-definition image information can be obtained, and object detection accuracy improves.
On the other hand, current object detection devices are configured to process image information with a resolution of less than 2K (e.g., several hundred by several hundred pixels). For this reason, if a current object detection device processes image information at 4K, 8K, or higher resolution, missed detections may increase and object detection accuracy may decrease. Reconfiguring current object detection devices so that they can process high-resolution image information takes a great deal of labor and cost.
Therefore, the present inventor developed, and filed an application for, a technique that improves object detection accuracy while still using a current object detection device, by dividing a captured image into a plurality of divided images and processing the divided image information representing each divided image.

Non-Patent Document 1: “SSD: Single Shot MultiBox Detector”, Wei Liu, Dragomir Anguelov, Dumitru Erhan, Christian Szegedy, Scott Reed, Cheng-Yang Fu and Alexander C. Berg (2015), https://arxiv.org/pdf/1512.02325.pdf
Non-Patent Document 2: “You Only Look Once: Unified, Real-Time Object Detection”, Joseph Redmon, Santosh Divvala, Ross Girshick and Ali Farhadi (2016), https://pjreddie.com/media/files/papers/yolo.pdf

Here, when deep learning is used to detect an object present in a monitoring area based on a captured image of a wide monitoring area taken by an imaging means placed far away, the image of the object in the captured image may be very small. For example, when detecting whether a person is present in a monitoring area such as a riverbed downstream of a dam, the image of a person present in the monitoring area is very small in the captured image.
When the image of the object included in the captured image is this small, the object cannot be detected even with the above-described technique of dividing the captured image and processing the divided image information representing each divided image.
The inventor of the present invention has studied various techniques for detecting an object when the image of the object included in the captured image is small. As a result, it was found that even when the image of the object is small, the object can be detected by focusing on the size of the image of the object and the moving distance (moving speed).
The present invention has been devised in view of these points, and its object is to provide a technique capable of improving the detection accuracy of an object included in a captured image.

An object detection device of the present invention comprises an imaging means and a first detection means.
The imaging means sequentially outputs image information indicating captured images of the monitored area. A known imaging means such as a CCD camera can be used as the imaging means.
The first detection means detects a moving object (for example, a person to be detected) existing in the monitoring area based on the image information output from the imaging means.
The first detection means has difference image creation means, moving object detection means, and first object detection means.
The difference image creating means sequentially creates difference image information indicating a difference image between the captured images represented by two pieces of image information output from the imaging means at different times. A difference image shows the image of a moving object, obtained by removing the stationary background image from the two captured images. Various known methods can be used to create the difference image information. The interval between the two pieces of image information is set, for example, to an integer multiple (including 1) of the interval at which image information is output from the imaging means.
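The difference image step can be illustrated with a minimal sketch. The source only says that "various known methods" can be used, so the thresholded absolute difference below is one assumed choice, and the names `difference_image`, `frame_pairs`, `threshold` and `step` are hypothetical. Frames are modeled as plain lists of grayscale rows so that the example has no dependencies.

```python
from typing import Iterator, List, Tuple

Frame = List[List[int]]  # grayscale image as rows of 0-255 intensities


def difference_image(prev: Frame, curr: Frame, threshold: int = 20) -> Frame:
    """Return a binary mask: 1 where the two frames differ by more than threshold."""
    if len(prev) != len(curr) or any(len(p) != len(c) for p, c in zip(prev, curr)):
        raise ValueError("frames must have the same size")
    return [
        [1 if abs(c - p) > threshold else 0 for p, c in zip(prow, crow)]
        for prow, crow in zip(prev, curr)
    ]


def frame_pairs(frames: List[Frame], step: int = 1) -> Iterator[Tuple[Frame, Frame]]:
    """Yield (earlier, later) frames separated by `step` output intervals."""
    for i in range(step, len(frames)):
        yield frames[i - step], frames[i]
```

Here `step` plays the role of the integer multiple of the output interval. Note that plain two-frame differencing marks both the old and the new position of a moving object; a real system would likely add noise filtering on top of this.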
The moving object detection means detects the moving speed and movement trajectory of the moving object based on the difference image information. The moving speed and movement trajectory of the moving object can be detected using various known methods.
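As a rough illustration of how a speed and a trajectory might be derived from successive difference images, the sketch below takes the centroid of the changed pixels in each binary mask as the object position and computes the speed between consecutive positions. This is an assumption, not the method of the source, which leaves the choice open. It also assumes a single moving object per mask, and the names `centroid` and `speeds` are hypothetical.

```python
import math
from typing import List, Optional, Tuple

Point = Tuple[float, float]  # (x, y) in pixels


def centroid(mask: List[List[int]]) -> Optional[Point]:
    """Centroid of the nonzero pixels of a binary mask, or None if it is empty."""
    sum_x = sum_y = count = 0
    for y, row in enumerate(mask):
        for x, value in enumerate(row):
            if value:
                sum_x += x
                sum_y += y
                count += 1
    return (sum_x / count, sum_y / count) if count else None


def speeds(trajectory: List[Point], interval_s: float) -> List[float]:
    """Speed in pixels per second between consecutive trajectory points."""
    return [
        math.dist(a, b) / interval_s
        for a, b in zip(trajectory, trajectory[1:])
    ]
```

The list of centroids collected over time serves as the movement trajectory. Converting pixels per second into a physical speed would require camera calibration, which this section does not describe.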
The first object detection means detects the moving object as the object to be detected when the moving speed and movement trajectory of the moving object detected by the moving object detection means satisfy the setting conditions set for the object to be detected.
As the setting conditions, conditions peculiar to the object to be detected are used. For example, when the object to be detected is a person, two conditions are set: "the average moving speed within a first set period is within the range between a lower limit (e.g., a slow walking speed) and an upper limit (e.g., a fast running speed)" and "the movement trajectory within a second set period forms a continuous trajectory of a predetermined shape". The first set period and the second set period are set to periods in which movement peculiar to a person to be detected can be detected.
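The two setting conditions for a person could be checked roughly as follows. This is a sketch under stated assumptions: the source does not define the "predetermined shape", so the trajectory test here checks continuity only (no jump larger than `max_jump` pixels between samples), and a single sample window stands in for both the first and the second set period. All names and thresholds are hypothetical.

```python
import math
from typing import List, Tuple

Point = Tuple[float, float]  # (x, y) in pixels


def average_speed(points: List[Point], interval_s: float) -> float:
    """Path length divided by elapsed time, in pixels per second."""
    if len(points) < 2:
        return 0.0
    length = sum(math.dist(a, b) for a, b in zip(points, points[1:]))
    return length / (interval_s * (len(points) - 1))


def is_continuous(points: List[Point], max_jump: float) -> bool:
    """True if no step between consecutive points exceeds max_jump pixels."""
    return all(math.dist(a, b) <= max_jump for a, b in zip(points, points[1:]))


def satisfies_person_conditions(
    points: List[Point],
    interval_s: float,
    min_speed: float,
    max_speed: float,
    max_jump: float,
) -> bool:
    """Speed within [min_speed, max_speed] and a continuous trajectory."""
    if len(points) < 2:
        return False
    speed_ok = min_speed <= average_speed(points, interval_s) <= max_speed
    return speed_ok and is_continuous(points, max_jump)
```

A steadily moving track passes both tests, while a track that jumps across the image fails the continuity test and may fail the speed test as well.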
According to the present invention, an object can be detected even if the size of the object included in the captured image is small. Thereby, the detection accuracy of the object included in the captured image can be improved.
A different form of the present invention comprises a second detection means.
The second detection means uses deep learning to detect an object (for example, a person), which is a detection target, included in the captured image indicated by the image information output from the imaging means.
The second detection means has second object detection means, image division means, and detection result synthesis means.
The image dividing means divides the captured image indicated by the image information output from the imaging means into a plurality of divided images. As a method for dividing a captured image into a plurality of divided images (number of divided images, size of divided images, number of divisions, etc.), an appropriate method can be used. For example, a method of vertically and horizontally dividing at equal intervals, or a method of vertically and horizontally dividing at equal intervals with the same number of divisions is used.
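The equal-interval division can be sketched as a function that returns the offset and size of each divided image. The name `split_into_tiles` and the `(x_offset, y_offset, width, height)` layout are assumptions made for illustration; the source does not prescribe a data format.

```python
from typing import List, Tuple

Tile = Tuple[int, int, int, int]  # (x_offset, y_offset, width, height)


def split_into_tiles(width: int, height: int, cols: int, rows: int) -> List[Tile]:
    """Split a width x height image into cols x rows tiles at equal intervals."""
    if cols < 1 or rows < 1:
        raise ValueError("cols and rows must be at least 1")
    # Tile boundaries; integer division keeps the tiles gap-free and non-overlapping.
    xs = [width * c // cols for c in range(cols + 1)]
    ys = [height * r // rows for r in range(rows + 1)]
    return [
        (xs[c], ys[r], xs[c + 1] - xs[c], ys[r + 1] - ys[r])
        for r in range(rows)
        for c in range(cols)
    ]
```

For a 3840 x 2160 image, `split_into_tiles(3840, 2160, 2, 2)` yields four 1920 x 1080 tiles in row-major order. Keeping the offsets alongside each tile is what later allows detections to be mapped back into the captured image.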
The second object detection means uses deep learning to detect an object included in each divided image based on each divided image information. As the second object detection means, a known image processing means such as SSD or YOLO is used that simultaneously detects the position and category of an object based on image information.
The detection result synthesizing means synthesizes the object detection results obtained by the second object detection means for the divided images and outputs the result as the object detection result for the captured image (an object detection result for the captured image based on the object detection results for the divided images). As a method of synthesizing the object detection results for the divided images, for example, a method of converting the position information of an object in a divided image into position information of the object in the captured image is used.
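The coordinate conversion mentioned above amounts to adding the offset of each divided image to the positions detected inside it. The sketch below assumes that the detector reports boxes in the pixel coordinates of the divided image, without rescaling; the `Detection` record and the function names are hypothetical.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass(frozen=True)
class Detection:
    category: str  # e.g. "person"
    score: float   # detector confidence
    x: float       # left edge in pixels
    y: float       # top edge in pixels
    w: float       # box width in pixels
    h: float       # box height in pixels


def to_image_coords(det: Detection, tile_offset: Tuple[int, int]) -> Detection:
    """Shift a detection from divided-image coordinates to captured-image coordinates."""
    offset_x, offset_y = tile_offset
    return Detection(det.category, det.score, det.x + offset_x, det.y + offset_y, det.w, det.h)


def compose(per_tile: List[Tuple[Tuple[int, int], List[Detection]]]) -> List[Detection]:
    """Merge per-tile detections into one list in captured-image coordinates."""
    return [to_image_coords(det, offset) for offset, dets in per_tile for det in dets]
```

If the detector resizes each divided image before inference, the box would also need to be scaled back by the resize factor before the offset is added.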
In this embodiment, object detection by the first detection means and object detection by the second detection means can be performed, so that the object detection accuracy can be further improved.
The mode of using the object detection result by the first detection means and the object detection result by the second detection means can be set as appropriate.
In a different form of the present invention, object detection by the second detection means is executed first, and object detection by the first detection means is executed only if the second detection means could not detect an object.
In this embodiment, the processing load on the first detection means and the second detection means can be reduced.
In a different form of the present invention, the object detection by the first detection means and the object detection by the second detection means are executed in parallel.
In this embodiment, an object included in the captured image can be detected in a short period of time.
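The two modes described above, sequential with fallback and parallel, can be sketched as follows. The detectors are passed in as plain callables, so `second` and `first` are stand-ins rather than real implementations. Concatenating both result lists in the parallel mode is an assumption, since the source leaves the way the two results are used open. Threads are used only to show the structure; a real system might use separate processes or hardware.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Any, Callable, List, Sequence

Detector2 = Callable[[Any], List[Any]]            # works on a single frame
Detector1 = Callable[[Sequence[Any]], List[Any]]  # works on a frame sequence


def detect_sequential(frames: Sequence[Any], second: Detector2, first: Detector1) -> List[Any]:
    """Run the second detection first; fall back to the first only if nothing is found."""
    found = second(frames[-1])
    return found if found else first(frames)


def detect_parallel(frames: Sequence[Any], second: Detector2, first: Detector1) -> List[Any]:
    """Run both detections concurrently and return the concatenated results."""
    with ThreadPoolExecutor(max_workers=2) as pool:
        future_second = pool.submit(second, frames[-1])
        future_first = pool.submit(first, frames)
        return future_second.result() + future_first.result()
```

In the sequential mode the motion-based detection is skipped whenever the deep learning detection succeeds, which is where the reduced processing load comes from. In the parallel mode the total latency is bounded by the slower of the two detections.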

According to the present invention, the detection accuracy of an object included in a captured image can be improved.

FIG. 1 is a block diagram of an embodiment of the object detection device of the present invention.
FIG. 2 is a diagram showing an overview of an example of the second detection means, which detects an object using deep learning.
FIG. 3 is a diagram showing the detection accuracy when the second detection means detects an object based on divided images and when it detects an object based on the captured image.
FIG. 4 is a diagram explaining a first example of the object detection operation using the second detection means.
FIG. 5 is a diagram explaining a second example of the object detection operation using the second detection means.
FIG. 6 is a diagram explaining a third example of the object detection operation using the second detection means.
FIG. 7 is a diagram explaining an example of the object detection operation using the first detection means.
FIG. 8 is a diagram explaining an example of the object detection operation using the first detection means.
FIG. 9 is a diagram showing an example in which the object detection result of the first detection means and the object detection result of the second detection means are combined.

An embodiment of an object detection device of the present invention will be described below with reference to the drawings.
FIG. 1 shows a block diagram of an object detection device 10 of one embodiment.
The object detection device 10 of one embodiment is composed of a processing means 20, an imaging means 50, a storage means 60, an input means 70, an output means 80, etc., which are connected by a wired communication line, a wireless communication line, or the like.
The imaging means 50 is composed of, for example, a digital camera using a CCD or CMOS. The imaging means 50 outputs image information indicating a captured image at set period intervals (frame rate). Note that the imaging means 50 is arranged so that the monitored area is included in the captured image.
The image capturing means 50 corresponds to the "image capturing means" of the present invention, and the image information output from the image capturing means 50 corresponds to the "image information indicating the captured image" of the present invention.
The storage means 60 is composed of ROM, RAM, etc., and stores programs for executing the processing of the processing means 20 and various data.
The input means 70 is composed of a keyboard, a touch panel, etc., and inputs various information.
The output means 80 includes a display means configured by a liquid crystal display device, an organic EL display device, or the like, a printing means, or the like, and outputs various information. In addition, when a display means that can input information by touching a display portion displayed on a display screen is used as the display means, the input means 70 is configured by a touch sensor.
The imaging means 50, the storage means 60, the input means 70, the output means 80 and the like may be arranged at a location separate from the processing means 20.

The processing means 20 is composed of a CPU and the like.
The processing means 20 has a first detection means 30 and a second detection means 40.
Based on the image information output from the imaging means 50, the second detection means 40 detects an object included in the captured image indicated by the image information using deep learning or the like. Preferably, the second detection means is set to detect an object existing within the monitored area included in the captured image. Since the second detection means can detect the category and position of the object, it can also detect a specific object (an object to be detected).
The first detection means 30 detects an object included in the captured image when the object is so small in the captured image indicated by the image information that the second detection means 40 cannot detect it. The first detection means 30 detects that an object included in the captured image is the object to be detected when the moving speed and movement trajectory of that object (preferably, an object present within the monitoring area included in the captured image) satisfy the setting conditions set for the object to be detected.

First, the second detection means 40 will be explained.
The second detection means 40 has second object detection means 41, image division means 42, and detection result synthesis means 43.

The image dividing means 42 divides a captured image (referred to as the "original image") indicated by the image information output from the imaging means 50 into a plurality of divided images. The image information also includes image information that was output from the imaging means 50 and is stored in the storage means 60. The method by which the image dividing means 42 divides the captured image will be described later.

The second object detection means 41 detects the objects included in a divided image, and their positions, based on the divided image information indicating the divided image produced by the image dividing means 42. The second object detection means 41 can also detect the objects included in the captured image, and their positions, based on the image information indicating the captured image.
As the second object detection means 41, various known object detection means can be used that employ deep learning to detect the category and position of an object included in the captured image or a divided image, based on the image information indicating the captured image or the divided image information indicating the divided image. For example, an object detection means that detects the category and position of an object using the SSD or YOLO technique can be used.
For example, as shown in FIG. 2, SSD is based on a multilayer CNN (convolutional neural network) and consists of a layer that estimates candidate regions where an object exists and a layer that classifies the object within each candidate region. The layer that estimates candidate regions divides the image information into a plurality of rectangular regions of predetermined size (default boxes) and estimates the candidate regions (bounding boxes) while taking the offsets of the rectangular regions into account. The layer that classifies objects uses a separately trained CNN to classify the object within each candidate region.

The detection result synthesizing means 43 synthesizes the object detection results obtained by the second object detection means 41 for the divided images and outputs the result as the object detection result for the captured image.
For example, the object detection results for the divided images are synthesized after the position information of each object in its divided image has been converted into position information in the captured image, and the result is output as the object detection result for the captured image (the "original image") (in this case, an object detection result for the captured image based on the object detection results for the divided images).
Note that the detection result synthesizing means 43 can also synthesize the object detection results for the divided images and the object detection result for the captured image (the "original image"), both obtained by the second object detection means 41, and output the result as the object detection result for the captured image (in this case, an object detection result for the captured image based on the object detection results for the divided images and the object detection result for the captured image). If the object detection results being synthesized contain objects at substantially the same position, then, for example, the one with the higher score indicating object likelihood, which is used during object detection, is selected. Alternatively, both can be output.
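One possible way to select the higher-scoring result among detections at substantially the same position is sketched below. The source does not say how "substantially the same position" is decided, so this sketch swaps in a common technique: greedy suppression based on intersection over union (IoU) with an assumed threshold `same_iou`. Boxes are `(x, y, w, h)` tuples in captured-image coordinates, and all names are hypothetical.

```python
from typing import List, Tuple

Box = Tuple[float, float, float, float]  # (x, y, w, h) in captured-image pixels
Scored = Tuple[Box, float]               # (box, object-likelihood score)


def iou(a: Box, b: Box) -> float:
    """Intersection over union of two axis-aligned boxes."""
    ax, ay, aw, ah = a
    bx, by, bw, bh = b
    inter_w = min(ax + aw, bx + bw) - max(ax, bx)
    inter_h = min(ay + ah, by + bh) - max(ay, by)
    if inter_w <= 0 or inter_h <= 0:
        return 0.0
    inter = inter_w * inter_h
    return inter / (aw * ah + bw * bh - inter)


def merge_keep_best(dets: List[Scored], same_iou: float = 0.5) -> List[Scored]:
    """Among detections that overlap by at least same_iou, keep the highest score."""
    kept: List[Scored] = []
    for box, score in sorted(dets, key=lambda d: d[1], reverse=True):
        if all(iou(box, other) < same_iou for other, _ in kept):
            kept.append((box, score))
    return kept
```

The input list would contain the detections from the divided images, already converted to captured-image coordinates, together with the detections from the captured image itself. Outputting both results instead, as the text also allows, simply means skipping this step.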
The detection result synthesizing means 43 can also output the object detection result for the captured image as the object detection result of the captured image (in this case, "an object detection result of the captured image based on the object detection result for the captured image").

Here, with reference to FIG. 3, the object detection accuracy is described for two cases in which a current image processing means is used as the object detection means: the case where object detection processing is executed on the captured image, and the case where it is executed on divided images obtained by dividing the captured image.
(M1) shows an image containing two people who are far away.
(M2) shows an image obtained by reducing the image (M1) to a size corresponding to the captured image (X) used for object detection processing. By using the zoom function of the imaging means 50, the size of the image (M2) in the captured image (X) changes.
(M3) shows an image obtained by reducing the image (M1) to a size corresponding to a divided image of the captured image (X) (in FIG. 3, a quadrant image obtained by dividing the captured image into two at equal intervals in each of the vertical and horizontal directions).
(N1) indicates an image (processing target image) of a region corresponding to the image (M2) in the captured image (X).
(N2) indicates an image (image to be processed) of the area corresponding to the image (M3) in the divided image.
When object detection processing was performed on the processing target image (N1) of the captured image (X) using the current image processing means, the object image (M2) could not be detected. On the other hand, when the object detection process was performed on the processing target image (N2) of the divided images, the object image (M3) could be detected.

As described above, by executing object detection processing by the second detection means 40 (second object detection means 41) on divided images obtained by dividing the captured image, it becomes possible to detect objects whose images in the captured image are small and that current image processing means could not detect. That is, object detection accuracy can be improved.
In an experiment, the minimum detectable size of an object in the captured image was about [10 × 30 pixels] when the current image processing means was used, whereas the minimum detectable size of an object in a quadrant image was about [7 × 15 pixels] when the second detection means 40 was used.

Next, the operation of the second detection means 40 will be explained.
A first embodiment of the operation of the second detection means 40 will now be described with reference to FIG.
In the first embodiment, the image dividing means 42 divides the captured image (X), using quartering lines (one horizontal dividing line and one vertical dividing line), into four quadrant images (a) to (d) (two at equal intervals in the horizontal direction × two at equal intervals in the vertical direction). The positions of the quadrant images (a) to (d) in the captured image (X) (for example, the positions of the corners of each quadrant image (a) to (d) in the coordinates of the captured image (X)) are stored in the storage means 60.
In the first embodiment, the captured image (X) is divided into 2 squared (2^2) quadrant images (a) to (d) (two at equal intervals in each of the vertical and horizontal directions). That is, the aspect ratio of the quadrant images (a) to (d) is equal to (including "substantially equal to") the aspect ratio of the captured image (X). Therefore, when the second object detection means 41 executes object detection processing on each of the quadrant images (a) to (d), rescaling the image introduces no distortion, and object detection performance is not affected.
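The aspect ratio argument can be checked with a little arithmetic: dividing width and height by the same number leaves their ratio unchanged. The helper below is a hypothetical illustration using exact fractions, and the 3840 × 2160 image size is an assumed example rather than a value taken from the source.

```python
from fractions import Fraction


def tile_aspect(width: int, height: int, cols: int, rows: int) -> Fraction:
    """Exact aspect ratio (width / height) of one tile of a cols x rows split."""
    return Fraction(width, cols) / Fraction(height, rows)


original = Fraction(3840, 2160)                    # aspect ratio of the captured image
same = tile_aspect(3840, 2160, 2, 2) == original   # 2 x 2 split keeps the ratio
changed = tile_aspect(3840, 2160, 2, 1) == original  # 2 x 1 split does not
```

With equal division counts in both directions, each tile can be resized to the detector input with the same scale factor horizontally and vertically, which is why no distortion arises.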

The second object detection means 41 executes object detection processing on each of the quadrant images (a) to (d) and outputs an object detection result for each of them.
Further, the second object detection means 41 executes object detection processing on the captured image (X) and outputs the object detection result on the captured image (X).
The detection result synthesizing means 43 synthesizes the object detection results for the quadrant images (a) to (d) and the object detection result for the captured image (X), and outputs the result as the object detection result of the captured image (X) (an object detection result for the captured image (X) based on the object detection results for the quadrant images (a) to (d) and the object detection result for the captured image (X)). For example, the object detection results for the quadrant images (a) to (d) and the object detection result for the captured image (X) are synthesized after the object position information contained in the results for the quadrant images has been converted into position information in the captured image (X).

In the first embodiment, executing the object detection processing on the four 4-divided images (a) to (d) obtained by dividing the captured image (X) into four makes it possible to detect objects that cannot be detected by the object detection processing on the captured image (X).
This improves the object detection accuracy.
Note that when the captured image (X) is divided into the four 4-divided images (a) to (d), an object located at a boundary between the 4-divided images (a) to (d), for example an object that straddles a boundary, may not be detected. For example, as shown in FIG. 5, an object (P) that straddles the 4-divided images (a) and (b) may not be detected by the object detection processing on the 4-divided images (a) and (b).
In the first embodiment, the object detection processing is also executed on the captured image (X) to output an object detection result for the captured image (X). The object detection result for the captured image (X) is then synthesized with the object detection results for the 4-divided images (a) to (d). As a result, an object located at a boundary between the 4-divided images (a) to (d), which cannot be detected by the object detection processing on the 4-divided images (a) to (d), can be detected by the object detection processing on the captured image (X).
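The text does not specify how the synthesis removes an object that is reported both in a divided image and in the captured image. One common approach, shown here purely as an assumption, is to keep the highest-scoring box and drop others that overlap it strongly (intersection over union). The names `iou` and `merge_detections`, the score field, and the 0.5 threshold are hypothetical; all boxes are assumed to be already expressed in captured-image coordinates.

```python
from typing import List, Tuple

# (x_min, y_min, x_max, y_max, score), already in captured-image coordinates
Detection = Tuple[float, float, float, float, float]


def iou(a: Detection, b: Detection) -> float:
    """Intersection over union of two axis-aligned boxes."""
    ix = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    iy = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = ix * iy
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union if union > 0 else 0.0


def merge_detections(
    full_image: List[Detection],
    divided: List[Detection],
    iou_threshold: float = 0.5,  # assumed value, not from the source
) -> List[Detection]:
    """Keep the best-scoring box among strongly overlapping ones."""
    merged: List[Detection] = []
    for det in sorted(full_image + divided, key=lambda d: d[4], reverse=True):
        if all(iou(det, kept) < iou_threshold for kept in merged):
            merged.append(det)
    return merged


full = [(100.0, 100.0, 200.0, 300.0, 0.9)]
tiles = [(102.0, 98.0, 198.0, 305.0, 0.8), (1500.0, 700.0, 1510.0, 720.0, 0.6)]
merged = merge_detections(full, tiles)
print(len(merged))  # 2: the duplicate is dropped, the small tile-only object is kept
```

The small object found only in a divided image survives the merge, which matches the stated purpose of combining the two sets of results.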

A second embodiment of the operation of the second detection means 40 will now be described with reference to FIG. 5.
In the second embodiment, the image dividing means 42 divides the captured image (X) into four 4-divided images (a) to (d) by 4-dividing lines, and into nine 9-divided images (A) to (I) (three at equal intervals in the horizontal direction × three at equal intervals in the vertical direction) by 9-dividing lines (three horizontal dividing lines and three vertical dividing lines). The positions of the 4-divided images (a) to (d) and the 9-divided images (A) to (I) in the captured image (X) (for example, the positions of the corners of each divided image (a) to (d) and (A) to (I) in the coordinates of the captured image (X)) are stored in the storage means 60.
In the second embodiment, the captured image (X) is divided into 2² (= 4) 4-divided images (a) to (d) (two at equal intervals in each of the vertical and horizontal directions) and into 3² (= 9) 9-divided images (A) to (I) (three at equal intervals in each of the vertical and horizontal directions). That is, the aspect ratio of the 4-divided images (a) to (d) and of the 9-divided images (A) to (I) is equal (including "substantially equal") to the aspect ratio of the captured image. Therefore, when the second object detection means 41 executes the object detection processing on the 4-divided images (a) to (d) and the 9-divided images (A) to (I), no distortion arises from changing the scale of the image, and the object detection performance is not affected.

The second object detection means 41 executes object detection processing on each of the 4-divided images (a) to (d) and each of the 9-divided images (A) to (I), and outputs an object detection result for each of them.
The second object detection means 41 also executes object detection processing on the captured image (X) and outputs an object detection result for the captured image (X).
The detection result synthesizing means 43 synthesizes the object detection results for the 4-divided images (a) to (d) and for the 9-divided images (A) to (I) with the object detection result for the captured image (X), and outputs the synthesized result as the object detection result for the captured image (X) (that is, an object detection result for the captured image (X) based on the results for the divided images (a) to (d) and (A) to (I) and the result for the captured image (X)). For example, the object position information included in the object detection results for the 4-divided images (a) to (d) and the 9-divided images (A) to (I) is first converted into position information in the captured image (X), and these results are then synthesized with the object detection result for the captured image (X). The synthesis of the object detection results by the detection result synthesizing means 43 can use the same method as the synthesis in the first embodiment.

In the second embodiment, executing the object detection processing on the four 4-divided images (a) to (d) and the nine 9-divided images (A) to (I) obtained by dividing the captured image (X) makes it possible to detect objects that cannot be detected by the object detection processing on the captured image (X).
This improves the object detection accuracy.
In addition, the captured image (X) is divided into four using 2², the square of an even number, and into nine using 3², the square of an odd number. As a result, the boundaries (vertical and horizontal boundary lines) of the 4-divided images (a) to (d) and the boundaries (vertical and horizontal boundary lines) of the 9-divided images (A) to (I) intersect but never coincide along a parallel line.
Therefore, a decrease in object detection accuracy at the boundaries of the 4-divided images (a) to (d) (for example, failure to detect an object that straddles a boundary) can be compensated for by the object detection results for the 9-divided images (A) to (I). For example, as shown in FIG. 5, an object (P) that straddles the 4-divided images (a) and (b) may not be detected by the object detection processing on the 4-divided images (a) and (b), but can be detected by the object detection processing on the 9-divided image (B). Similarly, a decrease in object detection accuracy at the boundaries of the 9-divided images (A) to (I) can be compensated for by the object detection results for the 4-divided images (a) to (d). The object detection processing on the captured image (X) can also compensate for both.
Therefore, the object detection accuracy can be further improved.
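The situation of object (P) can be illustrated with a small containment check. This is a sketch under assumptions: the function name `containing_tiles`, the 1920×1080 image size, and the box chosen for (P) are hypothetical, and tiles are indexed row by row from 0, so index 1 in the 3×3 grid corresponds to the 9-divided image (B).

```python
from typing import List, Tuple

Box = Tuple[float, float, float, float]  # (x_min, y_min, x_max, y_max)


def containing_tiles(box: Box, width: int, height: int, n: int) -> List[int]:
    """Row-major indices of the n x n tiles that fully contain the box."""
    hits = []
    for row in range(n):
        for col in range(n):
            x0, x1 = col * width / n, (col + 1) * width / n
            y0, y1 = row * height / n, (row + 1) * height / n
            if x0 <= box[0] and box[2] <= x1 and y0 <= box[1] and box[3] <= y1:
                hits.append(row * n + col)
    return hits


# Assumed object P straddling the vertical boundary (x = 960) between (a) and (b).
p = (930.0, 100.0, 990.0, 220.0)
print(containing_tiles(p, 1920, 1080, 2))  # []  -> cut by the 2x2 boundary
print(containing_tiles(p, 1920, 1080, 3))  # [1] -> whole inside 9-divided image (B)
```

No 4-divided image contains the whole object, while one 9-divided image does, which is the compensation effect the paragraph describes.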

A third embodiment of the operation of the second detection means 40 will now be described with reference to FIG. 6.
In the third embodiment, the image dividing means 42 obtains four 4-divided images (a) to (d) from the captured image (X) by 4-dividing lines, nine 9-divided images (A) to (I) by 9-dividing lines, and sixteen 16-divided images (1) to (16) (four at equal intervals in the horizontal direction × four at equal intervals in the vertical direction) by 16-dividing lines (four horizontal dividing lines and four vertical dividing lines). The positions of the 4-divided images (a) to (d), the 9-divided images (A) to (I), and the 16-divided images (1) to (16) in the captured image (X) (for example, the positions of the corners of each divided image (a) to (d), (A) to (I), and (1) to (16) in the coordinates of the captured image (X)) are stored in the storage means 60.
In the third embodiment, the captured image (X) is divided into 2² (= 4) 4-divided images (a) to (d) (two at equal intervals in each of the vertical and horizontal directions), 3² (= 9) 9-divided images (A) to (I) (three at equal intervals in each direction), and 4² (= 16) 16-divided images (1) to (16) (four at equal intervals in each direction). That is, the aspect ratio of the 4-divided images (a) to (d), the 9-divided images (A) to (I), and the 16-divided images (1) to (16) is equal (including "substantially equal") to the aspect ratio of the captured image. Therefore, when the second object detection means 41 executes the object detection processing on the 4-divided images (a) to (d), the 9-divided images (A) to (I), and the 16-divided images (1) to (16), no distortion arises from changing the scale of the image, and the object detection performance is not affected.

The second object detection means 41 executes object detection processing on each of the 4-divided images (a) to (d), each of the 9-divided images (A) to (I), and each of the 16-divided images (1) to (16), and outputs an object detection result for each of them.
The second object detection means 41 also executes object detection processing on the captured image (X) and outputs an object detection result for the captured image (X).
The detection result synthesizing means 43 synthesizes the object detection results for the 4-divided images (a) to (d), the 9-divided images (A) to (I), and the 16-divided images (1) to (16) with the object detection result for the captured image (X), and outputs the synthesized result as the object detection result for the captured image (X) (that is, an object detection result for the captured image (X) based on the results for the divided images (a) to (d), (A) to (I), and (1) to (16) and the result for the captured image (X)). The synthesis of the object detection results by the detection result synthesizing means 43 can use the same method as the synthesis in the first and second embodiments.

In the third embodiment, executing the object detection processing on the four 4-divided images (a) to (d), the nine 9-divided images (A) to (I), and the sixteen 16-divided images (1) to (16) obtained by dividing the captured image (X) makes it possible to detect objects that cannot be detected by the object detection processing on the captured image (X).
This improves the object detection accuracy.
In addition, the captured image (X) is divided into four using 2² and into sixteen using 4², both squares of even numbers, and into nine using 3², the square of an odd number. As a result, although part of the boundaries of the 4-divided images (a) to (d) coincides in parallel with the boundaries of the 16-divided images (1) to (16), neither the boundaries of the 4-divided images (a) to (d) nor those of the 16-divided images (1) to (16) coincide in parallel with the boundaries of the 9-divided images (A) to (I).
Therefore, a decrease in object detection accuracy at the boundaries of the 9-divided images (A) to (I) (for example, failure to detect an object that straddles a boundary) can be compensated for by the object detection results for the 4-divided images (a) to (d) and the 16-divided images (1) to (16). Similarly, a decrease in object detection accuracy at the boundaries of the 4-divided images (a) to (d) and the 16-divided images (1) to (16) can be compensated for by the object detection results for the 9-divided images (A) to (I). The object detection result for the captured image (X) can also compensate for both.
Therefore, the object detection accuracy can be further improved.
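The boundary relationships among the three grids can be checked with exact fractions of the image width (or height). This is an illustrative sketch; `interior_boundaries` is a hypothetical helper. As a general arithmetic fact, two equally spaced grids with n and m divisions per side share an interior boundary line exactly when n and m have a common factor greater than 1, which is why 2 and 4 share the centre line while 3 shares none with either.

```python
from fractions import Fraction
from typing import Set


def interior_boundaries(n: int) -> Set[Fraction]:
    """Positions of the interior dividing lines of an n-way equal split, as fractions of the side."""
    return {Fraction(k, n) for k in range(1, n)}


b2 = interior_boundaries(2)  # 4-divided images: {1/2}
b3 = interior_boundaries(3)  # 9-divided images: {1/3, 2/3}
b4 = interior_boundaries(4)  # 16-divided images: {1/4, 1/2, 3/4}

print(b2 & b3)  # set(): no coinciding boundary lines
print(b4 & b3)  # set(): no coinciding boundary lines
print(b2 & b4)  # {Fraction(1, 2)}: the centre line is shared
```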

Next, the first detection means 30 will be described.
The first detection means 30 has differential image creating means 31, moving body detection means 32, and first object detection means 33.

The differential image creating means 31 sequentially creates difference image information representing a difference image between the captured images represented by two pieces of image information output from the imaging means 50 at different times. The difference image is obtained by removing, from each of the two captured images, the image common to both (the background image). In other words, the difference image shows the moving bodies. Various known methods can be used to create the difference image information; for example, the background subtraction method of OpenCV can be used.

An example of a method for creating the difference image information will be described with reference to FIGS. 7 and 8.
FIG. 7 shows the captured image (X[t−1]) represented by the image information output from the imaging means 50 at time [t−1], one timing before time [t]. The captured image (X[t−1]) contains a background image (still image) (M[t−1]) and images of moving bodies (Y1[t−1]) and (Y2[t−1]).
FIG. 8 shows the captured image (X[t]) represented by the image information output from the imaging means 50 at time [t]. The captured image (X[t]) contains a background image (still image) (M[t]) and images of moving bodies (Y1[t]) and (Y2[t]).
The positions of the moving bodies (Y1[t]) and (Y2[t]) in the captured image (X[t]) differ from the positions of the moving bodies (Y1[t−1]) and (Y2[t−1]) in the captured image (X[t−1]). In other words, the moving bodies (Y1) and (Y2) have moved between time [t−1] and time [t].

The difference image is created, for example, by comparing the state (brightness, etc.) of each pixel of the captured image (X[t]) with the state (brightness, etc.) of the corresponding pixel of the captured image (X[t−1]) and extracting the pixels judged to be different.
Specifically, a difference image is created that contains the images of the moving bodies (Y1[t]) and (Y2[t]) included in the captured image (X[t]) shown in FIG. 8, together with the background image at the positions corresponding to the moving bodies (Y1[t−1]) and (Y2[t−1]).
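The pixel-by-pixel comparison can be sketched in plain Python on small grayscale frames. This is an illustration only: the function name `difference_mask`, the brightness threshold of 30, and the tiny sample frames are assumptions, and a practical system would more likely use an image library such as the OpenCV background subtraction mentioned above.

```python
from typing import List

Frame = List[List[int]]  # grayscale brightness values, one inner list per row


def difference_mask(prev: Frame, curr: Frame, threshold: int = 30) -> List[List[bool]]:
    """Mark pixels whose brightness differs by more than the threshold."""
    return [
        [abs(c - p) > threshold for p, c in zip(prev_row, curr_row)]
        for prev_row, curr_row in zip(prev, curr)
    ]


# Assumed frames: background brightness 10, one bright moving pixel (200).
frame_prev = [[10, 10, 10, 10], [200, 10, 10, 10], [10, 10, 10, 10]]  # X[t-1]
frame_curr = [[10, 10, 10, 10], [10, 10, 200, 10], [10, 10, 10, 10]]  # X[t]

mask = difference_mask(frame_prev, frame_curr)
changed = [(r, c) for r, row in enumerate(mask) for c, v in enumerate(row) if v]
print(changed)  # [(1, 0), (1, 2)]
```

Two locations are flagged: where the moving pixel is at time [t], and the background uncovered at its position at time [t−1], which corresponds to the two components of the difference image described in the text.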

Based on the difference image information created by the differential image creating means 31, the moving body detection means 32 detects the positions of the moving bodies (Y1[t]) and (Y2[t]) at time [t] (included in the captured image (X[t])). It also detects the positions of the moving bodies (Y1[t−1]) and (Y2[t−1]) at time [t−1] (included in the captured image (X[t−1])).
Note that (Y1s[t]), (Y2s[t]), (Y1s[t−1]), and (Y2s[t−1]) denote the sizes of the moving bodies (Y1[t]), (Y2[t]), (Y1[t−1]), and (Y2[t−1]), respectively. The size of a moving body is determined, for example, by a number of pixels, namely the number of pixels (vertical pixel count × horizontal pixel count) in a rectangular or square region circumscribing the image of the moving body.
The moving body detection means 32 then detects the distance (movement distance) and direction (movement direction) between the positions of the moving bodies (Y1[t−1]) and (Y2[t−1]) at time [t−1] and the positions of the moving bodies (Y1[t]) and (Y2[t]) at time [t]. Various known methods can be used to detect the distance and direction; for example, the optical flow method can be used.
In FIG. 8, the distance and direction between the position of the moving body (Y1[t−1]) and the position of (Y1[t]) are indicated by the movement vector [Y1(Wt)], and the distance and direction between the position of the moving body (Y2[t−1]) and the position of (Y2[t]) are indicated by the movement vector [Y2(Wt)].
Furthermore, based on the movement vector of each moving body at each time point, the average moving speed of each moving body within a first set period and the movement trajectory of each moving body within a second set period are detected. The first set period and the second set period are set as appropriate for the object to be detected. The movement vector may be a vector indicating movement on a two-dimensional plane (for example, a plane spanning the left-right and up-down directions of the captured image), but preferably a vector indicating movement in a three-dimensional space (for example, a space spanning the left-right, up-down, and front-rear directions of the captured image) is used.
Preferably, the moving body detection means 32 is configured to detect moving bodies present within a monitoring area included in the captured image. The monitoring area is defined, for example, by setting the positions of its boundary points in the captured image.
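The quantities in this passage (size from a circumscribing rectangle, movement vector between two time points, average speed over a period) can be sketched for the two-dimensional case as follows. The function names, the inclusive pixel-bound box convention, the use of box centres as positions, and the sample values are assumptions for illustration; the source leaves the concrete method open (for example, optical flow).

```python
import math
from typing import List, Tuple

Box = Tuple[int, int, int, int]  # inclusive pixel bounds: x_min, y_min, x_max, y_max
Point = Tuple[float, float]


def box_size(box: Box) -> int:
    """Pixel count of the circumscribing rectangle (horizontal x vertical)."""
    return (box[2] - box[0] + 1) * (box[3] - box[1] + 1)


def center(box: Box) -> Point:
    return ((box[0] + box[2]) / 2, (box[1] + box[3]) / 2)


def movement_vector(prev_box: Box, curr_box: Box) -> Point:
    """Displacement of the box centre between time [t-1] and time [t]."""
    (px, py), (cx, cy) = center(prev_box), center(curr_box)
    return (cx - px, cy - py)


def average_speed(centers: List[Point], frame_interval_s: float) -> float:
    """Mean speed in pixels per second over a period; needs at least two positions."""
    steps = [math.dist(a, b) for a, b in zip(centers, centers[1:])]
    return sum(steps) / (frame_interval_s * len(steps))


y1_prev: Box = (10, 20, 13, 26)  # assumed box of Y1 at time [t-1]
y1_curr: Box = (16, 20, 19, 26)  # assumed box of Y1 at time [t]
print(box_size(y1_curr))                             # 28 (4 x 7 pixels)
print(movement_vector(y1_prev, y1_curr))             # (6.0, 0.0)
print(average_speed([(0, 0), (3, 4), (6, 8)], 0.5))  # 10.0
```

The list of centres passed to `average_speed` doubles as a simple representation of the movement trajectory within the chosen period.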

The first object detection means 33 detects whether the moving bodies (Y1) and (Y2) are objects to be detected (hereinafter referred to as "target objects").
Various methods can be used to detect whether the moving bodies (Y1) and (Y2) are target objects. In this embodiment, whether the moving bodies (Y1) and (Y2) are target objects is detected according to whether their moving speed and movement trajectory satisfy setting conditions set for the target object.
As the setting conditions on moving speed and movement trajectory for detecting that a moving body is a target object, conditions specific to the target object are used. The moving speed and movement trajectory may be those on a two-dimensional plane (for example, a plane spanning the left-right and up-down directions of the captured image), but preferably those in a three-dimensional space (for example, a space spanning the left-right, up-down, and front-rear directions of the captured image) are used.
For example, when the target object is a person, the following setting conditions are used.
(1) The average moving speed within the first set period is within the range between a lower limit and an upper limit.
The lower limit is set, for example, to the speed at which a person walks slowly, and the upper limit is set, for example, to the speed at which a person runs fast. The first set period is set to an appropriate period over which the average moving speed of a person can be determined.
The average moving speed of a person differs from that of other moving bodies such as birds and vehicles.
(2) The movement trajectory within the second set period forms a continuous trajectory of a predetermined shape.
The movement trajectory of a person differs from that of other moving bodies such as birds and vehicles. The second set period is set to an appropriate period over which the movement trajectory of a person can be determined.
The setting conditions can be set by collecting data on the movement of people and deriving conditions specific to human movement from the collected data. Alternatively, they can be set by learning using deep learning.
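Conditions (1) and (2) can be sketched as a simple rule check. This is a hedged illustration only: the function name `satisfies_person_conditions`, the pixel-per-second limits, and the sample trajectories are hypothetical, and because the source does not define the "predetermined shape", a maximum per-step jump is used here as a stand-in for trajectory continuity. For brevity the same period is used for both conditions, whereas the text allows two different set periods.

```python
import math
from typing import List, Tuple

Point = Tuple[float, float]


def satisfies_person_conditions(
    centers: List[Point],
    frame_interval_s: float,
    min_speed: float,  # assumed lower limit (slow walk), pixels per second
    max_speed: float,  # assumed upper limit (fast run), pixels per second
    max_step: float,   # assumed largest jump still treated as continuous, pixels
) -> bool:
    steps = [math.dist(a, b) for a, b in zip(centers, centers[1:])]
    if not steps:
        return False  # a single position gives no speed or trajectory
    avg_speed = sum(steps) / (frame_interval_s * len(steps))
    speed_ok = min_speed <= avg_speed <= max_speed  # condition (1)
    continuous = all(s <= max_step for s in steps)  # stand-in for condition (2)
    return speed_ok and continuous


walker = [(0.0, 0.0), (2.0, 0.0), (4.0, 0.0), (6.0, 0.0)]
bird = [(0.0, 0.0), (40.0, 10.0), (80.0, 0.0)]
print(satisfies_person_conditions(walker, 1.0, 0.5, 8.0, 5.0))  # True
print(satisfies_person_conditions(bird, 1.0, 0.5, 8.0, 5.0))    # False
```

In practice the limits would come from collected movement data or from learning, as the text notes, rather than from fixed constants like these.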

As described above, by detecting the movement of a moving body and detecting the target object based on that movement, it becomes possible to detect an object that appears small in the captured image and cannot be detected by the second detection means 40 (second object detection means 41). That is, the object detection accuracy can be improved.
In experiments, the minimum detectable size of an object in the captured image (the minimum size of (Y1s[t]), (Y2s[t]), (Y1s[t−1]), and (Y2s[t−1]) shown in FIG. 8) was about [4×7 pixels], which is smaller than the minimum detectable size of an object in a 4-divided image, [7×15 pixels], when the second detection means 40 is used.

FIG. 9 shows an example in which the object detection result of the first detection means 30 and the object detection result of the second detection means 40 are combined.
FIG. 9 shows a captured image (X) obtained by the imaging means 50 imaging the monitoring area, a riverbed downstream of a dam. In FIG. 9, moving bodies (1) to (4) are present in the monitoring area. The moving bodies (1) to (4) are images of people, who are the target objects. In the captured image (X), the images of moving bodies (1) and (2) are large, and the images of moving bodies (3) and (4) are small.
Because the images of moving bodies (1) and (2) are large, moving bodies (1) and (2) can be detected by the object detection processing using deep learning (AI detection) of the second detection means 40.
Because the images of moving bodies (3) and (4) are small, moving bodies (3) and (4) cannot be detected by the object detection processing (AI detection) of the second detection means 40, but they can be detected by the object detection processing based on the movement of moving bodies (motion detection) of the first detection means 30.

As described above, the object detection processing by the second detection means 40 (second object detection means 41) can detect a wider variety of objects with high accuracy than the object detection processing by the first detection means 30 (first object detection means 33). On the other hand, the object detection processing by the first detection means 30 (first object detection means 33) can detect objects with a smaller image size than the object detection processing by the second detection means 40 (second object detection means 41).
Therefore, combining the object detection processing by the first detection means 30 (first object detection means 33) with the object detection processing by the second detection means 40 (second object detection means 41) improves the object detection accuracy.

In the first embodiment, the object detection processing by the second detection means 40 (second object detection means 41) is executed first. If the target object cannot be detected by the object detection processing of the second detection means 40, the object detection processing by the first detection means 30 (first object detection means 33) is executed.
In the first embodiment, the processing load on the first detection means 30 (first object detection means 33) and the second detection means 40 (second object detection means 41) can be reduced.
In the first embodiment, when no object is detected by the object detection processing of the second detection means 40 but the target object is detected by the object detection processing of the first detection means 30, the device can additionally be configured to issue a notification prompting a staff member to confirm the object. Alternatively, it can be configured to enlarge the captured image using the zoom function of the imaging means 50 and to execute the object detection processing by the second detection means 40 (second object detection means 41) based on image information representing the enlarged captured image.
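The sequential arrangement (deep-learning detection first, motion-based detection only as a fallback) can be sketched as below. The names `detect_sequential`, `ai_detect`, and `motion_detect` are hypothetical placeholders for the second and first detection means, and the stub detectors exist only to make the sketch runnable.

```python
from typing import Callable, List, Sequence, Tuple

Detections = List[tuple]


def detect_sequential(
    frames: Sequence[object],
    ai_detect: Callable[[object], Detections],
    motion_detect: Callable[[Sequence[object]], Detections],
) -> Tuple[Detections, str]:
    """Run the deep-learning detector first; fall back to motion detection."""
    results = ai_detect(frames[-1])  # stands in for the second detection means 40
    if results:
        return results, "ai"
    results = motion_detect(frames)  # stands in for the first detection means 30
    return results, ("motion" if results else "none")


# Stub detectors (assumptions): the AI detector misses a small object,
# the motion detector finds it.
def ai_stub(image: object) -> Detections:
    return []


def motion_stub(frames: Sequence[object]) -> Detections:
    return [(16, 20, 19, 26)]


result, source = detect_sequential(["frame_t-1", "frame_t"], ai_stub, motion_stub)
print(result, source)  # [(16, 20, 19, 26)] motion
```

A result tagged "motion" is the case in which the text suggests notifying a staff member or zooming in and re-running the deep-learning detector.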

In the second embodiment, the object detection processing by the first detection means 30 (first object detection means 33) and the object detection processing by the second detection means 40 (second object detection means 41) are executed in parallel (simultaneously).
In the second embodiment, an object can be detected in a short time.
In the second embodiment, when an object is detected by the object detection processing of the first detection means 30, the device can additionally be configured, as in the first embodiment, to issue a notification prompting a staff member to confirm the object, or to enlarge the captured image using the zoom function of the imaging means 50 and execute the object detection processing by the second detection means 40 based on image information representing the enlarged captured image.
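Running both detectors side by side can be sketched with a thread pool from the Python standard library. This is only one possible arrangement and an assumption on my part: the names `detect_parallel`, `ai_detect`, and `motion_detect` are hypothetical, the stubs are placeholders, and threads give real overlap only when the detectors spend their time in code that releases the interpreter lock (for example, native inference libraries).

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, List, Sequence

Detections = List[tuple]


def detect_parallel(
    frames: Sequence[object],
    ai_detect: Callable[[object], Detections],
    motion_detect: Callable[[Sequence[object]], Detections],
) -> Detections:
    """Submit both detectors at once and combine their results."""
    with ThreadPoolExecutor(max_workers=2) as pool:
        ai_future = pool.submit(ai_detect, frames[-1])   # second detection means 40
        motion_future = pool.submit(motion_detect, frames)  # first detection means 30
        return ai_future.result() + motion_future.result()


# Stub detectors (assumptions): one large object found by AI detection,
# one small object found by motion detection.
def ai_stub(image: object) -> Detections:
    return [(100, 100, 200, 300)]


def motion_stub(frames: Sequence[object]) -> Detections:
    return [(16, 20, 19, 26)]


combined = detect_parallel(["frame_t-1", "frame_t"], ai_stub, motion_stub)
print(combined)  # [(100, 100, 200, 300), (16, 20, 19, 26)]
```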

When the image of an object in the captured image is large, the object can be detected by the object detection processing of the second detection means 40 (second object detection means 41). On the other hand, when the image of an object in the captured image is small, the object cannot be detected by the object detection processing of the second detection means 40 (second object detection means 41), but it can be detected by the object detection processing of the first detection means 30 (first object detection means 33).
Therefore, when the images of objects in the captured image are small, the device can also be configured to detect the objects in the captured image by executing the object detection processing with the first detection means 30 (first object detection means 33) alone.
That is, the present invention can also be configured with only the first detection means 30 (first object detection means 33).

In the above embodiment, the following divided image groups were used: a divided image group including four divided images obtained by dividing the captured image into 2 squared (2²) pieces (divided in two at equal intervals in the vertical and horizontal directions); a divided image group including the four divided images and nine divided images obtained by dividing the captured image into 3 squared (3²) pieces (divided in three at equal intervals in the vertical and horizontal directions); and a divided image group including the four divided images, the nine divided images, and sixteen divided images obtained by dividing the captured image into 4 squared (4²) pieces (divided in four at equal intervals in the vertical and horizontal directions). However, the divided images, or the combinations of divided images, that constitute a divided image group are not limited to these.
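As an illustration of the equal-interval division described above, the following sketch computes the pixel boxes of an n × n division and collects several divisions into one divided image group. The function names and the `(left, top, right, bottom)` box convention are assumptions made for this example only.

```python
def grid_boxes(width, height, n):
    """Return the n*n boxes (left, top, right, bottom) of an equal-interval
    division of a width x height image into n columns and n rows."""
    xs = [round(i * width / n) for i in range(n + 1)]
    ys = [round(j * height / n) for j in range(n + 1)]
    # Row-major order: left to right, then top to bottom.
    return [(xs[i], ys[j], xs[i + 1], ys[j + 1])
            for j in range(n) for i in range(n)]


def division_group(width, height, divisions):
    """Collect the boxes of several divisions, e.g. (2, 3, 4) for the group
    made of the 4-, 9- and 16-part divided images."""
    return [box for n in divisions for box in grid_boxes(width, height, n)]
```

For a 1920 × 1080 image, `division_group(1920, 1080, (2, 3, 4))` yields 4 + 9 + 16 = 29 boxes, each of which would be cropped and passed to the second object detection means.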
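In the above embodiment, the object detection results for the divided images and the object detection result for the captured image were combined and output as the object detection result for the captured image. However, the device can also be configured to combine only the object detection results for the divided images and output the combined result as the object detection result for the captured image.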
In the above embodiment, one divided image group was used. However, the device can also be configured to use a plurality of divided image groups. In that case, object detection processing is executed on the divided images constituting one selected divided image group and, if the object detection result contains no object, a different divided image group is selected and object detection processing is executed on the divided images constituting that group. The repetition of object detection processing over different divided image groups can be terminated at an appropriate timing. For example, it can be terminated when the number of divided image groups on which object detection processing has been executed reaches a set value, or when a set time has elapsed from the start of the object detection processing.
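The repetition over divided image groups and its two termination conditions might be sketched as follows. `detect_group` is a hypothetical callable that runs detection on every divided image of one group and returns the combined result as a list; `max_groups` and `time_limit_s` stand in for the "set value" and "set time", whose actual values the text does not give.

```python
import time


def detect_over_groups(groups, detect_group, max_groups=3, time_limit_s=1.0):
    """Try divided image groups one after another until an object is found,
    the group-count limit is reached, or the time limit has elapsed."""
    start = time.monotonic()
    for tried, group in enumerate(groups, start=1):
        results = detect_group(group)
        if results:
            # An object was found: stop and report it.
            return results
        if tried >= max_groups or time.monotonic() - start >= time_limit_s:
            break
    return []
```

The time check is made only between groups, so a group that has already started is always processed to completion. That is a design choice of this sketch, not something the description prescribes.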
When the purpose of the object detection processing by the second object detection means 41 is to detect the presence or absence of an object (that is, whether at least one object exists), the device can be configured as follows.
The second object detection means 41 executes object detection processing on the captured image and, when the object detection result for the captured image contains no object, executes object detection processing on each divided image. When the object detection result for the captured image contains an object, the detection result synthesizing means 43 outputs that result as the object detection result for the captured image. When it contains no object, the detection result synthesizing means 43 combines the object detection results for the divided images and outputs the combined result as the object detection result for the captured image. When the object detection results for the divided images contain no object either, the device can also be configured to execute object detection processing on a different number of divided images, combine the object detection results for those divided images, and output the combined result as the object detection result for the captured image. The repetition of object detection processing for different numbers of divided images can be terminated, for example, at the same timing as described above for the repetition over different divided image groups.
In this case, the number of times the second object detection means 41 executes object detection processing can be reduced.
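A minimal sketch of this presence-detection flow is given below. `detect` is a hypothetical stand-in for the second object detection means 41 and returns a list of detections. Combining is shown as simple concatenation, and the conversion of positions into captured-image coordinates is omitted for brevity.

```python
def detect_presence(image, tiles, detect):
    """Detect on the whole captured image first; process the divided images
    only when the whole-image result contains no object."""
    full_result = detect(image)
    if full_result:
        # Object found in the captured image: the divided images are skipped.
        return full_result
    combined = []
    for tile in tiles:
        combined.extend(detect(tile))
    return combined
```

When the whole-image pass already finds an object, `detect` is called once instead of once per divided image, which is where the reduction in the number of detection passes comes from.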

The present invention can also be configured as follows.
"(Aspect 1) The object detection device according to any one of claims 2 to 4, wherein
the image dividing means divides the captured image indicated by the image information output from the imaging means into at least a first number of first divided images and a second number of second divided images, and
the first number and the second number are set so that the boundary portions of the first divided images and the boundary portions of the second divided images do not overlap in parallel."
In this aspect, the boundary portions of the first divided images and the boundary portions of the second divided images are permitted to intersect. As a result, for example, an object that lies across a boundary portion of one set of divided images, and therefore cannot be detected by object detection on that set, can be detected by object detection on the other set. Appropriate numbers can be set as the first number and the second number. The types of divided images are not limited to the two types, namely the first number of first divided images and the second number of second divided images.
In this aspect, a decrease in object detection accuracy at the boundary portions of one of the first and second divided images can be compensated for by the object detection result for the other.
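For equal-interval divisions, whether two divisions share a boundary line can be checked directly. The sketch below assumes that each division cuts the image into n equal parts along an axis, so that its interior boundaries lie at the fractions k/n of the image width or height. The function names are hypothetical.

```python
from fractions import Fraction
from math import gcd


def interior_boundaries(n):
    """Positions (as fractions of the image size) of the interior boundary
    lines when one axis is divided into n equal parts."""
    return {Fraction(k, n) for k in range(1, n)}


def shared_boundaries(n1, n2):
    """Boundary positions common to an n1-part and an n2-part division."""
    return interior_boundaries(n1) & interior_boundaries(n2)


def boundaries_disjoint(n1, n2):
    """True when no boundary line coincides; for equal-interval divisions
    this holds exactly when n1 and n2 have no common factor."""
    return gcd(n1, n2) == 1
```

Under this assumption, divisions into 2 and 3 parts per axis share no boundary line, whereas divisions into 2 and 4 parts both have a boundary at the center of the image (`shared_boundaries(2, 4)` is `{Fraction(1, 2)}`).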
The present invention can also be configured as follows.
"(Aspect 2) The object detection device according to any one of claims 2 to 4 and aspect 1, wherein
the image dividing means divides the captured image indicated by the image information output from the imaging means into at least first divided images, whose number is the square of a first odd number, and second divided images, whose number is the square of a first even number."
Preferably, the captured image is divided at equal intervals with the same number of divisions (odd or even) in the vertical and horizontal directions. The types of divided images are not limited to the two types, namely the first divided images numbering the square of the first odd number and the second divided images numbering the square of the first even number.
In this aspect, divided images having substantially the same aspect ratio as the captured image can be used as the first and second divided images. Therefore, even when object detection processing is executed on the divided images using the image processing means used in ordinary object detection devices, no distortion arises from rescaling the images, and object detection performance is not affected.
The present invention can also be configured as follows.
"(Aspect 3) The object detection device according to any one of claims 2 to 4 and aspects 1 and 2, wherein
the image dividing means is capable of dividing the captured image indicated by the image information output from the imaging means into a plurality of divided image groups, each including at least one type of divided image and each having a different total number of divided images, and
the detection result synthesizing means detects the object to be detected included in each divided image based on divided image information indicating each divided image constituting one divided image group; when any of the detection results for the divided images includes the object to be detected, combines the object detection results for the divided images and outputs the combined result as the object detection result for the captured image indicated by the image information output from the imaging means; and, when the object detection results for the divided images do not include the object to be detected, performs the same processing on a different divided image group."
The repetition of object detection processing over different divided image groups can be terminated at an appropriate timing. For example, it can be terminated when the number of divided image groups on which object detection processing has been executed reaches a set value, or when a set time has elapsed from the start of the object detection processing.
This aspect is preferably used when detecting that the captured image includes at least one object.
In this aspect, the object detection processing can be terminated as soon as the presence of an object in the captured image is detected, so the processing load on the second detection means can be reduced.
The present invention can also be configured as follows.
"(Aspect 4) The object detection device according to any one of claims 2 to 4 and aspects 1 and 2, wherein
the second object detection means uses deep learning to detect the object to be detected included in the captured image indicated by the image information output from the imaging means, and
the detection result synthesizing means combines the object detection results for the divided images with the object detection result for the captured image, and outputs the combined result as the object detection result for the captured image."
As a method of combining the object detection results for the divided images with the object detection result for the captured image, for example, the position information of an object in a divided image can be converted into position information in the captured image. In addition, when a plurality of object detection results include objects of the same category at the same position, for example, the object with the higher object-likeness score used in the object detection processing can be selected.
In this aspect, the object detection accuracy can be further improved.
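The combining method described for this aspect can be sketched as follows, under two assumptions that are not in the text: each detection is a dict with `category`, `score` and `box` (left, top, right, bottom) keys, and two boxes count as being at "the same position" when their intersection-over-union (IoU) reaches a threshold.

```python
def to_image_coords(det, tile_box):
    """Shift a detection from divided-image coordinates to captured-image
    coordinates using the top-left corner of the divided image."""
    left, top = tile_box[0], tile_box[1]
    x1, y1, x2, y2 = det["box"]
    return {**det, "box": (x1 + left, y1 + top, x2 + left, y2 + top)}


def iou(a, b):
    """Intersection-over-union of two (left, top, right, bottom) boxes."""
    iw = max(0, min(a[2], b[2]) - max(a[0], b[0]))
    ih = max(0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = iw * ih
    union = ((a[2] - a[0]) * (a[3] - a[1])
             + (b[2] - b[0]) * (b[3] - b[1]) - inter)
    return inter / union if union > 0 else 0.0


def merge_detections(detections, iou_threshold=0.5):
    """Keep the highest-scoring detection among those of the same category
    whose boxes overlap by at least iou_threshold (an assumed criterion)."""
    kept = []
    for det in sorted(detections, key=lambda d: d["score"], reverse=True):
        duplicate = any(k["category"] == det["category"]
                        and iou(k["box"], det["box"]) >= iou_threshold
                        for k in kept)
        if not duplicate:
            kept.append(det)
    return kept
```

Detections from each divided image would first be passed through `to_image_coords` with the box of that divided image, then pooled with the whole-image detections and passed to `merge_detections`.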
The present invention can also be configured as follows.
"(Aspect 5) The object detection device according to any one of claims 2 to 3 and aspects 1 to 4, wherein
the second object detection means uses deep learning to detect the object to be detected included in the captured image and, when the object detection result for the captured image does not include the object to be detected, detects the object to be detected included in each of the divided images, and
the detection result synthesizing means, when the object detection result for the captured image includes the object to be detected, outputs that result as the object detection result for the captured image and, when the object detection result for the captured image does not include the object to be detected, combines the object detection results for the divided images and outputs the combined result as the object detection result for the captured image."
This aspect is preferably used when detecting that at least one object is present in the captured image.
This aspect can reduce the processing load on the second detection means.

The present invention is not limited to the configurations described in the embodiments, and various modifications, additions, and deletions are possible.
The difference image generation means, moving object detection means, first object detection means, second object detection means, image dividing means, and detection result synthesizing means are not limited to the configurations described in the embodiments.
The first detection means is not limited to the configuration described in the embodiments.
The second detection means is not limited to the configuration described in the embodiments.
Each configuration described in the embodiments can be used alone or in an appropriately selected combination.

10 Object detection device
20 Processing means
30 First detection means
31 Difference image creation means
32 Moving object detection means
33 First object detection means
40 Second detection means
41 Second object detection means
42 Image dividing means
43 Detection result synthesizing means
50 Imaging means
60 Storage means
70 Input means
80 Output means

Claims

An object detection device comprising an imaging means and a first detection means, wherein
the first detection means has difference image creation means, moving object detection means, and first object detection means,
the imaging means outputs image information indicating a captured image,
the difference image creation means creates difference image information indicating a difference image between the captured images indicated by two pieces of image information output from the imaging means at different times,
the moving object detection means detects a moving speed and a movement trajectory of a moving object based on the difference image information created by the difference image creation means, and
the first object detection means detects the moving object as an object to be detected when the moving speed and the movement trajectory of the moving object detected by the moving object detection means satisfy a set condition that is set in correspondence with the object to be detected.

The object detection device according to claim 1,
further comprising a second detection means, wherein
the second detection means has second object detection means, image dividing means, and detection result synthesizing means,
the image dividing means divides the captured image indicated by the image information output from the imaging means into a plurality of divided images, and outputs divided image information indicating each divided image,
the second object detection means uses deep learning to detect the object to be detected included in each divided image indicated by the divided image information output from the image dividing means, and
the detection result synthesizing means combines the object detection results obtained by the second object detection means for the respective divided images, and outputs the combined result as an object detection result for the captured image indicated by the image information output from the imaging means.

The object detection device according to claim 2, wherein
the first detection means is configured to operate when the object to be detected is not detected by the second detection means.

The object detection device according to claim 2, wherein
the first detection means and the second detection means are configured to operate in parallel.