WO2024069778A1 - Object detection system, camera, and object detection method - Google Patents

Object detection system, camera, and object detection method

Info

Publication number
WO2024069778A1
WO2024069778A1 (PCT/JP2022/036062)
Authority
WO
WIPO (PCT)
Prior art keywords
image
camera
area
camera image
detection
Prior art date
Application number
PCT/JP2022/036062
Other languages
French (fr)
Japanese (ja)
Inventor
一成 岩永
Original Assignee
株式会社日立国際電気
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by 株式会社日立国際電気
Priority to PCT/JP2022/036062
Publication of WO2024069778A1


Abstract

The purpose of the present invention is to facilitate the setting of an appropriate distant area for realizing high-speed, high-accuracy object detection regardless of the camera installation conditions. An image processing device 120 according to the present disclosure comprises: a function for setting, on the basis of positions and sizes of a plurality of object frames (221, 222) that respectively surround a plurality of persons that are included in a camera image (200) that is captured in advance by an image capture device 110, a distant area that includes an area in which the sizes of the object frames in the image area of the camera image become equal to or less than a threshold value; and a function for generating, on the basis of a camera image that is captured during operation, a reduced image that is a reduction of the camera image and a trimmed image in which the distant area portion is trimmed from the camera image, inputting both the reduced image and the trimmed image into a low-resolution learning model to perform person detection, and synthesizing and outputting the detection result based on the reduced image and the detection result based on the trimmed image.

Description

OBJECT DETECTION SYSTEM, CAMERA, AND OBJECT DETECTION METHOD
The present invention relates to an object detection system that detects objects from camera images taken of a surveillance area.
Conventionally, object detection systems that detect targets by analyzing camera images of a surveillance area with a learning model for object detection have been researched and developed. Object detection can be performed using a variety of methods that utilize AI (Artificial Intelligence) technology.
In the first conventional method, shown in FIG. 1, a high-resolution camera image 11, such as a 4K video image, is input as is into a high-resolution learning model 12 for analysis. As the analysis result, the high-resolution learning model 12 outputs a result image 13 in which detection frames indicating detected objects are added to the input image. The result image 13 is displayed on a monitor terminal, either as is or after being adjusted for user confirmation. The first conventional method achieves highly accurate detection, but analysis by the high-resolution learning model 12 takes time, and the model itself also requires a significant amount of time to train.
In the second conventional method, shown in FIG. 2, a high-resolution camera image 21 is reduced to a predetermined size (e.g., VGA size) to generate a reduced image 22, which is input into a low-resolution learning model 23 for analysis. The low-resolution learning model 23 outputs, as the analysis result, a result image 24 in which detection frames indicating detected objects are added to the input image. Based on this result image 24, a final output image 25 is generated in which the detection frames are reflected in the original camera image 21 (or a camera image adjusted for user confirmation) and displayed on a monitor terminal. The second conventional method allows the low-resolution learning model 23 to analyze quickly, but image reduction makes small, distant objects hard to see, lowering the detection rate.
In the third conventional method, shown in FIG. 3, a reduced image 32 is generated by shrinking a high-resolution camera image 31 to a predetermined size, and a cropped image 33 is generated by cutting out the distant area from the camera image; both are input into a low-resolution learning model 34 for analysis. As the analysis results, the low-resolution learning model 34 outputs a result image 35 showing the detection result based on the reduced image 32 and a result image 36 showing the detection result based on the cropped image 33. These result images 35 and 36 are combined to generate a final output image 37 in which the detection frames are reflected in the original camera image 31, and this is displayed on the monitor terminal.
The third conventional method can achieve high-speed, high-precision object detection. When the installation conditions of the camera are fixed, as with the vehicle-mounted camera disclosed in Patent Document 1, the method poses no particular problem because the distant area in the camera image is also fixed. However, when diverse installation conditions are expected depending on the situation at the site, as with surveillance cameras, the distant-area setting differs for each camera. Moreover, setting an appropriate distant area requires specialized knowledge and is a tedious, time-consuming task.
Patent Document 1: JP 2020-4366 A
The present invention was made in view of the conventional circumstances described above, and aims to make it easy to set an appropriate far region for achieving high-speed, high-precision object detection, regardless of the camera installation conditions.
To achieve the above object, an object detection system according to one aspect of the present invention is configured as follows. In an object detection system that detects objects from camera images taken of a surveillance area, before the start of operation, a process is executed to set a distant area including an area in the image area of a camera image, captured in advance, where the size of the object frames is equal to or smaller than a threshold, based on the positions and sizes of multiple object frames each surrounding one of the multiple objects included in that camera image. During operation, a process is executed to generate, from a captured camera image, a first image obtained by reducing the camera image and a second image obtained by cutting out the distant-area portion of the camera image, to input both the first image and the second image into a predetermined learning model to detect objects, and to combine and output the detection result based on the first image and the detection result based on the second image.
Here, the multiple object frames may be set by user operations on the previously captured camera image. Alternatively, the multiple object frames may be set based on detection results obtained by inputting the previously captured camera image into a learning model.
A camera according to another aspect of the present invention is configured as follows. A camera that photographs a surveillance area to detect objects has a function of setting a distant area including an area in the image area of a camera image, photographed before the start of operation, where the size of the object frames is equal to or smaller than a threshold, based on the positions and sizes of multiple object frames each surrounding one of the multiple objects included in that camera image; and a function of generating, from a camera image photographed during operation, a first image obtained by reducing the camera image and a second image obtained by cutting out the distant-area portion of the camera image, inputting both the first image and the second image into a predetermined learning model to detect objects, and combining and outputting the detection result based on the first image and the detection result based on the second image.
An object detection method according to yet another aspect of the present invention is configured as follows. The object detection method detects an object from camera images taken of a surveillance area and includes the steps of: before the start of operation, setting a distant area including an area in the image area of a camera image, taken in advance, where the size of the object frames is equal to or smaller than a threshold, based on the positions and sizes of multiple object frames each surrounding one of the multiple objects included in that camera image; and, during operation, generating, from a captured camera image, a first image obtained by reducing the camera image and a second image obtained by cutting out the distant-area portion of the camera image, inputting both the first image and the second image into a predetermined learning model to detect the object, and combining and outputting the detection result based on the first image and the detection result based on the second image.
According to the present invention, an appropriate far region for achieving high-speed, high-precision object detection can be set easily, regardless of the camera installation conditions.
FIG. 1 is a diagram showing an overview of object detection by the first conventional method.
FIG. 2 is a diagram showing an overview of object detection by the second conventional method.
FIG. 3 is a diagram showing an overview of object detection by the third conventional method.
FIG. 4 is a diagram showing a configuration example of an object detection system according to an embodiment of the present invention.
FIG. 5 is a diagram showing an example of setting a detection area and object frames for a camera image.
FIG. 6 is a diagram showing an example of setting a far region for a camera image.
FIG. 7 is a diagram showing another example of setting a far region for a camera image.
FIG. 8 is a diagram showing another example of setting object frames for a camera image.
One embodiment of the present invention will be described with reference to the drawings. FIG. 4 shows a configuration example of an object detection system according to one embodiment of the present invention. As shown in FIG. 4, the object detection system of this example includes an imaging device 110, an image processing device 120, a monitor terminal 130, and an operation terminal 140. These devices can be communicably connected to one another by wire or wirelessly, and an arbitrary network such as the Internet may be interposed between them.
The imaging device 110 is a device such as a surveillance camera that captures images of a monitored area. In this example, a camera capable of outputting high-quality camera images such as 4K video is used as the imaging device 110. The imaging device 110 can be installed under installation conditions that suit the situation at the site, so its angle of view and tilt are not particularly limited. However, for simplicity of explanation, the imaging device 110 in this example is installed in a posture with virtually no tilt and captures camera images that view the monitored area along a horizontal or nearly horizontal line of sight. The high-resolution camera images captured by the imaging device 110 are transmitted to the image processing device 120.
The image processing device 120 is, for example, a computer equipped with hardware resources such as a processor and memory, and is configured to read programs implementing the functions described below from the memory and execute them with the processor. The image processing device 120 has a function of performing object detection, based on the high-resolution camera images received from the imaging device 110 during system operation, in the same manner as the third conventional method described above. The image processing device 120 further has a function of setting the far region. The far region is set before system operation starts; its setting is explained in detail below.
Here, the learning model used by the image processing device 120 detects people contained in images; it has, for example, a yolo v3 network structure and is trained on 640 x 360 pix input images. A threshold Th represents the size (height) in the image of a person that the model can detect stably; here, Th = 50 pix. In other words, a person 50 pix or taller in the image can be detected stably, but detection accuracy drops below that size. Note that Th = 50 pix is merely an example; the value varies with the model structure and the data provided during training.
To set the far region, the image processing device 120 receives from the user a detection area to be subject to person detection in a high-resolution camera image captured in advance by the imaging device 110. Similarly, it receives from the user multiple object frames, each surrounding one of the multiple people included in the camera image. In FIG. 5, one detection area 210 and two object frames 221 and 222 are set for a camera image 200 captured in advance. Detection areas may be set in two or more locations, and object frames may be set in three or more locations.
These settings are input by the user through the operation terminal 140 and provided to the image processing device 120. In this example, the image processing device 120 displays the camera image 200 on the operation terminal 140 and accepts the detection-area and object-frame settings through operations on that image. The above is only one example; the method for setting the detection area and object frames is not particularly limited.
After receiving the detection area and object frame settings, the image processing device 120 estimates, by linear interpolation, how large a person appears at each coordinate within the detection area. As an example, in a 4K (3840 x 2160 pix) camera image, let the topmost Y coordinate of the detection area be y_u = 400 and the bottommost y_b = 2150; let the first object frame have Y coordinate y1 = 500 and height h1 = 75 pix, and the second object frame Y coordinate y2 = 1000 and height h2 = 450 pix. Assume also that every person's height is h = 170 cm. Defining the resize ratio as r, converting a 4K (3840 x 2160 pix) camera image to the predetermined size (e.g., 640 x 360 pix) gives r = 640/3840 ≈ 0.1667.
Here, the height H of a person at an arbitrary Y coordinate can be expressed by the following formulas:

  p1 = h1 / h
  p2 = h2 / h
  H = (Y × (p2 − p1) / (y2 − y1) + (p2 − (p2 − p1) / (y2 − y1) × y1)) × h × r
According to the above formula, H = 50 when Y = 199.94, matching the threshold Th. In an image reduced from 3840 x 2160 to 640 x 360, positions with Y = 200 or greater therefore form the range in which people can be detected stably. Setting the remaining image range, i.e., (0, 0) to (3840, 200), as the far region thus satisfies the condition to the fullest extent across all areas of the camera image 200.
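By way of a hedged illustration, the following Python sketch implements the interpolation formula above with this section's example values and solves it for the far-region boundary row. The patent defines no code, so the function and variable names here are ours.

```python
# Sketch of the far-region boundary computation described above,
# using the formula and example values from this section.

h = 170.0            # assumed person height (cm)
r = 640.0 / 3840.0   # resize ratio from 4K width to the 640 x 360 input
Th = 50.0            # stable-detection height threshold in the reduced image (pix)

y1, h1 = 500.0, 75.0     # first object frame: Y coordinate, height (pix)
y2, h2 = 1000.0, 450.0   # second object frame: Y coordinate, height (pix)

p1, p2 = h1 / h, h2 / h
slope = (p2 - p1) / (y2 - y1)
intercept = p2 - slope * y1   # as in the formula above

def estimated_height(y: float) -> float:
    """Estimated height H (pix) of a person at row y in the reduced image."""
    return (y * slope + intercept) * h * r

# Solving H(y) = Th for y gives the boundary row of the far region:
# rows above it are where people appear at or below the threshold size.
y_boundary = (Th / (h * r) - intercept) / slope
```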
In the example of FIG. 6, 14 far-region frames 230 (7 horizontal x 2 vertical), each 640 x 360 in size, are set over the image range (0, 0) to (3840, 200). In other words, the system is set to cut out 14 cropped images from the camera image during operation. As shown in FIG. 6, the multiple far-region frames 230 may be positioned so that they overlap their neighbors at the boundaries. This avoids the drop in detection accuracy that would occur if a person standing on the boundary between adjacent far-region frames were cut off in the cropped images.
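The patent fixes the frame count (7 x 2) and frame size (640 x 360) for FIG. 6 but does not spell out a placement rule. The Python sketch below is one way to space the frames evenly with overlap; the `tile_origins` helper and the assumed 560-pix vertical extent are illustrative, not taken from the patent.

```python
# Evenly spaced, overlapping far-region frames as in FIG. 6 (a sketch;
# the patent does not prescribe this placement rule).

def tile_origins(extent: int, tile: int, count: int) -> list[int]:
    """Origins of `count` tiles of length `tile` spanning `extent` pixels.
    Adjacent tiles overlap whenever count * tile > extent."""
    if count == 1:
        return [0]
    step = (extent - tile) / (count - 1)
    return [round(i * step) for i in range(count)]

# 7 columns across the 3840-pix width: adjacent frames overlap by about
# 107 pix, so a person on a frame boundary appears whole in some tile.
xs = tile_origins(3840, 640, 7)

# 2 rows assumed to span 560 pix together (two 360-pix rows overlapping
# by 160 pix), so the (0, 0)-(3840, 200) far region lies inside them.
ys = tile_origins(560, 360, 2)

frames = [(x, y, x + 640, y + 360) for y in ys for x in xs]
assert len(frames) == 14  # 7 x 2, matching FIG. 6
```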
The size of the far-region frames may be any other value that satisfies the above formula. For example, the image may be cropped at a size larger than the expected input size of the low-resolution learning model and then reduced to match it, such as cropping at 1280 x 720 and resizing to 640 x 360. Also, as shown in FIG. 7, far-region frames 230 need not be set outside the detection area 210 of the camera image 200; in FIG. 7, fewer frames than the 14 of FIG. 6, namely seven far-region frames 230, are set.
The image processing device 120 performs the above processing before system operation to set the far region for the camera image of the imaging device 110. During subsequent system operation, the image processing device 120 performs object detection on the camera images received from the imaging device 110, following the far-region setting, in the same manner as the third conventional method described above. That is, the image processing device 120 generates a reduced image obtained by shrinking the high-resolution camera image and cropped images obtained by cutting out the far-region portions of the camera image: 14 cropped images for the setting of FIG. 6, and 7 cropped images for the setting of FIG. 7. The image processing device 120 inputs both the reduced image and the cropped images into the low-resolution learning model to perform person detection. It then combines the detection result based on the reduced image with the multiple detection results based on the multiple cropped images and outputs the result as the final output image, which is transmitted to the monitor terminal 130 and displayed there.
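To make this run-time flow concrete, here is a minimal Python sketch of the two detection paths and the mapping of boxes back to full-image coordinates. It is an illustration under assumptions: `detect` stands in for the low-resolution learning model and is assumed to return (x, y, w, h, score) boxes in its input image's coordinates, an interface the patent does not specify; deduplication of overlapping detections (e.g., NMS) is omitted.

```python
import cv2  # OpenCV, assumed available for resizing

def run_detection(frame, far_frames, detect, input_size=(640, 360)):
    """Sketch of the operating-phase flow: detect on the reduced whole
    image and on each far-region crop, then merge the results in
    full-image coordinates. `frame` is the high-resolution image
    (H x W x 3 array); `far_frames` is a list of (x0, y0, x1, y1) boxes."""
    H, W = frame.shape[:2]
    results = []

    # Path 1: reduce the whole camera image, detect, scale boxes back up.
    reduced = cv2.resize(frame, input_size)
    sx, sy = W / input_size[0], H / input_size[1]
    results += [(x * sx, y * sy, w * sx, h * sy, s)
                for (x, y, w, h, s) in detect(reduced)]

    # Path 2: detect in each far-region crop and translate boxes back
    # into full-image coordinates (resizing first if the crop is larger
    # than the model input, e.g., 1280 x 720 -> 640 x 360).
    for (x0, y0, x1, y1) in far_frames:
        crop = frame[y0:y1, x0:x1]
        if crop.shape[:2] != (input_size[1], input_size[0]):
            crop = cv2.resize(crop, input_size)
        rx, ry = (x1 - x0) / input_size[0], (y1 - y0) / input_size[1]
        results += [(x0 + x * rx, y0 + y * ry, w * rx, h * ry, s)
                    for (x, y, w, h, s) in detect(crop)]

    return results  # combined detections from both paths
```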
The reduced image used in the above object detection may be generated by removing the areas outside the detection area and then shrinking to the expected input size of the low-resolution learning model. If the aspect ratio after removing the areas outside the detection area differs from that of the expected input size, the range to be removed may be widened so that the aspect ratios match. Alternatively, the image may be reduced while keeping its aspect ratio after the removal, with the missing portion padded.
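A common way to realize the keep-aspect-ratio-and-pad variant mentioned above is a letterbox resize. The sketch below is our illustration of that idea under the same assumptions as the previous sketches, not the patent's implementation.

```python
import cv2
import numpy as np

def reduce_with_padding(img, input_size=(640, 360)):
    """Resize `img` to fit inside input_size (w, h) without changing its
    aspect ratio, padding the remainder (bottom/right) with black."""
    tw, th = input_size
    h, w = img.shape[:2]
    scale = min(tw / w, th / h)
    nw, nh = max(1, int(w * scale)), max(1, int(h * scale))
    resized = cv2.resize(img, (nw, nh))
    canvas = np.zeros((th, tw) + img.shape[2:], dtype=img.dtype)
    canvas[:nh, :nw] = resized  # remaining pixels stay as padding
    return canvas
```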
As described above, the image processing device 120 of this example has a function of setting a far region including an area in the image area of a camera image (200), captured in advance by the imaging device 110, where the size of the object frames is equal to or smaller than a threshold (Th), based on the positions and sizes of the multiple object frames (221, 222) each surrounding one of the multiple people included in that camera image; and a function of generating, from a camera image captured during system operation, a reduced image obtained by shrinking the camera image and cropped images obtained by cutting out the far-region portions of the camera image, inputting both the reduced image and the cropped images into the low-resolution learning model to perform person detection, and combining and outputting the detection result based on the reduced image and the detection results based on the cropped images. With this configuration, an appropriate far region for achieving high-speed, high-precision object detection can be set easily, regardless of the installation conditions of the imaging device 110.
The above explanation used a person detection system that detects people from camera images as an example, but this technology can be applied to any object detection system that detects various other objects. The above explanation also assumed that the imaging device 110 is installed in a posture with substantially no tilt; however, the installation manner of the imaging device 110 is not limited to this, and the imaging device 110 may, for example, be installed at an angle. In that case, setting at least three object frames for the camera image makes it possible to set the far region appropriately.
In the above description, the object frames are set by the user operating the operation terminal 140, but the object frame setting can also be automated. Specifically, as shown in FIG. 8, multiple provisional far-region frames 240 are set so as to cover the entire camera image captured in advance by the imaging device 110, and the system is given a trial run. In the example of FIG. 8, a total of 35 provisional far-region frames 240 (7 horizontal x 5 vertical) are set. Trial-running the system with these settings increases the processing load on the image processing device 120 and takes some time, but it can accurately detect the people included in the camera image. The multiple object frames surrounding each person can therefore be set automatically, without user operation (see the sketch below). Automating the object frame setting is particularly effective when, for example, the attitude of the imaging device 110 is changed during system operation.
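As an illustration of this trial run, the Python sketch below tiles the whole image with a 7 x 5 grid of provisional frames and keeps every person box the model returns as a candidate object frame. The grid-cell sizing and the `detect` interface are assumptions carried over from the earlier sketches.

```python
import cv2

def auto_object_frames(frame, detect, input_size=(640, 360), cols=7, rows=5):
    """Trial-run sketch for FIG. 8: split the image into cols x rows
    provisional frames, run the detector on each resized cell, and
    collect person boxes in full-image coordinates as object frames."""
    H, W = frame.shape[:2]
    cw, ch = W / cols, H / rows
    boxes = []
    for j in range(rows):
        for i in range(cols):
            x0, y0 = round(i * cw), round(j * ch)
            x1, y1 = round((i + 1) * cw), round((j + 1) * ch)
            crop = cv2.resize(frame[y0:y1, x0:x1], input_size)
            rx, ry = (x1 - x0) / input_size[0], (y1 - y0) / input_size[1]
            boxes += [(x0 + x * rx, y0 + y * ry, w * rx, h * ry, s)
                      for (x, y, w, h, s) in detect(crop)]
    return boxes  # slow (35 crops per image) but needs no user-drawn frames
```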
In the above description, the imaging device 110 and the image processing device 120 are separate devices, but they may be integrated. That is, the imaging device 110 may have not only a function of capturing camera images, but also a function of setting a far region including an area in the image area of a camera image where the size of the object frames is equal to or smaller than a threshold, based on the positions and sizes of multiple object frames each surrounding one of the multiple people included in a camera image captured before system operation; and a function of generating, from a camera image captured during system operation, a reduced image obtained by shrinking the camera image and cropped images obtained by cutting out the far-region portions of the camera image, inputting both the reduced image and the cropped images into the low-resolution learning model to perform person detection, and combining and outputting the detection result based on the reduced image and the detection results based on the cropped images.
The embodiments of the present invention described above are merely illustrative and do not limit the technical scope of the present invention. The present invention can take various other embodiments, and various modifications such as omissions and substitutions can be made without departing from the gist of the present invention. These embodiments and their modifications are included in the scope and gist of the invention described in this specification and in the scope of the invention described in the claims and their equivalents.
The present invention can be provided not only as the devices described above or as systems composed of these devices, but also as methods executed by these devices, programs for implementing the functions of these devices with a processor, and storage media storing such programs in a computer-readable manner.
The present invention relates to an object detection system that detects objects from camera images taken of a surveillance area.
110: imaging device; 120: image processing device; 130: monitor terminal; 140: operation terminal

Claims (5)

1.  In an object detection system that detects objects from camera images taken of a surveillance area,
    before the start of operation, a process is executed to set a distant area including an area in the image area of a camera image, captured in advance, where the size of the object frames is equal to or smaller than a threshold, based on the positions and sizes of a plurality of object frames each surrounding one of a plurality of objects included in that camera image; and
    during operation, a process is executed to generate a first image obtained by reducing a captured camera image and a second image obtained by cutting out a portion of the distant area from the camera image, to input both the first image and the second image into a predetermined learning model to detect objects, and to synthesize and output the detection result based on the first image and the detection result based on the second image.
2.  The object detection system according to claim 1, wherein the plurality of object frames are set by a user operation on the pre-captured camera image.
3.  The object detection system according to claim 1, wherein the plurality of object frames are set based on detection results obtained by inputting the pre-captured camera image into the learning model.
4.  A camera that captures images of a surveillance area to detect objects, the camera having:
    a function of setting a distant area including an area in the image area of a camera image where the size of the object frames is equal to or smaller than a threshold, based on the positions and sizes of a plurality of object frames each surrounding one of a plurality of objects included in a camera image captured before the start of operation; and
    a function of generating, based on a camera image captured during operation, a first image obtained by reducing the camera image and a second image obtained by cutting out a portion of the distant area from the camera image, inputting both the first image and the second image into a predetermined learning model to detect objects, and synthesizing and outputting the detection result based on the first image and the detection result based on the second image.
5.  An object detection method for detecting an object from camera images of a surveillance area, comprising the steps of:
    before the start of operation, setting a distant area including an area in the image area of a camera image where the size of the object frames is equal to or smaller than a threshold, based on the positions and sizes of a plurality of object frames each surrounding one of a plurality of objects included in a camera image captured in advance; and
    during operation, generating a first image obtained by reducing a captured camera image and a second image obtained by cutting out a portion of the distant area from the camera image, inputting both the first image and the second image into a predetermined learning model to detect the object, and synthesizing and outputting the detection result based on the first image and the detection result based on the second image.
PCT/JP2022/036062 2022-09-28 2022-09-28 Object detection system, camera, and object detection method WO2024069778A1 (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
PCT/JP2022/036062 WO2024069778A1 (en) 2022-09-28 2022-09-28 Object detection system, camera, and object detection method

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
PCT/JP2022/036062 WO2024069778A1 (en) 2022-09-28 2022-09-28 Object detection system, camera, and object detection method

Publications (1)

Publication Number Publication Date
WO2024069778A1 (en)

Family

ID=90476784

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/JP2022/036062 WO2024069778A1 (en) 2022-09-28 2022-09-28 Object detection system, camera, and object detection method

Country Status (1)

Country Link
WO (1) WO2024069778A1 (en)

Citations (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2012164804A1 (en) * 2011-06-02 2012-12-06 パナソニック株式会社 Object detection device, object detection method, and object detection program
JP2020004366A (en) * 2018-06-25 2020-01-09 パナソニック インテレクチュアル プロパティ コーポレーション オブ アメリカPanasonic Intellectual Property Corporation of America Information processing device, information processing method, and program

