JP2021077091A

JP2021077091A - Image processing device and image processing method

Info

Publication number: JP2021077091A
Application number: JP2019203261A
Authority: JP
Inventors: 康貴岡田; Yasutaka Okada; 竜介関; Ryusuke Seki
Original assignee: Denso Ten Ltd
Current assignee: Denso Ten Ltd
Priority date: 2019-11-08
Filing date: 2019-11-08
Publication date: 2021-05-20
Anticipated expiration: 2039-11-08
Also published as: JP7360303B2

Abstract

To provide a technique capable of accurately detecting an object even when the object to be detected in an image is small. SOLUTION: An image processing device includes: a setting unit that sets, in an acquired captured image, a plurality of detection target areas in which detection of an object is attempted; a conversion unit that converts at least the images of the plurality of detection target areas in the captured image from color images into grayscale images; and an object detection unit that has a plurality of channels to which the grayscale images of the respective detection target areas are input separately, and that outputs an object detection result based on the object detection processing performed for each channel. SELECTED DRAWING: Figure 1

Description

The present invention relates to an image processing device and an image processing method.

Conventionally, objects such as faces in an image have been detected using a neural network (see, for example, Patent Document 1). In recent years, object detection methods using multilayer neural networks (DNN: Deep Neural Network) have been actively developed.

Patent Document 1: Japanese Unexamined Patent Publication No. 2012-48476

In object detection using a DNN, the image captured by a camera may be reduced in size (that is, its resolution may be lowered) to match the size of the DNN's input layer. For example, when computational resources are limited, as in in-vehicle applications, it is common to input the reduced image to the DNN and perform the object detection process on it.

For example, when the object to be detected is small in the captured image, or appears small because it is far away, its features may be lost by the image size reduction process. Consequently, simply reducing the captured image and then performing DNN-based object detection may lower the detection accuracy.

In view of the above problem, an object of the present invention is to provide a technique capable of accurately detecting an object even when the object to be detected in an image is small.

To achieve the above object, an image processing device of the present invention includes: a setting unit that sets, in an acquired captured image, a plurality of detection target areas in which detection of an object is attempted; a conversion unit that converts at least the images of the plurality of detection target areas in the captured image from color images into grayscale images; and an object detection unit that has a plurality of channels to which the grayscale images of the respective detection target areas are input separately, and that outputs a detection result of the object based on the object detection processing performed for each channel (first configuration).

In the image processing device of the first configuration, the setting unit preferably sets the plurality of detection target areas within a partial range of the captured image (second configuration).

In the image processing device of the first or second configuration, the object detection unit preferably integrates the per-channel processing results, each obtained in the coordinate system of its detection target area, into the coordinate system of the entire captured image and outputs the detection result of the object (third configuration).

In the image processing device of any of the first to third configurations, the resolution of the grayscale image input to each channel may be changed according to the position at which the detection target area is set (fourth configuration).

In the image processing device of any of the first to fourth configurations, preferably the number of detection target areas is three and the number of channels is three (fifth configuration).

The image processing device of any of the first to fifth configurations may further include a tracking unit that sets a search range in the currently acquired captured image based on the object detection result of the object detection unit for a previously acquired captured image, and that tracks the object within the search range (sixth configuration).

In the image processing device of the sixth configuration, the search range may be changed based on trajectory information indicating the past movement of the object (seventh configuration).

In the image processing device of any of the first to seventh configurations, the setting unit may set, as a detection target area, a region in which the amount of change is large when the captured image is compared with a background image prepared in advance (eighth configuration).

To achieve the above object, an image processing method of the present invention includes: a setting step of setting, in an acquired captured image, a plurality of detection target areas in which detection of an object is attempted; a conversion step of converting at least the images of the plurality of detection target areas in the captured image from color images into grayscale images; and an object detection step of inputting the grayscale images of the respective detection target areas to mutually different channels and outputting a detection result of the object based on the object detection processing performed for each channel (ninth configuration).

According to the present invention, an object can be detected accurately even when the object to be detected in an image is small.

FIG. 1 is a diagram showing the configuration of the image processing device according to the first embodiment.
FIG. 2 is a diagram showing an example of how the setting unit sets a plurality of detection target areas.
FIG. 3 is a diagram showing another example of how the setting unit sets a plurality of detection target areas.
FIG. 4 is a schematic diagram for explaining the function of the object detection unit.
FIG. 5 is a flowchart showing an operation example of the image processing device according to the first embodiment.
FIG. 6 is a diagram showing an example of a captured image acquired by the acquisition unit.
FIG. 7 is a diagram showing an example of how the detection target areas are set.
FIG. 8 is a diagram illustrating an object detection result.
FIG. 9 is a diagram showing the configuration of the image processing device according to the second embodiment.
FIG. 10 is a diagram for explaining the function of the tracking unit.
FIG. 11 is a diagram for explaining how the search range is changed based on trajectory information.
FIG. 12 is a diagram for explaining an operation example of the image processing device according to the second embodiment.

Exemplary embodiments of the present invention are described in detail below with reference to the drawings.

<1. First Embodiment>
(1-1. Configuration of image processing device)
FIG. 1 is a diagram showing the configuration of an image processing device 1 according to a first embodiment of the present invention. FIG. 1 shows only the components needed to explain the features of the image processing device 1 of the first embodiment; general components are omitted. To aid understanding, FIG. 1 also shows a camera 2, which is a component separate from the image processing device 1.

The image processing device 1 may be mounted on a moving body such as a vehicle. Vehicles broadly include wheeled vehicles such as automobiles, trains, and automated guided vehicles. The image processing device 1 may be included in an in-vehicle device such as a navigation device or a drive recorder. The image processing device 1 need not be mounted on a moving body; for example, it may be installed in a monitoring facility provided at a commercial facility, a parking lot, or the like, or in a building such as an expressway toll booth. The image processing device 1 may also be included in a server device, such as a cloud server, that can communicate with a terminal device such as an in-vehicle device via a network or the like. The image processing device 1 may also be included in a mobile terminal such as a smartphone or a tablet.

The camera 2 may be mounted on a moving body such as a vehicle, or may be fixedly installed inside a building such as a commercial facility or outdoors, for example in a parking lot. The camera 2 outputs the images it captures (captured images) to the image processing device 1, for example by wire, wirelessly, or over a network.

As shown in FIG. 1, the image processing device 1 includes an acquisition unit 11, a control unit 12, and a storage unit 13.

The acquisition unit 11 acquires captured images. For example, the acquisition unit 11 acquires analog or digital captured images from the camera 2 mounted on a vehicle continuously in time at a predetermined cycle (for example, every 1/30 second). The set of captured images (each being one frame) acquired by the acquisition unit 11 constitutes the moving image captured by the camera 2. In the present embodiment, the captured images acquired by the acquisition unit 11 are color images. When an acquired captured image is analog, the acquisition unit 11 converts it into a digital captured image (A/D conversion). The acquisition unit 11 outputs the acquired captured image (the converted image if A/D conversion was performed) to the control unit 12.

The control unit 12 is a controller that performs overall control of the image processing device 1. The control unit 12 is configured, for example, as a computer including a CPU (Central Processing Unit), which is a hardware processor, a RAM (Random Access Memory), a ROM (Read Only Memory), and the like.

The storage unit 13 is composed of, for example, a semiconductor memory element such as a RAM or a flash memory, a hard disk, or a storage device that uses a portable recording medium such as an optical disc. The storage unit 13 stores programs as firmware and various data. In the present embodiment, the storage unit 13 stores the trained model used by the object detection unit 123 described later. The trained model can be obtained by known deep learning using, for example, a CNN (Convolutional Neural Network).

The setting unit 121, the conversion unit 122, and the object detection unit 123 shown in FIG. 1 are functions of the control unit 12 that are realized when the CPU of the control unit 12 executes arithmetic processing according to a program stored in the storage unit 13. In other words, the image processing device 1 includes the setting unit 121, the conversion unit 122, and the object detection unit 123.

At least one of the setting unit 121, the conversion unit 122, and the object detection unit 123 in the control unit 12 may be implemented in hardware such as an ASIC (Application Specific Integrated Circuit), an FPGA (Field Programmable Gate Array), or a GPU (Graphics Processing Unit).

The setting unit 121, the conversion unit 122, and the object detection unit 123 are conceptual components. A function performed by one component may be distributed among a plurality of components, and functions of a plurality of components may be integrated into one component. The acquisition unit 11 may also be realized by the CPU of the control unit 12 performing arithmetic processing according to a program. Regarding the specific hardware configuration of the image processing device 1, components may be omitted, replaced, or added as appropriate depending on the embodiment. For example, the control unit 12 may include a plurality of hardware processors.

The setting unit 121 sets, in the acquired captured image, a plurality of detection target areas in which detection of an object is attempted. The captured image in which the detection target areas are set may be the image obtained from the acquisition unit 11 as-is, or an image that has been processed after being obtained from the acquisition unit 11. An example of such processing is grayscale conversion. The object may be a moving object or a stationary object. Specifically, the object is, for example, a person or a vehicle. The object may also be a portion understood to be part of something, such as a person's face or a vehicle's license plate.

FIG. 2 is a diagram showing an example of how the setting unit 121 sets a plurality of detection target areas 31. In the example shown in FIG. 2, the setting unit 121 sets the detection target areas 31 within a partial range 30 of the captured image 3. With this configuration, object detection is performed on only part of the captured image 3, which reduces the load of the object detection process compared with performing object detection over the entire range of the captured image 3.

The partial range 30 is determined in consideration of, for example, the purpose of the object detection and the type of object to be detected. For example, when it has been decided how many meters from the own vehicle objects are to be detected, or when it is known that objects appear only in a limited range of the captured image, the partial range 30 is set to a specific range of distances from the camera 2. In such cases, the partial range 30 can be obtained using known parameters of the installation position of the camera 2 and triangulation.
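As a rough illustration of deriving an image band from a distance range, the sketch below uses similar triangles under assumptions that are not stated in the source: a pinhole camera with a horizontal optical axis, flat ground, and a known horizon row. The function name and all parameter values are hypothetical.

```python
from typing import Tuple


def rows_for_distance_range(
    near_m: float,
    far_m: float,
    camera_height_m: float,
    focal_length_px: float,
    horizon_row: float,
) -> Tuple[int, int]:
    """Return (top_row, bottom_row) of the image band covering near_m..far_m.

    Assumption (not from the source): pinhole camera, horizontal optical
    axis, flat ground. A ground point at distance d then projects to row
    horizon_row + focal_length_px * camera_height_m / d.
    """
    if not 0 < near_m < far_m:
        raise ValueError("expected 0 < near_m < far_m")
    top = horizon_row + focal_length_px * camera_height_m / far_m
    bottom = horizon_row + focal_length_px * camera_height_m / near_m
    return round(top), round(bottom)


# Hypothetical values: camera 1.5 m above the road, focal length 1000 px,
# horizon at row 360, objects of interest between 10 m and 50 m.
band = rows_for_distance_range(10.0, 50.0, 1.5, 1000.0, 360.0)
print(band)  # (390, 510)
```

A real installation would also need to account for camera pitch and lens distortion, which this sketch ignores.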

In the example shown in FIG. 2, the detection target areas 31 are obtained by dividing the partial range 30. Specifically, they are obtained by dividing the rectangular partial range 30 into three equal parts. That is, in the example shown in FIG. 2, there are three detection target areas 31. Each of the three detection target areas 31 is rectangular, and they are arranged side by side in the left-right direction of the captured image 3. The three detection target areas 31 consist of a first detection target area 31a located at the left end, a second detection target area 31b located in the middle, and a third detection target area 31c. The three detection target areas 31a, 31b, and 31c are preferably the same size, but may differ in size in some cases.
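The equal three-way division described for FIG. 2 can be sketched as follows. The rectangle representation, the function name, and the pixel values are assumptions made for illustration only.

```python
from typing import List, Tuple

Rect = Tuple[int, int, int, int]  # (x, y, width, height) in captured-image pixels


def split_range(range_rect: Rect, count: int = 3) -> List[Rect]:
    """Split a rectangular range into `count` side-by-side areas of equal width."""
    x, y, width, height = range_rect
    if count <= 0 or width % count != 0:
        raise ValueError("width must be divisible by count for an even split")
    step = width // count
    return [(x + i * step, y, step, height) for i in range(count)]


# Hypothetical partial range: 960 x 240 pixels with its top-left corner at (120, 200).
areas = split_range((120, 200, 960, 240))
print(areas)  # [(120, 200, 320, 240), (440, 200, 320, 240), (760, 200, 320, 240)]
```

Keeping each area's origin is what later allows detections to be mapped back to the coordinates of the whole captured image.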

The shape of the detection target areas 31 set by the setting unit 121 is not limited to a rectangle and may be changed as appropriate. The setting unit 121 may set the detection target areas 31 over the entire range of the captured image 3 rather than the partial range 30. That is, the entire range of the captured image 3 may be divided into a plurality of areas, each of which serves as a detection target area 31. The detection target areas 31 set by the setting unit 121 may be fixed, or, for example when the camera 2 is an in-vehicle camera, may be varied according to the steering angle of the vehicle or the like.

FIG. 3 is a diagram showing another example of how the setting unit 121 sets the detection target areas 31. As shown in FIG. 3, the detection target areas 31 (specifically, three detection target areas 31) may occupy only a portion of the partial range 30 of the captured image 3. The detection target areas 31 need not be adjacent to one another.

The detection target areas 31 may also be obtained without setting the partial range 30. For example, the setting unit 121 may set, as a detection target area 31, a region in which the amount of change is large when the captured image 3 is compared with a background image prepared in advance. This makes it possible to perform the object detection process efficiently by narrowing it down to regions where an object is likely to be present.
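One simple way to find a region with a large amount of change is a per-pixel absolute difference against the background, followed by a bounding box around the changed pixels. The source does not specify the comparison, so the threshold, the grayscale input, and the function name below are assumptions.

```python
from typing import List, Optional, Tuple

Gray = List[List[int]]  # rows of 8-bit grayscale values


def changed_region(
    frame: Gray, background: Gray, threshold: int = 30
) -> Optional[Tuple[int, int, int, int]]:
    """Return (x, y, width, height) enclosing pixels that differ from the
    background by more than `threshold`, or None if nothing changed."""
    xs, ys = [], []
    for y, (frame_row, bg_row) in enumerate(zip(frame, background)):
        for x, (f, b) in enumerate(zip(frame_row, bg_row)):
            if abs(f - b) > threshold:
                xs.append(x)
                ys.append(y)
    if not xs:
        return None
    return (min(xs), min(ys), max(xs) - min(xs) + 1, max(ys) - min(ys) + 1)


# Toy 6 x 4 scene: an object brightens a 2 x 2 patch of an all-black background.
background = [[0] * 6 for _ in range(4)]
frame = [row[:] for row in background]
for y in (1, 2):
    for x in (2, 3):
        frame[y][x] = 200

region = changed_region(frame, background)
print(region)  # (2, 1, 2, 2)
```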

In this configuration, object detection (for example, classification) by a machine-learned trained model (for example, a CNN) is performed on the region with a large amount of change. The background image is stored in the storage unit 13. When the purpose is to obtain the type of the object appropriately by using the comparison with the background image, the object detection process may be the same as the process of the object detection unit 123 described below, or may be a different process.

The background image may be an image obtained by a known method (for example, a difference method). However, the background image may instead be obtained by a method other than the known methods, such as the following.

A background image generation device (not shown) that generates the background image performs object detection using a trained model on each of a plurality of images obtained at different times. The images obtained at different times may come from a single camera, or may be, for example, images obtained from a plurality of connected cars equipped with cameras. The images are captured images showing the same place.

The trained model used by the background image generation device may be a neural network capable of detecting objects, such as a CNN. The trained model may be configured to perform image segmentation, which labels each pixel with its meaning, or may be configured to classify the type of an object.

For each of the images obtained at different times, the background image generation device generates a subtracted image from which the regions where objects were detected have been removed. The background image generation device completes a background image containing no objects by combining the resulting subtracted images. The background image generation device preferably generates a background image of the same place at regular time intervals.
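The combination step can be sketched as follows, assuming that each image comes with a boolean mask marking the pixels where an object was detected. Averaging the unmasked samples is an assumption chosen for illustration; the source only says that the subtracted images are combined.

```python
from typing import List, Optional

Gray = List[List[int]]         # rows of 8-bit grayscale values
Mask = List[List[bool]]        # True where an object was detected (pixel removed)


def compose_background(frames: List[Gray], masks: List[Mask]) -> List[List[Optional[int]]]:
    """Average, per pixel, the samples from frames in which that pixel was not
    covered by a detected object. Pixels never seen uncovered stay None."""
    height, width = len(frames[0]), len(frames[0][0])
    result: List[List[Optional[int]]] = [[None] * width for _ in range(height)]
    for y in range(height):
        for x in range(width):
            samples = [f[y][x] for f, m in zip(frames, masks) if not m[y][x]]
            if samples:
                result[y][x] = sum(samples) // len(samples)
    return result


# Two 1 x 3 frames of the same place; an object covers a different pixel in each.
frames = [[[10, 99, 30]], [[12, 20, 88]]]
masks = [[[False, True, False]], [[False, False, True]]]

background_image = compose_background(frames, masks)
print(background_image)  # [[11, 20, 30]]
```

Pixels that remain None would need more images before the background is complete.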

Because such a background image generation device generates the background image using a machine-learned trained model, it can generate an accurate background image. Using a background image generated by the background image generation device makes it easier, for example, to distinguish an object from a shadow whose shape changes over time, so that the object can be detected appropriately.

The conversion unit 122 converts at least the images of the plurality of detection target areas 31 in the captured image 3 obtained from the acquisition unit 11 from color images into grayscale images. The conversion unit 122 may convert the entire captured image into a grayscale image, or may convert only the detection target areas 31. In the example shown in FIG. 2, for instance, the images of the three detection target areas 31a to 31c are converted into grayscale images. The grayscale conversion method may be, for example, the NTSC weighted average method or a method that extracts the value of one of the R, G, and B components and uses it as the grayscale value. The grayscale conversion by the conversion unit 122 yields a grayscale image with 256 gradations (8 bits) for each detection target area 31.
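Both conversion methods mentioned above can be sketched in a few lines. The NTSC weighted average is commonly written as Y = 0.299 R + 0.587 G + 0.114 B; the image representation and function names here are assumptions for illustration.

```python
from typing import List, Tuple

Pixel = Tuple[int, int, int]  # (R, G, B), each 0..255


def to_gray_ntsc(image: List[List[Pixel]]) -> List[List[int]]:
    """NTSC weighted average 0.299 R + 0.587 G + 0.114 B, rounded to the
    nearest integer using integer arithmetic to avoid float rounding issues."""
    return [
        [(299 * r + 587 * g + 114 * b + 500) // 1000 for r, g, b in row]
        for row in image
    ]


def to_gray_single(image: List[List[Pixel]], component: int = 1) -> List[List[int]]:
    """Use one component (0 = R, 1 = G, 2 = B) directly as the grayscale value."""
    return [[pixel[component] for pixel in row] for row in image]


color = [[(255, 0, 0), (0, 255, 0)], [(0, 0, 255), (255, 255, 255)]]
gray = to_gray_ntsc(color)
print(gray)                   # [[76, 150], [29, 255]]
print(to_gray_single(color))  # [[0, 255], [0, 255]]
```

Either result is an 8-bit image, matching the 256 gradations described above.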

The object detection unit 123 performs the object detection process using the machine-learned trained model (a CNN, as one example) stored in the storage unit 13. The object detection process has a part that extracts features from the image by convolution and pooling, and a part that performs classification based on the extracted features by repeating fully connected layers.

In a preferred form, the trained model used by the object detection unit 123 is one that was trained using grayscale images as training data. This allows objects to be detected accurately from grayscale images.

FIG. 4 is a schematic diagram for explaining the function of the object detection unit 123. The object detection unit 123 has a plurality of channels to which the grayscale images of the respective detection target areas 31 are input separately. A channel may also be referred to as an input layer. In the present embodiment, the grayscale image 31aG of the first detection target area 31a is input to 1ch, the grayscale image 31bG of the second detection target area 31b is input to 2ch, and the grayscale image 31cG of the third detection target area 31c is input to 3ch. That is, in the present embodiment, there are three channels.

In the present embodiment, the upper limit of the image size (resolution) that can be input is the same fixed value for every channel (1ch, 2ch, 3ch). The grayscale images 31aG, 31bG, and 31cG of the detection target areas 31a, 31b, and 31c are input to their channels either at their original resolution or at a reduced resolution, depending on the allowable input size of each channel. In the present embodiment, the images input to the channels are all the same size.
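A minimal sketch of preparing the three channel inputs is shown below: each grayscale crop is passed through unchanged if it already matches the channel input size, and is reduced otherwise. Nearest-neighbor sampling, the fixed target size, and the function names are assumptions; the source does not name a reduction method, and this sketch ignores aspect ratio.

```python
from typing import List

Gray = List[List[int]]  # rows of 8-bit grayscale values


def downscale_nearest(image: Gray, out_w: int, out_h: int) -> Gray:
    """Reduce an image to out_w x out_h by nearest-neighbor sampling."""
    in_h, in_w = len(image), len(image[0])
    return [
        [image[y * in_h // out_h][x * in_w // out_w] for x in range(out_w)]
        for y in range(out_h)
    ]


def build_channels(crops: List[Gray], in_w: int, in_h: int) -> List[Gray]:
    """Return one in_w x in_h image per channel (1ch, 2ch, 3ch order)."""
    channels = []
    for crop in crops:
        h, w = len(crop), len(crop[0])
        if (w, h) == (in_w, in_h):
            channels.append(crop)  # already fits: keep the original resolution
        elif w >= in_w and h >= in_h:
            channels.append(downscale_nearest(crop, in_w, in_h))
        else:
            raise ValueError("crop is smaller than the channel input size")
    return channels


crop_a = [[y * 4 + x for x in range(4)] for y in range(4)]  # 4 x 4
crop_b = [[1, 2], [3, 4]]                                   # 2 x 2, fits as-is
crop_c = [[10, 11, 12, 13], [20, 21, 22, 23]]               # 4 x 2

channels = build_channels([crop_a, crop_b, crop_c], in_w=2, in_h=2)
print(channels)  # [[[0, 2], [8, 10]], [[1, 2], [3, 4]], [[10, 12], [20, 22]]]
```

The resulting list has the channel-first layout that a three-channel model input would typically expect.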

The resolution of the grayscale images 31aG, 31bG, and 31cG input to the channels (1ch, 2ch, 3ch) is preferably changed according to the position at which the detection target areas 31 are set. When a detection target area 31 is set at a position far from the camera 2, it is preferable not to lower the resolution, or, if the resolution must be lowered, to lower it as little as possible. When a detection target area 31 is set on the side near the camera 2, it is preferable to lower the resolution as much as possible within a range that does not degrade the object detection accuracy. This configuration suppresses degradation of detection accuracy when distant objects need to be detected, and reduces the processing load of the object detection process when nearby objects need to be detected.

The object detection unit 123 outputs an object detection result based on the results of the object detection process performed for each channel. As shown in FIG. 4, the object detection process using the trained model (DNN) is performed on each of the grayscale images 31aG, 31bG, and 31cG input to the channels (1ch, 2ch, 3ch). The object detection processes for the channels proceed in parallel. The object detection unit 123 first obtains an object detection result for each of the detection target areas 31a, 31b, and 31c.

As shown in FIG. 4, the object detection unit 123 performs an integration process that integrates the results of the per-channel object detection processes. Specifically, the object detection unit 123 integrates the per-channel processing results, obtained in the coordinate systems of the detection target areas 31a, 31b, and 31c, into the coordinate system of the entire captured image 3 and outputs the object detection result. Each of the detection target areas 31a, 31b, and 31c is an image cut out of the captured image 3, and the coordinate region it occupies in the captured image 3 is known. The detection region of an object obtained in each channel can therefore be converted into coordinates of the entire captured image 3. When an object lies across a plurality of detection target areas 31, it may be detected in each of those areas, and the regions detected for such a duplicately detected object need to be combined.
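The integration can be sketched in two steps: shift each box by the origin of its detection target area, then combine boxes that belong to the same object. The source does not say how duplicates are identified, so merging boxes that overlap or touch is an assumption, as are the box format and the function names.

```python
from typing import List, Tuple

Box = Tuple[int, int, int, int]  # (x1, y1, x2, y2), corners inclusive


def to_image_coords(box: Box, area_origin: Tuple[int, int]) -> Box:
    """Shift a box from area coordinates to captured-image coordinates."""
    ox, oy = area_origin
    x1, y1, x2, y2 = box
    return (x1 + ox, y1 + oy, x2 + ox, y2 + oy)


def touches(a: Box, b: Box) -> bool:
    """True if the two boxes overlap or share an edge."""
    return a[0] <= b[2] and b[0] <= a[2] and a[1] <= b[3] and b[1] <= a[3]


def merge_boxes(boxes: List[Box]) -> List[Box]:
    """Repeatedly fuse touching boxes into their enclosing box."""
    merged: List[Box] = []
    for box in boxes:
        absorbed = True
        while absorbed:
            absorbed = False
            for other in merged:
                if touches(box, other):
                    merged.remove(other)
                    box = (min(box[0], other[0]), min(box[1], other[1]),
                           max(box[2], other[2]), max(box[3], other[3]))
                    absorbed = True
                    break
        merged.append(box)
    return merged


# Hypothetical per-channel results: (area origin in the captured image, local box).
per_channel = [
    ((120, 200), (300, 50, 320, 110)),  # right edge of area 31a
    ((440, 200), (0, 52, 25, 112)),     # left edge of area 31b, same object
    ((760, 200), (100, 20, 140, 80)),   # separate object in area 31c
]
global_boxes = [to_image_coords(box, origin) for origin, box in per_channel]
result = merge_boxes(global_boxes)
print(result)  # [(420, 250, 465, 312), (860, 220, 900, 280)]
```

A production system would likely also compare class labels before merging, so that two different objects that happen to touch are not fused.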

The object detection result obtained by integrating the results of the channels is, for example, an image in which a bounding box surrounding the region of each detected object is added to the captured image (color image). For example, this image is output to a display device (not shown), and the captured image 3 with bounding boxes indicating detected objects is displayed on the screen.

Depending on the case, the object detection unit 123 may output the per-channel detection results separately without integrating them. However, integrating the per-channel results as in the present embodiment makes the object detection result for the entire captured image easier to recognize.

In a conventional configuration in which a color image is input to a trained model (CNN) that performs object detection, the color captured image is, for example, decomposed into its R, G, and B components, and the three components are input to separate luminance channels (Rch, Gch, Bch) for object detection processing. In the present embodiment, by contrast, the image input to the trained model is a grayscale image. A grayscale image needs only one channel, so if a trained model with the same configuration as the conventional one is assumed, two channels are left over.

Therefore, in the present embodiment, the grayscale images 31aG, 31bG, and 31cG of the three detection target areas 31a, 31b, and 31c obtained from the captured image 3 are input to separate channels, which makes effective use of the three input channels. In other words, the present invention can be realized by adapting a conventional trained model. Described from this viewpoint, for example, the grayscale image 31aG of the first detection target area 31a is input to the R (Red) channel, which is one of the luminance channels, the grayscale image 31bG of the second detection target area 31b is input to the G (Green) channel, and the grayscale image 31cG of the third detection target area 31c is input to the B (Blue) channel. Put differently, the plurality of channels in the present embodiment are the three RGB channels.
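
As a rough illustration of this channel assignment, the sketch below packs three equally sized grayscale images into one three-channel image laid out like an RGB image. The function name `pack_as_rgb` and the nested-list image representation are assumptions made for the example; the specification does not prescribe a data layout.

```python
def pack_as_rgb(gray_a, gray_b, gray_c):
    """Pack three grayscale images (lists of rows) into one H x W x 3 image.

    gray_a goes to the R position, gray_b to G, and gray_c to B, so a model
    with a conventional three-channel input can receive all three areas at once.
    """
    sizes = {(len(g), len(g[0])) for g in (gray_a, gray_b, gray_c)}
    if len(sizes) != 1:
        raise ValueError("all three grayscale images must have the same size")
    return [
        [(a, b, c) for a, b, c in zip(row_a, row_b, row_c)]
        for row_a, row_b, row_c in zip(gray_a, gray_b, gray_c)
    ]


# Tiny 2 x 2 stand-ins for the grayscale images 31aG, 31bG, and 31cG.
packed = pack_as_rgb([[1, 2], [3, 4]], [[5, 6], [7, 8]], [[9, 10], [11, 12]])
```

Each pixel of `packed` then carries one value from each detection target area, in the same way that a color pixel carries its R, G, and B values.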

According to the present embodiment, when an object is detected from one captured image 3, converting the image to grayscale allows it to be divided and input to a plurality of channels. The image input to each channel is a divided part of the captured image, so the input size (number of pixels) of the image for each channel can be made smaller. As a result, the degree to which the resolution of the image input to each channel (whose input size has a fixed upper limit) must be lowered can be reduced, which suppresses the loss of features of small objects in the image. That is, according to the present embodiment, an object can be detected accurately even when the object to be detected in the captured image 3 is small, such as a face or a license plate.

Further, because the trained model of the present embodiment can have the same configuration as a conventional trained model that takes a color image as input for object detection, the processing load does not become extremely large compared with the conventional configuration, and the need for a high-performance processing device can be avoided.

(1-2. Operation example of image processing device)
FIG. 5 is a flowchart showing an operation example of the image processing device 1 according to the first embodiment of the present invention. The image processing device 1 performs the operation of the flowchart shown in FIG. 5, for example, each time the acquisition unit 11 acquires a captured image.

In step S1, the acquisition unit 11 acquires the captured image 3 from the camera 2. The acquisition unit 11 acquires, for example, the captured image 3 shown in FIG. 6, which shows two persons H walking along a wall W located beside a road R. The two persons H are an adult man and a girl; hereinafter, the adult man may be referred to as person H1 and the girl as person H2. When the acquisition unit 11 has acquired the captured image, the process proceeds to step S2.

In step S2, the setting unit 121 sets three detection target areas 31a, 31b, and 31c in the captured image 3. For example, when the object to be detected is a face and the captured image 3 shown in FIG. 6 has been acquired, the three detection target areas 31a, 31b, and 31c indicated by broken lines in FIG. 7 are set in the captured image 3. The detection target areas 31a, 31b, and 31c are obtained, for example, by dividing a range 30 of the captured image 3, in which the face F of a person H is to be detected, into three equal parts in the left-right direction. When the setting unit 121 has finished setting the detection target areas 31a, 31b, and 31c, the process proceeds to step S3.

In step S3, the conversion unit 122 converts each of the images (color images) of the three detection target areas 31a, 31b, and 31c into a grayscale image. The processing of step S3 may be performed before the processing of step S2. In that case, the entire captured image 3 is converted into a grayscale image, and the three detection target areas 31a, 31b, and 31c are then set in the grayscale captured image. When the conversion unit 122 has finished the grayscale conversion, the process proceeds to step S4.
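
Steps S2 and S3 can be sketched as below. This is only an illustration under stated assumptions: the function names `split_range` and `crop_gray` are hypothetical, images are represented as nested lists of (R, G, B) tuples, and the ITU-R BT.601 luma weights are one common choice for grayscale conversion that the specification itself does not specify.

```python
def split_range(x0, y0, width, height, parts=3):
    """Divide the range (x0, y0, width, height) evenly from left to right.

    Returns a list of (x, y, w, h) tuples, one per detection target area.
    """
    edges = [x0 + (width * i) // parts for i in range(parts + 1)]
    return [(edges[i], y0, edges[i + 1] - edges[i], height) for i in range(parts)]


def crop_gray(image, region):
    """Cut one area out of a color image and convert it to grayscale.

    image is a list of rows of (R, G, B) tuples. The BT.601 weights below are
    an assumption of this sketch, not something stated in the specification.
    """
    x, y, w, h = region
    return [
        [round(0.299 * r + 0.587 * g + 0.114 * b) for r, g, b in row[x:x + w]]
        for row in image[y:y + h]
    ]


# A hypothetical range 30 that is 600 pixels wide and 200 pixels high.
areas = split_range(0, 100, 600, 200)

# A tiny 2 x 6 color image used only to exercise crop_gray.
tiny = [[(10, 20, 30)] * 6 for _ in range(2)]
gray = crop_gray(tiny, (2, 0, 2, 2))
```

Converting the whole image first and cropping afterwards, as the alternative order of steps permits, would give the same pixel values for each area.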

In step S4, the grayscale images of the detection target areas 31a, 31b, and 31c are input to separate channels (1ch, 2ch, 3ch) of the trained model (CNN). The grayscale image input to each channel is reduced in resolution as needed. In the example shown in FIG. 7, a grayscale image showing only the wall W is input to 1ch, a grayscale image showing the face F of person H1 and part of the face F of person H2 is input to 2ch, and a grayscale image showing part of the face F of person H2 is input to 3ch. When the grayscale images have been input to all channels, the process proceeds to step S5.

In step S5, object detection processing using the trained model (CNN) is performed for each channel (1ch, 2ch, 3ch). In the example shown in FIG. 7, no object (here, a face F) is detected in 1ch, the faces F of person H1 and person H2 are detected in 2ch, and the face F of person H2 is detected in 3ch. When the object detection processing has been completed for all channels, the process proceeds to step S6.

In step S6, the detection results of the channels (1ch, 2ch, 3ch) are integrated, and the object detection result for the captured image 3 is output. The per-channel detection results, obtained in the coordinate systems of the detection target areas 31a, 31b, and 31c, are each converted into the coordinate system of the entire captured image 3 and then integrated.

In the example shown in FIG. 7, the face F of person H2 is detected in both 2ch and 3ch. When the coordinate systems are converted and the results of the channels are integrated, objects judged to have been detected in duplicate are combined. Whether objects are duplicates can be judged, for example, from the distance between the detected objects after conversion into the coordinate system of the entire captured image 3. For example, the detected objects are judged to be duplicates when they are judged to overlap or when the distance between them is judged to be extremely small. Additional information obtainable at the time of object detection (for example, age or gender) may also be referred to in this duplicate judgment.
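
One possible form of this duplicate judgment is sketched below. It is an assumption-laden example rather than the method of the embodiment: boxes are (x, y, w, h) tuples already converted to whole-image coordinates, two boxes count as duplicates when they overlap or their centers are within a hypothetical `max_distance`, and duplicates are combined by taking the enclosing rectangle in a single greedy pass.

```python
import math


def overlaps(a, b):
    """True when two (x, y, w, h) boxes share a region of non-zero area."""
    return (a[0] < b[0] + b[2] and b[0] < a[0] + a[2]
            and a[1] < b[1] + b[3] and b[1] < a[1] + a[3])


def center_distance(a, b):
    """Distance between the centers of two (x, y, w, h) boxes."""
    return math.hypot((a[0] + a[2] / 2) - (b[0] + b[2] / 2),
                      (a[1] + a[3] / 2) - (b[1] + b[3] / 2))


def union(a, b):
    """Smallest box that encloses both input boxes."""
    x0, y0 = min(a[0], b[0]), min(a[1], b[1])
    x1 = max(a[0] + a[2], b[0] + b[2])
    y1 = max(a[1] + a[3], b[1] + b[3])
    return (x0, y0, x1 - x0, y1 - y0)


def merge_duplicates(boxes, max_distance=10.0):
    """Greedily combine boxes judged to be the same object (single pass)."""
    merged = []
    for box in boxes:
        for i, kept in enumerate(merged):
            if overlaps(box, kept) or center_distance(box, kept) <= max_distance:
                merged[i] = union(box, kept)
                break
        else:
            merged.append(box)
    return merged


# Face of H1 (2ch only) and two partial detections of H2 (2ch and 3ch).
detections = [(230, 120, 40, 40), (380, 140, 20, 30), (400, 140, 15, 30)]
result = merge_duplicates(detections, max_distance=20.0)
```

In this example the two partial detections touch at the area boundary without overlapping, so it is the center-distance test that combines them into one box.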

FIG. 8 is a diagram illustrating an object detection result, namely the result of carrying out the processing of the example shown in FIG. 7. A bounding box B indicating object detection is attached to the face F of person H1, which was detected only in 2ch. For the face F of person H2, which was detected in both 2ch and 3ch, the results of the two channels are combined and a single bounding box B is attached to the face F of person H2. When the object detection result has been output, the processing of the flowchart shown in FIG. 5 ends for the time being. The processing of the flowchart shown in FIG. 5 starts again when the next frame image is acquired.

<2. Second Embodiment>
Next, an image processing device according to a second embodiment will be described. In describing the image processing device of the second embodiment, description of portions that overlap with the first embodiment is omitted unless it is particularly needed.

(2-1. Configuration of image processing device)
FIG. 9 is a diagram showing the configuration of an image processing device 1A according to the second embodiment of the present invention. FIG. 9 shows only the components needed to explain the features of the image processing device 1A of the second embodiment, and general components are omitted. To aid understanding, FIG. 9 also shows the camera 2, which is a component separate from the image processing device 1A.

As shown in FIG. 9, the image processing device 1A includes an acquisition unit 11, a control unit 12A, and a storage unit 13. The acquisition unit 11 and the storage unit 13 are the same as in the first embodiment, so their description is omitted.

As in the first embodiment, the control unit 12A is a controller that comprehensively controls the entire image processing device 1A. The control unit 12A is configured, for example, as a computer including a CPU, a RAM, a ROM, and the like. However, the control unit 12A has functions that differ from those of the first embodiment. The setting unit 121, the conversion unit 122, the object detection unit 123, and the tracking unit 124 shown in FIG. 9 are functions of the control unit 12A that are realized when the CPU of the control unit 12A executes arithmetic processing according to a program stored in the storage unit 13. In other words, the image processing device 1A includes the setting unit 121, the conversion unit 122, the object detection unit 123, and the tracking unit 124.

At least one of the units 121 to 124 of the control unit 12A may be implemented in hardware such as an ASIC, an FPGA, or a GPU. The units 121 to 124 are conceptual components. The functions performed by one component may be distributed over a plurality of components, and the functions of a plurality of components may be integrated into one component.

The configurations of the setting unit 121, the conversion unit 122, and the object detection unit 123 are the same as in the first embodiment, so their description is omitted.

For the purpose of improving the real-time performance of object detection, the object detection unit 123 of the second embodiment does not necessarily have to perform the object detection processing of the first embodiment. It is sufficient for the object detection unit 123 to have an algorithm that can detect objects globally from the captured image 3. The object detection unit 123 may have a known configuration that detects objects using a trained model (a CNN or the like) obtained by deep learning. In that case, the setting unit 121 and the conversion unit 122 need not be provided.

The tracking unit 124 has an algorithm that detects objects locally and can detect them faster than the object detection unit 123. The tracking unit 124 performs object detection in alternation with the object detection unit 123. Specifically, the tracking unit 124 sets a search range in the currently acquired captured image 3 based on the result of object detection by the object detection unit 123 for a previously acquired captured image 3, and then tracks the object within the search range. Because the tracking unit 124 limits tracking to the part of the captured image 3 where the object is likely to be found, it can track the object with a small processing load.

FIG. 10 is a diagram for explaining the function of the tracking unit 124. In FIG. 10, the frame B drawn with a thick broken line is the bounding box indicating the position of an object detected by the object detection unit 123 in the frame image one frame before the current one, superimposed on the current frame image 3 for convenience.

The tracking unit 124 sets a search range 40 in the current frame image 3 based on the position of the bounding box B obtained in the frame image one frame earlier. Taking into account that the object may move, the search range 40 is set, for example, so as to surround the bounding box B of the earlier frame image. That is, the search range 40 is set larger than the bounding box B of the earlier frame image. The range over which an object can move, for example between the capture timing of the previous frame and that of the current frame, differs depending on the type of object to be detected. It is therefore preferable to change the search range 40 according to the type of object to be detected. For example, the search range may be set wider when the detection target is a license plate than when it is a human face.

When a plurality of objects have been detected in the frame image one frame before the current one, the tracking unit 124 sets a search range 40 for each of them. In the example shown in FIG. 10, the object detection unit 123 has detected two objects, the face F of person H1 and the face F of person H2, so a search range 40 is set for each of the two faces F.

The search range 40 may also be changed based on trajectory information indicating the past movement of the object. The trajectory information can be obtained, for example, by connecting the center positions of the bounding boxes B of the same object over a plurality of past frames. FIG. 11 is a diagram for explaining how the search range 40 is changed based on the trajectory information. In the example shown in FIG. 11, the trajectory information predicts that the object (face) is moving in the direction of the thick arrow X. Taking this prediction into account, the search range 40 is set wider on the side of the bounding box B that lies in the direction of the thick arrow X, because the object is considered unlikely to move in the opposite direction. When the direction of the arrow X changes, the search range 40 is changed accordingly.
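
A search range of this kind might be computed as in the following sketch. The function names, the `margin` and `bias` parameters, and the use of the average per-frame displacement of the box center as the trajectory estimate are all assumptions of this example; clipping to the image bounds is omitted for brevity.

```python
def estimate_velocity(centers):
    """Average per-frame displacement of the bounding-box center.

    centers is a list of (x, y) positions from past frames, oldest first.
    """
    if len(centers) < 2:
        return (0.0, 0.0)
    steps = len(centers) - 1
    return ((centers[-1][0] - centers[0][0]) / steps,
            (centers[-1][1] - centers[0][1]) / steps)


def search_range(box, margin, velocity=(0.0, 0.0), bias=1.0):
    """Expand an (x, y, w, h) box into a search range.

    Every side grows by margin. The side facing the predicted direction of
    motion grows by an extra bias * speed along that axis.
    """
    x, y, w, h = box
    vx, vy = velocity
    left = margin + bias * max(-vx, 0.0)
    right = margin + bias * max(vx, 0.0)
    top = margin + bias * max(-vy, 0.0)
    bottom = margin + bias * max(vy, 0.0)
    return (x - left, y - top, w + left + right, h + top + bottom)


# A face moving to the right by 4 pixels per frame.
velocity = estimate_velocity([(100, 50), (104, 50), (108, 50)])
biased = search_range((100, 40, 20, 20), margin=5, velocity=velocity, bias=2.0)
uniform = search_range((100, 40, 20, 20), margin=5)
```

A larger `margin` could be chosen for object types that move farther between frames, such as license plates compared with faces.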

When the search range 40 is changed according to the trajectory information in this way, rather than being set uniformly, the range searched in order to track the object can be narrowed to where the object is likely to be, and the tracking processing can be made faster. In the example shown in FIG. 11, only the tendency of the moving direction is extracted from the trajectory information to change the search range 40, but this is merely an example. For example, the tendency of the moving speed may be extracted from the trajectory information in addition to the moving direction, and the search range 40 may be set taking both into account.

The tracking unit 124 tracks the object by, for example, template matching. The tracking unit 124 obtains a template image of the object from, for example, the result of object detection by the object detection unit 123 one frame earlier. The tracking unit 124 then searches the search range 40 for an image with the same pattern as the template image. When the tracking unit 124 finds a pattern whose similarity is equal to or greater than a threshold, it detects that region as the object being tracked. For example, when the tracking unit 124 succeeds in tracking the object, it attaches a bounding box B at the position of the detected object in the same way as the object detection unit 123.
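
A minimal sketch of template matching within a search range follows. The similarity measure used here (one minus the normalized sum of absolute differences), the function names `similarity` and `track`, and the threshold value are assumptions chosen for illustration; the specification does not fix a particular measure. The search range is assumed to lie inside the image.

```python
def similarity(patch, template):
    """Similarity in [0, 1] between two equally sized 8-bit grayscale patches."""
    total = sum(abs(p - t)
                for patch_row, template_row in zip(patch, template)
                for p, t in zip(patch_row, template_row))
    count = len(template) * len(template[0])
    return 1.0 - total / (255.0 * count)


def track(image, template, search, threshold=0.9):
    """Find the best template match inside the search range.

    image and template are lists of rows of gray levels, and search is an
    (x, y, w, h) range inside the image. Returns the matched (x, y, w, h) box,
    or None when no position reaches the threshold.
    """
    sx, sy, sw, sh = search
    th, tw = len(template), len(template[0])
    best, best_score = None, threshold
    for y in range(sy, sy + sh - th + 1):
        for x in range(sx, sx + sw - tw + 1):
            patch = [row[x:x + tw] for row in image[y:y + th]]
            score = similarity(patch, template)
            if score >= best_score:
                best, best_score = (x, y, tw, th), score
    return best


# A 6 x 6 dark image with a bright 2 x 2 object whose top-left is at (3, 2).
image = [[0] * 6 for _ in range(6)]
for yy in (2, 3):
    for xx in (3, 4):
        image[yy][xx] = 200
template = [[200, 200], [200, 200]]
found = track(image, template, search=(1, 1, 5, 5))
```

Restricting the two loops to the search range, rather than scanning the whole image, is what keeps the processing load of this step small.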

The tracking unit 124 may also scale the template image according to the trajectory information before performing template matching. For example, when the trajectory information indicates that the object is approaching the camera 2, the template image is enlarged. Conversely, when the trajectory information indicates that the object is moving away from the camera 2, the template image is reduced. Instead of the template matching method described above, the tracking unit 124 may track the object within the search range 40 by another tracking method such as KCF (Kernelized Correlation Filter).

(2-2. Operation example of image processing device)
FIG. 12 is a diagram for explaining an operation example of the image processing device 1A according to the second embodiment. In FIG. 12, "In", indicated by a broken-line arrow, denotes the timing at which the image processing device 1A acquires a captured image 3 from the camera 2. As shown in FIG. 12, captured images 3 are acquired at a predetermined period (for example, every 1/30 second).

In FIG. 12, a thick arrow indicates that processing is being executed. In the example shown in FIG. 12, when the first captured image 3 is acquired, the object detection unit 123 performs object detection processing. More precisely, as in the first embodiment (see FIG. 5), the processing by the setting unit 121 and the conversion unit 122 is executed before the processing of the object detection unit 123. For this reason, the thick arrow in FIG. 12 that indicates processing by the object detection unit 123 also includes the processing by the setting unit 121 and the conversion unit 122.

While the object detection unit 123 is executing its processing, the tracking unit 124 does not execute processing. When object detection by the object detection unit 123 is complete, the tracking unit 124 executes its processing. While the tracking unit 124 is executing its processing, the object detection unit 123 does not execute processing. That is, the object detection unit 123 and the tracking unit 124 operate in alternation.

In the present embodiment, the object detection unit 123 and the tracking unit 124 perform processing alternately, one frame each. However, this is only an example. For example, after the object detection unit 123 has finished processing a frame image, the tracking unit 124 may process two or more subsequent frame images. In that case, the tracking unit 124 may set the search range in the current frame image 3 based on its own tracking result for the frame image 3 one frame earlier, and track the object within that search range.
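
The alternation between the two units can be sketched as a simple loop. This example abstracts away the actual timing shown in FIG. 12 and only illustrates the order of operations; the function `alternate`, its parameter `tracked_per_detection`, and the stand-in `detect` and `track` callables are hypothetical.

```python
def alternate(frames, detect, track, tracked_per_detection=1):
    """Process frames by alternating a detector and a tracker.

    detect(frame) returns a list of boxes. track(frame, boxes) returns the
    updated boxes, using the previous result (from either unit) to set its
    search ranges. After each detection, the next tracked_per_detection
    frames are handled by the tracker.
    """
    log = []
    boxes = []
    remaining_tracked = 0  # 0 means the next frame goes to the detector
    for frame in frames:
        if remaining_tracked == 0:
            boxes = detect(frame)
            remaining_tracked = tracked_per_detection
            log.append("detect")
        else:
            boxes = track(frame, boxes)
            remaining_tracked -= 1
            log.append("track")
    return log, boxes


# Stand-in callables that only record which frame produced the result.
one_each, _ = alternate(range(5), lambda f: [f], lambda f, b: b)
two_tracked, _ = alternate(range(5), lambda f: [f], lambda f, b: b,
                           tracked_per_detection=2)
```

With `tracked_per_detection=1` the loop reproduces the frame-by-frame alternation of the present embodiment, and larger values correspond to the variation in which several frames are tracked after each detection.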

Object detection using a trained model obtained by deep learning imposes a heavy processing load and may take time. Processing time tends to become longer as the detection accuracy is raised. In the example shown in FIG. 12, the processing time of the object detection unit 123 is long, and its object detection processing for the current frame image 3 is not complete by the time the next frame image 3 is acquired.

However, once the object detection unit 123 has completed its object detection processing, the fast tracking unit 124 performs object tracking processing on the next frame image 3. The processing by the tracking unit 124 is completed before the frame image 3 after that is acquired. That is, viewed in units of two frames, object detection for each frame is completed by the time the next frame is acquired. When the tracking unit 124 has completed its processing, the object detection unit 123 performs its processing again, and the processing by the object detection unit 123 and the processing by the tracking unit 124 are repeated alternately.

According to the present embodiment, processing by the object detection unit 123, which aims at higher detection accuracy, is followed by processing by the tracking unit 124, which aims at higher processing speed, and this alternating processing is repeated. The present embodiment can therefore raise the object detection accuracy while suppressing a decline in the real-time performance of object detection.

<3. Points to note>
The various technical features disclosed in this specification can be modified in various ways, beyond the above embodiments, without departing from the gist of the technical creation. That is, the above embodiments should be considered illustrative in all respects and not restrictive. The technical scope of the present invention is indicated by the claims rather than by the description of the above embodiments, and should be understood to include all modifications that fall within the meaning and scope equivalent to the claims. The embodiments and modifications shown in this specification may also be combined as appropriate to the extent possible.

1, 1A ... Image processing device
31 ... Detection target area
40 ... Search range
121 ... Setting unit
122 ... Conversion unit
123 ... Object detection unit
124 ... Tracking unit

Claims

An image processing device comprising:
a setting unit that sets, in an acquired captured image, a plurality of detection target areas in which detection of an object is attempted;
a conversion unit that converts at least the images of the plurality of detection target areas in the captured image from color images to grayscale images; and
an object detection unit that has a plurality of channels to which the grayscale images of the plurality of detection target areas are separately input, and that outputs a detection result of the object based on object detection processing performed for each channel.

The image processing device according to claim 1, wherein the setting unit sets a plurality of detection target areas for a part of a range of the captured image.

The image processing device according to claim 1 or 2, wherein the object detection unit integrates the processing results for the respective channels, obtained in the coordinate systems of the respective detection target areas, into the coordinate system of the entire captured image and outputs the detection result of the object.

The image processing device according to any one of claims 1 to 3, wherein the resolution of the grayscale image input to each of the channels is changed according to the position at which the detection target area is set.

The image processing device according to any one of claims 1 to 4, wherein the number of the detection target areas is three and the number of the channels is three.

The image processing device according to any one of claims 1 to 5, further comprising a tracking unit that sets a search range in the currently acquired captured image based on the detection result of the object obtained by the object detection unit for a previously acquired captured image, and tracks the object within the search range.

The image processing device according to claim 6, wherein the search range is changed based on trajectory information indicating the past movement of the object.

The image processing device according to any one of claims 1 to 7, wherein the setting unit compares the captured image with a background image prepared in advance and sets a region having a large amount of change as the detection target area.

An image processing method comprising:
a setting step of setting, in an acquired captured image, a plurality of detection target areas in which detection of an object is attempted;
a conversion step of converting at least the images of the plurality of detection target areas in the captured image from color images to grayscale images; and
an object detection step of inputting the grayscale images of the plurality of detection target areas to separate channels and outputting a detection result of the object based on object detection processing performed for each channel.