JP2019219804A

JP2019219804A - Object detection device and object detection method

Info

Publication number: JP2019219804A
Application number: JP2018115379A
Authority: JP
Inventors: 清柱段; Seichu Dan
Original assignee: Hitachi Ltd
Current assignee: Hitachi Ltd
Priority date: 2018-06-18
Filing date: 2018-06-18
Publication date: 2019-12-26

Abstract

To provide an object detection device and an object detection method that can improve the accuracy of detecting an object located deep in an imaging space while suppressing an increase in the cost of object detection. SOLUTION: A human head detection device 6 comprises: a detection area setting unit 35 that sets a detection area for person detection in an input image; a preprocessing unit 32 that subjects the input image to preprocessing including projective transformation; a preprocessing parameter calculation unit 37 that generates parameters for use in the preprocessing; and a human head detection unit 34 that detects a person from the preprocessed image by using a prescribed detection model 42. The preprocessing parameter calculation unit 37 generates the parameters for the preprocessing including the projective transformation on the basis of the on-image size of a human head, calculated from a camera parameter 44 at the time of imaging on the assumption that the human head has a fixed size, and a recommended on-image size of a human head that can be detected by using the detection model. SELECTED DRAWING: Figure 3

Description

The present invention relates to an object detection device and an object detection method for detecting an object such as a person using a camera or the like.

When a surveillance camera is used to detect an object such as a person, the object must be detected correctly regardless of its size and orientation in the captured image. As one such technique, Patent Document 1 discloses performing object detection by means such as machine learning after applying a projective transformation to the captured image. Specifically, to determine the posture of an object such as a person, the features of a predetermined object in a specific posture are learned in advance from training images taken from a specific direction. A plurality of postures that the predetermined object can take are then assumed, and for each assumed posture the input image is subjected to a projective transformation that converts the image of the object in that posture into an image of the specific posture. A window area is set in the transformed input image, and a score is calculated that indicates the degree to which the features of the predetermined object in the specific posture appear in the window area. The object is determined to have been captured in the input image in the posture with the highest score.

Patent Document 1: JP 2017-049676 A

In recent years, the resolution of surveillance cameras has improved, and deep learning, typified by the convolutional neural network (CNN), has made highly accurate object detection possible. A CNN basically takes a rectangular image as input and detects predetermined objects within it. Because the computation is expensive, commonly used deep learning models are trained on, and run detection on, small images. In addition, with SSD (Single Shot Multibox Detector), an object detection algorithm, the size of a detectable object depends on the model as trained, and only objects of at least a certain size can be detected. In other words, the smallest in-image size at which an object can be detected (hereinafter, the minimum detection size) depends on the in-image size of the objects used in training.

With a large, high-resolution image, objects deep in the imaging space can be detected, but the processing load (hereinafter, the detection cost) increases. The image is therefore reduced before object detection, but the in-image size of an object at the back then approaches the minimum detection size, and detection accuracy falls. Meanwhile, an object at the front of the imaging space has an in-image size far larger than the minimum detection size, so cost is spent needlessly. In an imaging space with depth, it has therefore been difficult to both reduce the detection cost and improve the detection accuracy.

The technique described in Patent Document 1 is effective against changes in the posture of an object, but it gives no particular consideration to changes in the in-image size of the object. That is, it does not consider that the in-image size of the same object changes with its distance from the camera in the imaging space. Furthermore, Patent Document 1 must assume a plurality of postures and repeat the corresponding projective transformation for each of them, which increases the cost of object detection using deep learning.

An object of the present invention is to provide an object detection device and an object detection method that improve the detection accuracy of an object located deep in an imaging space while suppressing an increase in the cost of object detection.

The object detection device of the present invention includes: a detection area setting unit that sets, in an input image, a detection area in which an object is to be detected; a preprocessing unit that applies preprocessing including projective transformation to the input image; a preprocessing parameter calculation unit that generates the parameters used in the preprocessing; and an object detection unit that detects the object from the preprocessed image using a predetermined detection model. The preprocessing parameter calculation unit generates the parameters for the preprocessing including the projective transformation based on the on-image size of the object, calculated from the camera parameters at the time of capture on the assumption that the object has a fixed size, and on the recommended on-image size of an object that can be detected using the detection model.

The object detection method of the present invention includes: a step of setting, in an input image, a detection area in which the object is to be detected; a step of applying preprocessing including projective transformation to the input image; a preprocessing parameter calculation step of generating the parameters used in the preprocessing; and a step of detecting the object from the preprocessed image using a predetermined detection model. In the preprocessing parameter calculation step, the on-image size of the object is calculated from the camera parameters at the time of capture on the assumption that the object has a fixed size, and is compared with the recommended on-image size of an object that can be detected using the detection model to generate the parameters for the preprocessing including the projective transformation.

According to the present invention, the detection accuracy of an object located deep in an imaging space can be improved while suppressing an increase in the cost of object detection.

FIG. 1 is a diagram showing the overall configuration of the people counting system.
FIG. 2 is a diagram showing the hardware configuration of the human head detection device.
FIG. 3 is a diagram showing the system configuration of the human head detection device.
FIG. 4 is a diagram showing the overall operation flow of the human head detection device.
FIG. 5 is a diagram showing the flow of pre-setting (S1).
FIG. 6 is a diagram showing an example of camera parameter settings.
FIG. 7 is a diagram showing the data structure of the detection model attributes.
FIG. 8 is a diagram showing the detection area setting screen.
FIG. 9 is a diagram showing the data structure of the detection area information.
FIG. 10 is a diagram showing the flow of preprocessing parameter generation (S2).
FIG. 11 is a diagram showing the data structure of the on-screen head size list.
FIG. 12 is a diagram showing the data structure of the preprocessing parameters 46.
FIG. 13 is a diagram showing the confirmation screen for the preprocessing parameters.
FIG. 14 is a diagram showing the flow of preprocessing (S3).
FIG. 15 is a diagram showing the flow of the people counting process (S4).
FIG. 16 is a diagram showing a display example of a person detection result.
FIG. 17 is a diagram showing the data structure of a person detection result.

Hereinafter, as an embodiment of the object detection device of the present invention, a human head detection device that detects a person's head, and a people counting system that uses it to count the people passing along a road or the like, will be described.

FIG. 1 is a diagram showing the overall configuration of the people counting system. The people counting system 1 detects persons 3 passing through a specific area 2, such as a road, and saves the detected counts in chronological order. The people counting system 1 includes a surveillance camera 4 that captures the specific area 2; a network 5 such as a LAN (Local Area Network), a WAN (Wide Area Network), or a VPN (Virtual Private Network); and a human head detection device 6 that receives the image from the surveillance camera 4 via the network 5, detects the heads of the persons 3 appearing in the image, and counts the detected persons. It further includes a video display device 7 that displays the image from the surveillance camera 4 and the detection results of the human head detection device 6 in real time, and a headcount history storage device 8 that accumulates the detected headcount information for each point in time. Because the purpose of this system is to count people, the human head detection device 6 detects only the head of each person 3, which improves processing efficiency. The human head detection device 6 is described in detail below.

FIG. 2 is a diagram showing the hardware configuration of the human head detection device 6. The human head detection device 6 includes a CPU (Central Processing Unit) 11, a storage unit 12, a memory 20, an input/output unit 21, a communication unit 22, and a bus 19 that connects them.

The CPU 11 is a unit that executes various calculations. The CPU 11 executes various processes by running a predetermined program loaded from the storage unit 12 into the memory 20. The memory 20 temporarily stores the programs executed by the CPU 11 and the data required to run them.

The storage unit 12 is a nonvolatile storage device capable of storing digital information, such as a hard disk drive (HDD), a solid state drive (SSD), or flash memory. The storage unit 12 stores the following programs and data.
The people counting program 13 describes all the instructions to the computer that relate to people counting. The CPU 11 loads the people counting program 13 into the memory 20 and performs the various people counting processes.

The detection model DB (database) 14 stores the parameters used to identify an object. For example, when object detection uses an SVM (Support Vector Machine) classifier, the detection model DB 14 stores the support vectors that separate the features of the target object from those of other objects. When object detection uses a CNN, the detection model DB 14 stores the parameters that describe the structure of the CNN and the values used to initialize the trained CNN model. This embodiment is described on the assumption that CNN model information is stored.
The detection model attribute DB 15 stores attributes of the models held in the detection model DB 14, such as the image size that can be input and the head size that can be detected.

The camera parameter DB 16 stores installation information for the surveillance camera 4, such as its height and depression angle, and imaging parameters of the camera, such as its angle of view and focal length.
The detection area setting DB 17 stores information for setting the areas within the specific area 2 in which persons 3 are to be detected.

The preprocessing parameter DB 18 stores the parameters for the preprocessing, such as projective transformation, that is applied to the input image from the surveillance camera 4 to improve detection accuracy.
The input/output unit 21 accepts user operations and converts the signals of images and data exchanged with external devices. For example, the input/output unit 21 includes a graphics board or video card and converts the image from the surveillance camera 4, the people counting results, and the like into signals that the video display device 7 can display.
The communication unit 22 receives the image from the surveillance camera 4 via the network 5 and transmits the person detection results to the video display device 7 and the headcount history storage device 8.

FIG. 3 is a diagram showing the system configuration of the human head detection device 6. The human head detection device 6 consists of an image input unit 31, a preprocessing unit 32, a feature extraction unit 33, a human head detection unit 34, a detection area setting unit 35, an on-screen head size calculation unit 36, a preprocessing parameter calculation unit 37, a detection result integration unit 38, an operation input unit 39, an output control unit 40, and program operation data 41.

First, the program operation data 41 is data loaded from the storage unit 12 by the CPU 11 and includes the following. The detection model 42 is data from the detection model DB 14, and the detection model attributes 43 are data from the detection model attribute DB 15. The camera parameters 44 are data from the camera parameter DB 16, and the detection area information 45 is data from the detection area setting DB 17. The preprocessing parameters 46 are data from the preprocessing parameter DB 18.

The image input unit 31 receives encoded image data from the surveillance camera 4 via the network 5 and decodes it into frame-by-frame images.
The preprocessing unit 32 applies preprocessing such as cropping, reduction, and projective transformation to the camera image from the image input unit 31 (also called the input image), using the preprocessing parameters 46. The preprocessing is performed so that persons at the back of the imaging space can be detected accurately.

The feature extraction unit 33 extracts features from the image processed by the preprocessing unit 32, using a CNN or the like.
The human head detection unit 34 uses the detection model 42 and the features extracted by the feature extraction unit 33 to determine whether a human head is present and, if so, also calculates the center position and the size (width and height) of the head.

The detection area setting unit 35 sets which area of the camera image from the image input unit 31 is to be searched for persons. When the camera image is large (width x height), it is reduced to an image size suitable for detection with the CNN. If no detection area is set, the entire camera image is reduced to the predetermined size, so a person's head becomes too small and detection accuracy falls. Setting a detection area limits how much the image must be reduced and so avoids this loss of accuracy. Setting a detection area also identifies the area that matters most, which makes it possible to calculate preprocessing parameters tailored to that area.
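The effect of the detection area on the reduction ratio can be illustrated with a short calculation. The following is a minimal sketch; the function name and all pixel values (a 1920-pixel-wide camera image, a 640-pixel-wide detection area, a 300-pixel-wide model input, and a 24-pixel head) are illustrative assumptions, not values taken from the embodiment.

```python
def head_size_after_resize(head_px: float, source_width_px: int, model_width_px: int) -> float:
    """Scale an on-screen head size by the ratio used to fit the source into the model input."""
    return head_px * model_width_px / source_width_px


# Hypothetical numbers: a head that is 24 px wide in the camera image.
full_frame = head_size_after_resize(24, source_width_px=1920, model_width_px=300)
cropped = head_size_after_resize(24, source_width_px=640, model_width_px=300)

print(full_frame)  # head width when the whole frame is reduced
print(cropped)     # head width when only the detection area is reduced
```

With these assumed values, cropping to the detection area before reduction leaves the head three times larger than reducing the whole frame.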

The on-screen head size calculation unit 36 assumes that a person of a predetermined size (height, head size, and so on) is standing on the ground at each position on the screen, and calculates the on-screen size of the person's head at each of those positions. This calculation uses the camera parameters 44 of the surveillance camera 4, because the size at which a head appears on the screen is determined by the geometric relationship between the positions of the camera and the person: the farther the person is from the camera, the smaller the head appears.

The preprocessing parameter calculation unit 37 calculates, from the results of the on-screen head size calculation unit 36, the preprocessing parameters 46 that the preprocessing unit 32 uses for preprocessing such as projective transformation. The preprocessing parameters 46 also include the crop region and the reduction ratio of the image.
Based on the detection results of the human head detection unit 34, the detection result integration unit 38 calculates information on the presence or absence of a person and the coordinates of each detected person (the center coordinates and the coordinates of the rectangular mark superimposed on the person's head, both expressed in the image before preprocessing), and integrates these into a person detection result, which it outputs.

The operation input unit 39 is a device such as a keyboard or mouse; it recognizes user operations and converts them into operation commands.
The output control unit 40 converts the person detection result from the detection result integration unit 38 and the camera image from the image input unit 31 into a form the user can view and displays them on the video display device 7. It also outputs the person detection result to the headcount history storage device 8 for storage.

Next, the operation of the human head detection device 6 will be described.
FIG. 4 is a diagram showing the overall operation flow of the human head detection device 6. The flow is divided into four main stages, performed in this order: pre-setting (S1), preprocessing parameter generation (S2), preprocessing (S3), and people counting (S4). They are outlined below.

In pre-setting (S1), conditions used for person detection, such as the detection model and the camera parameters, are set.
In preprocessing parameter generation (S2), the on-screen size of a human head is calculated from the camera parameters, and preprocessing parameters including those for the projective transformation are generated from it.
In preprocessing (S3), preprocessing including projective transformation is applied to the input image using the preprocessing parameters.
In people counting (S4), human heads are detected from the preprocessed image using the detection model, and the number and positions of the detected persons are output.
Each stage is described in detail below.

(S1) Pre-setting (steps S101 to S104)
FIG. 5 is a diagram showing the flow of pre-setting. In pre-setting, the various conditions required for person detection are entered and set.

In step S101, the camera parameters 44 are set. The camera parameters 44 are entered manually from the specifications and installation information of the surveillance camera 4. Alternatively, they can be entered automatically by a camera parameter estimation means.

FIG. 6 is a diagram showing an example of settings for the camera parameters 44. The setting items include the installation height of the camera, the depression angle, the roll angle, the focal length, the horizontal size Cx of the image sensor, the vertical size Cy of the image sensor, the width Wcam of the camera image, and the height Hcam of the camera image.
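The setting items above could be held in a small record type. The following is a minimal sketch, not the data layout of the embodiment: the class name, field names, units (metres, degrees, millimetres, pixels), and the `focal_length_px` helper are all assumptions introduced for illustration. The helper uses the standard pinhole conversion from a focal length in millimetres to a focal length in pixels.

```python
from dataclasses import dataclass
from typing import Tuple


@dataclass(frozen=True)
class CameraParameters:
    """Hypothetical container for the items listed for the camera parameters 44."""
    height_m: float           # installation height of the camera
    depression_deg: float     # depression angle
    roll_deg: float           # roll angle
    focal_length_mm: float    # focal length
    sensor_width_mm: float    # image sensor horizontal size Cx
    sensor_height_mm: float   # image sensor vertical size Cy
    image_width_px: int       # camera image width Wcam
    image_height_px: int      # camera image height Hcam

    def focal_length_px(self) -> Tuple[float, float]:
        """Focal length expressed in pixels along each image axis (pinhole model)."""
        fx = self.focal_length_mm * self.image_width_px / self.sensor_width_mm
        fy = self.focal_length_mm * self.image_height_px / self.sensor_height_mm
        return fx, fy


# Example values are made up for illustration only.
cam = CameraParameters(3.0, 30.0, 0.0, 4.0, 4.8, 3.6, 1920, 1440)
fx, fy = cam.focal_length_px()
print(fx, fy)
```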

In step S102, the detection model 42 is specified, that is, the trained model to be used for human head detection. A trained model is, for example, a neural network such as a CNN whose parameters have been calculated from a large number of images containing human heads of the same size.

In step S103, the detection model attributes 43 are specified.
FIG. 7 is a diagram showing the data structure of the detection model attributes 43. The detection model attributes 43 consist of the recommended image width Wopt, the recommended image height Hopt, and the recommended head size Vopt (width x height) for the detection model 42.

The recommended image size (Wopt, Hopt) specifies the size of input image that the detection model 42 can accept. If the input image differs from the recommended image size, the input image is resized (reduced or enlarged).

The recommended head size Vopt is the size of head image that the detection model 42 can detect as a head. A person far from the camera may go undetected because the head appears small on the screen. The recommended head size Vopt is therefore specified from the head sizes used when training the detection model 42. For example, the average size of the head images in the training data is specified as the recommended head size.
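The averaging mentioned as an example can be sketched as follows. This is an illustration only: the function name, the representation of training heads as `(width, height)` pixel pairs, and the rounding to whole pixels are assumptions, and the sample sizes are made up.

```python
from statistics import mean
from typing import Iterable, Tuple


def recommended_head_size(head_boxes: Iterable[Tuple[int, int]]) -> Tuple[int, int]:
    """Return the mean (width, height) of the training head boxes, rounded to pixels."""
    boxes = list(head_boxes)
    if not boxes:
        raise ValueError("at least one training head box is required")
    mean_w = mean(w for w, _ in boxes)
    mean_h = mean(h for _, h in boxes)
    return round(mean_w), round(mean_h)


# Hypothetical head sizes (pixels) taken from training annotations.
v_opt = recommended_head_size([(30, 32), (34, 36), (32, 34)])
print(v_opt)
```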

In step S104, the detection area setting unit 35 sets the detection area, that is, the area of the camera image in which human heads are to be detected.

FIG. 8 is a diagram showing the detection area setting screen. The screen 70 of the video display device 7 displays the image from the surveillance camera 4, which here shows a road 71 and persons 72. To set a detection area, the user operates a mouse or the like, the operation input unit 39 accepts the input, and four on-screen positions are connected to form one polygon. The area inside this polygon becomes the detection area, and its boundary is displayed on the screen. In the example of FIG. 8, the area of polygon ABCD (shown by a broken line) is the detection area 73, and the persons 72 inside it are the detection targets. Multiple detection areas can be set by repeating this operation. The user can press the reset button 78 to clear the set area and set it again. Pressing the save button 79 saves the set detection area 73 as the detection area information 45.

FIG. 9 is a diagram showing the data structure of the detection area information 45. In the detection area information 45, the area ID is a number that distinguishes the polygons when there is more than one. The polygon coordinates list the on-screen coordinates of the points (four) that make up the polygon.
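One possible in-memory form of this record is sketched below. The class and field names are assumptions, and the `bounding_box` helper is not described in the text; it is included only as an illustrative way to derive a rectangular crop candidate from the polygon.

```python
from dataclasses import dataclass
from typing import List, Tuple

Point = Tuple[int, int]  # on-screen (x, y) in pixels


@dataclass
class DetectionArea:
    """Hypothetical record mirroring the detection area information 45."""
    area_id: int          # distinguishes polygons when several are set
    polygon: List[Point]  # the four on-screen vertices, in drawing order

    def bounding_box(self) -> Tuple[int, int, int, int]:
        """Axis-aligned box (x_min, y_min, x_max, y_max) enclosing the polygon."""
        xs = [x for x, _ in self.polygon]
        ys = [y for _, y in self.polygon]
        return min(xs), min(ys), max(xs), max(ys)


# Made-up vertices A, B, C, D of a trapezoidal area along a road.
area = DetectionArea(area_id=1, polygon=[(100, 400), (500, 400), (450, 200), (150, 200)])
print(area.bounding_box())
```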

(S2) Generation of preprocessing parameters (steps S201 to S213)
FIG. 10 is a diagram showing the flow of preprocessing parameter generation. The preprocessing parameters are generated mainly by the on-screen head size calculation unit 36 and the preprocessing parameter calculation unit 37. The size of the head image displayed on the screen within the detection area is calculated, and the preprocessing parameters are generated from it.

In step S201, the on-screen head size calculation unit 36 reads the camera parameters 44 from the camera parameter DB 16.

In step S202, the relationship between a person's coordinates and head size is calculated for every on-screen position, assuming that the person is standing on flat ground. The head size calculation uses, for example, the following preconditions, whose values are changed to suit the usage environment.
- The person's height is 160 cm.
- The person's head is a sphere with a diameter of 40 cm.
- The person's range of activity is flat ground.

Specifically, the surveillance camera 4 is treated as a pinhole camera. Using the camera parameters 44, the center of a head satisfying the above preconditions is assumed to lie at an arbitrary pixel position on the screen, and the world coordinates of that head are calculated. The on-screen size (width and height) of the head is then calculated from its world coordinates. The result of the calculation is an on-screen head size list.
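The pinhole calculation described above can be sketched as follows. This is a simplified illustration under stated assumptions, not the embodiment's exact procedure: the roll angle is taken as zero, the principal point is assumed to be at the image center, the focal length is given in pixels, the head center is assumed to sit one head radius below the top of the head, and the projected head is approximated as a circle of diameter `focal_px * head_diameter / depth`. The function name and parameter names are hypothetical.

```python
import math
from typing import Optional, Tuple


def head_on_screen(u: float, v: float, *, cam_height_m: float, depression_deg: float,
                   focal_px: float, image_w: int, image_h: int,
                   person_height_m: float = 1.6, head_diameter_m: float = 0.4
                   ) -> Optional[Tuple[Tuple[float, float, float], float]]:
    """Return (world position of head center, on-screen head diameter in px), or None.

    World axes: X to the camera's right, Y horizontally forward, Z up; the camera
    sits at (0, 0, cam_height_m) and is pitched down by the depression angle.
    """
    theta = math.radians(depression_deg)
    # Ray through pixel (u, v) in camera axes (x right, y down, z forward), with z = 1.
    dx = (u - image_w / 2) / focal_px
    dy = (v - image_h / 2) / focal_px
    # Same ray expressed in world axes.
    ray_x = dx
    ray_y = math.cos(theta) - dy * math.sin(theta)
    ray_z = -math.sin(theta) - dy * math.cos(theta)
    head_center_z = person_height_m - head_diameter_m / 2
    if ray_z == 0:
        return None  # ray is parallel to the plane of head centers
    depth = (head_center_z - cam_height_m) / ray_z  # distance along the optical axis
    if depth <= 0:
        return None  # the pixel looks above the horizon of that plane
    world = (depth * ray_x, depth * ray_y, cam_height_m + depth * ray_z)
    return world, focal_px * head_diameter_m / depth


# Hypothetical camera: 3 m high, 30 degrees down, 1000 px focal length, 1920x1080 image.
cam = dict(cam_height_m=3.0, depression_deg=30.0, focal_px=1000.0, image_w=1920, image_h=1080)
far = head_on_screen(960, 100, **cam)
mid = head_on_screen(960, 540, **cam)
near = head_on_screen(960, 900, **cam)
print(far[1], mid[1], near[1])
```

Evaluating this function at every pixel of the camera image would give a table of the same shape as the on-screen head size list, with smaller sizes toward the top of the image (far from the camera) and larger sizes toward the bottom.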

FIG. 11 is a diagram showing the data structure of the on-screen head size list. The on-screen head size list 80 consists of on-screen coordinates and head sizes. The on-screen coordinates cover every image point within the input image from the camera (camera image width Wcam and camera image height Hcam in FIG. 6). For each image point, the list records the size (horizontal and vertical) in pixels at which the head would be displayed if a person were standing there. As FIG. 11 shows, the calculated head size is small at on-screen coordinates far from the camera and large at coordinates near the camera.

In step S203, the head sizes within the detection area are extracted using the detection area information 45 set by the user. In other words, step S202 calculated head sizes over the entire camera image, and this step narrows them down to those inside the detection area.
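One way to perform this narrowing is a point-in-polygon test over the head size list. The sketch below assumes the list is held as a dictionary from (x, y) to (width, height) and the detection area as a list of polygon vertices; both representations and the function names are illustrative choices, not taken from the text.

```python
def point_in_polygon(x, y, polygon):
    """Ray-casting test: True if (x, y) lies inside polygon [(x0, y0), (x1, y1), ...]."""
    inside = False
    n = len(polygon)
    for i in range(n):
        x1, y1 = polygon[i]
        x2, y2 = polygon[(i + 1) % n]
        # Does a horizontal ray from (x, y) towards +x cross this edge?
        if (y1 > y) != (y2 > y):
            x_cross = x1 + (y - y1) * (x2 - x1) / (y2 - y1)
            if x < x_cross:
                inside = not inside
    return inside

def heads_in_detection_area(head_sizes, polygon):
    """Keep only the entries of {(x, y): (w, h)} whose coordinate is inside the polygon."""
    return {xy: wh for xy, wh in head_sizes.items()
            if point_in_polygon(xy[0], xy[1], polygon)}
```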

Next, the preprocessing parameter calculation unit 37 calculates the preprocessing parameters 46 used by the preprocessing unit 32, based on the head size information within the detection area.
In step S204, the corresponding detection model attribute 43 is read from the detection model attribute DB 15.

In step S205, the image reduction ratios are calculated from the recommended image width Wopt and recommended image height Hopt of the detection model attribute 43 and the camera image width Wcam and camera image height Hcam of the camera parameters 44.
Horizontal reduction ratio = recommended image width Wopt / camera image width Wcam
Vertical reduction ratio = recommended image height Hopt / camera image height Hcam
If the camera image is smaller than the recommended image size, the reduction ratio exceeds 1 and the image is enlarged; the description here covers the case in which the image is reduced (reduction ratio < 1).

In step S206, the image reduction ratios are applied to the head sizes of the detection area to obtain the head sizes in the reduced image, and the minimum head size Vmin within the detection area is determined. In general, head size in a camera image decreases with depth. In the case of FIG. 8, for example, the line CD is the deepest part of the detection area 73, so the reduced head size at points C and D is taken as the minimum head size Vmin.

In step S207, the ideal enlargement ratio Gopt for enlarging the minimum head size Vmin obtained in step S206 to the recommended head size Vopt specified in the detection model attribute 43 is calculated as Gopt = Vopt / Vmin. This enlargement ratio Gopt applies at the CD position of the detection area 73; at other positions the enlargement ratio is smaller and varies linearly.
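Steps S205 to S207 amount to a few lines of arithmetic, sketched below. The text treats Vmin as a single number without saying whether it refers to width or height, so this sketch takes the smaller of the two reduced dimensions; that choice, the function name, and the dictionary layout of the head sizes are assumptions.

```python
def ideal_enlargement(head_sizes, w_opt, h_opt, w_cam, h_cam, v_opt):
    """Return (Vmin, Gopt) for head sizes {(x, y): (w, h)} inside the detection area."""
    # Step S205: reduction ratios from camera size to the model's recommended size.
    ratio_x = w_opt / w_cam
    ratio_y = h_opt / h_cam
    # Step S206: head sizes after reduction, and the smallest of them.
    reduced = [min(w * ratio_x, h * ratio_y) for (w, h) in head_sizes.values()]
    v_min = min(reduced)
    # Step S207: enlargement needed to bring the smallest head up to Vopt.
    return v_min, v_opt / v_min
```

For example, with a hypothetical 1920×1080 camera image, a 640×360 recommended image, a smallest head of 30 pixels before reduction, and a recommended head size of 20 pixels, Vmin is about 10 pixels and Gopt is about 2.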

In step S208, the ideal enlargement ratio Gopt obtained in step S207 is corrected so that image distortion does not exceed a certain threshold, because detection accuracy falls as image distortion grows. Specifically, for a person's head, the enlargement ratio is corrected so that the ratio of the upper width to the lower width of the enlarged head shape is at or below a certain threshold, for example 1.2. The corrected enlargement ratio is called the effective enlargement ratio Geff. If the image distortion is already at or below the threshold, the ideal enlargement ratio Gopt is used as the effective enlargement ratio Geff unchanged.
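The text does not say how the correction is carried out. One possible approach, sketched here purely as an assumption, is to search for the largest enlargement ratio not exceeding Gopt whose distortion stays within the threshold. The `distortion` argument is a hypothetical callable that returns the upper-to-lower width ratio of the enlarged head for a given enlargement ratio, and the search assumes that this ratio does not decrease as the enlargement grows and that no enlargement (ratio 1) is acceptable.

```python
def effective_enlargement(g_opt, distortion, threshold=1.2, tol=1e-4):
    """Largest ratio G <= g_opt with distortion(G) <= threshold, found by bisection."""
    if distortion(g_opt) <= threshold:
        return g_opt                  # Gopt is already acceptable, so Geff = Gopt
    lo, hi = 1.0, g_opt               # lo is acceptable, hi is not
    while hi - lo > tol:
        mid = (lo + hi) / 2.0
        if distortion(mid) <= threshold:
            lo = mid
        else:
            hi = mid
    return lo

# Toy distortion model for illustration only: grows linearly with the enlargement.
toy_distortion = lambda g: 1.0 + 0.1 * (g - 1.0)
g_eff = effective_enlargement(5.0, toy_distortion)   # limited to roughly 3
```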

In step S209, for the detection area 73 of FIG. 8, the rectangle containing the polygon ABCD is taken as the cutout image, and the coordinates of points C and D within the cutout image are calculated. Points C and D serve as the vertex coordinates for the image enlargement (projective transformation) and are referred to here as reference points.

In step S210, the coordinates of reference points C' and D' (shown in FIG. 13), obtained by enlarging reference points C and D at the effective enlargement ratio Geff, are calculated. Points A and B are held fixed during the enlargement, so their coordinates do not change.

In step S211, the parameters of the projective transformation are calculated from the coordinates of the vertices of the polygon ABCD and of the enlarged polygon ABC'D'. The parameters are calculated as follows.
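Equation 1 itself is not reproduced in this text. A form consistent with the surrounding description (coordinates (x, y) before enlargement, (u, v) after enlargement, and eight coefficients a to h) is the standard planar projective transformation below; the placement of the individual coefficients is an assumption.

```latex
u = \frac{a x + b y + c}{g x + h y + 1}, \qquad
v = \frac{d x + e y + f}{g x + h y + 1}
```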

In Equation 1, x and y are the coordinates before enlargement, u and v are the coordinates after enlargement, and a, b, c, d, e, f, g, and h are the transformation coefficients. Substituting the coordinates (x, y) of the four points of ABCD and the coordinates (u, v) of the four points of the enlarged ABC'D' into Equation 1 yields eight equations. Solving these gives the eight transformation coefficients a to h, and writing them in the matrix form of Equation 2 gives the projective transformation parameters.
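Solving the eight equations can be sketched in plain Python as below. The sketch assumes the common parameterization u = (ax + by + c) / (gx + hy + 1), v = (dx + ey + f) / (gx + hy + 1) and the matrix layout [[a, b, c], [d, e, f], [g, h, 1]]; since the equations are not reproduced in this text, both are assumptions, as are the function names and the sample coordinates.

```python
def solve_homography(src, dst):
    """Return [[a, b, c], [d, e, f], [g, h, 1]] mapping four src points onto four dst points."""
    # Each pair (x, y) -> (u, v) gives two equations that are linear in a..h:
    #   a*x + b*y + c - g*x*u - h*y*u = u
    #   d*x + e*y + f - g*x*v - h*y*v = v
    rows = []
    for (x, y), (u, v) in zip(src, dst):
        rows.append([x, y, 1.0, 0.0, 0.0, 0.0, -x * u, -y * u, u])
        rows.append([0.0, 0.0, 0.0, x, y, 1.0, -x * v, -y * v, v])
    n = 8
    # Gauss-Jordan elimination with partial pivoting on the augmented 8x9 system.
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(rows[r][col]))
        if abs(rows[pivot][col]) < 1e-12:
            raise ValueError("degenerate point configuration")
        rows[col], rows[pivot] = rows[pivot], rows[col]
        p = rows[col][col]
        rows[col] = [val / p for val in rows[col]]
        for r in range(n):
            if r != col:
                factor = rows[r][col]
                rows[r] = [rv - factor * cv for rv, cv in zip(rows[r], rows[col])]
    a, b, c, d, e, f, g, h = (rows[i][n] for i in range(n))
    return [[a, b, c], [d, e, f], [g, h, 1.0]]

def apply_homography(H, x, y):
    """Map (x, y) to (u, v) with the 3x3 matrix H."""
    w = H[2][0] * x + H[2][1] * y + H[2][2]
    return ((H[0][0] * x + H[0][1] * y + H[0][2]) / w,
            (H[1][0] * x + H[1][1] * y + H[1][2]) / w)

# Hypothetical cutout-image coordinates: A and B stay fixed, C and D move to C' and D'.
abcd = [(0, 100), (200, 100), (150, 0), (50, 0)]          # A, B, C, D
abcd_enlarged = [(0, 100), (200, 100), (200, 0), (0, 0)]  # A, B, C', D'
H = solve_homography(abcd, abcd_enlarged)
```

In this example the far edge CD is stretched to twice its width while the near edge AB is unchanged, which matches the idea of an enlargement ratio that is largest at CD and falls off towards AB.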

In step S212, the preprocessing parameters 46 are generated using the projective transformation parameters.
FIG. 12 is a diagram showing the data structure of the preprocessing parameters 46. The preprocessing parameters 46 consist of a cutout area, image reduction ratios, projective transformation parameters, and an output image size. The cutout area is the rectangular area containing the polygon ABCD of the detection area 73 (ABC'D' in FIG. 13) and is specified by the coordinates of its diagonal corners. The image reduction ratios are the horizontal and vertical ratios calculated in step S205. The projective transformation parameters are the values calculated in step S211, with the transformation coefficients a to h expressed as a matrix. The output image size is the size after image reduction and equals the recommended image width Wopt and recommended image height Hopt of the detection model attribute 43.
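The four items of the preprocessing parameters could be held in a record such as the one below. The class name, field names, and tuple layouts are illustrative assumptions; only the four items themselves come from the description.

```python
from dataclasses import dataclass
from typing import List, Tuple

@dataclass
class PreprocessingParams:
    cutout_box: Tuple[float, float, float, float]  # (x_min, y_min, x_max, y_max), the diagonal corners
    reduction: Tuple[float, float]                 # (horizontal, vertical) ratios from step S205
    homography: List[List[float]]                  # 3x3 matrix holding coefficients a..h (step S211)
    output_size: Tuple[int, int]                   # (Wopt, Hopt)

def bounding_box(polygon):
    """Axis-aligned rectangle that contains every vertex of the polygon."""
    xs = [p[0] for p in polygon]
    ys = [p[1] for p in polygon]
    return (min(xs), min(ys), max(xs), max(ys))

# Hypothetical detection polygon ABCD in camera-image coordinates.
polygon_abcd = [(300, 400), (900, 400), (700, 200), (500, 200)]
params = PreprocessingParams(
    cutout_box=bounding_box(polygon_abcd),
    reduction=(640 / 1920, 360 / 1080),
    homography=[[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]],  # placeholder identity
    output_size=(640, 360),
)
```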

In step S213, the user confirms the preprocessing parameters.
FIG. 13 is a diagram showing the confirmation screen for the preprocessing parameters. The screen 70 of the video display device 7 shows the camera image together with a preview of the effect of the preprocessing. That is, it displays the currently set detection area 73 (ABCD, shown by a broken line), the image area 83 cut out in the preprocessing (ABC'D', shown by a dash-dot line), and the area 85 in which detection becomes possible through the projective transformation (shown by a dot pattern). The detectable area 85 does not cover the whole detection area 73 because the projective transformation produces regions in which image distortion exceeds the threshold.

The user checks this result and, if it is acceptable, presses the save button 89, whereupon the preprocessing parameters 46 are stored in the preprocessing parameter DB 18. If the user wants to change the detection area 73, pressing the reset button 88 returns to the detection area setting (step S104) in FIG. 5. The detection area can then be changed and the preprocessing parameters calculated again.

(S3) Preprocessing (steps S301 to S305)
FIG. 14 is a diagram showing the flow of the preprocessing. When the image input unit 31 sends a camera image to the preprocessing unit 32, the preprocessing unit 32 applies the preprocessing to the camera image.

In step S301, the detection area information 45 (detection area coordinates) is read from the detection area setting DB 17. In FIG. 13, this corresponds to the detection area 73 indicated by the polygon ABCD.

In step S302, the preprocessing parameters 46 are read from the preprocessing parameter DB 18.

In step S303, the image of the region to be examined is cut out using the cutout area information in the preprocessing parameters 46. In FIG. 13, the rectangular area 83 indicated by the polygon ABC'D' is cut out.

In step S304, projective transformation is applied to the cutout image using the projective transformation parameters in the preprocessing parameters 46. The transformation maps each pixel position in the image using Equation 2. In FIG. 13, the image at points C and D is enlarged to the positions of points C' and D'. Applying the projective transformation enlarges the head images of persons deep in the imaging space.

In step S305, the image after projective transformation is reduced according to the image reduction ratios in the preprocessing parameters 46. This completes the preprocessing.
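In practice the cutout, warp, and reduction would be applied to whole images with an image-processing library. The sketch below only follows a single pixel coordinate through steps S303 to S305 to make the order of operations concrete. The function name, the 3×3 matrix layout [[a, b, c], [d, e, f], [g, h, 1]], and the sample values are assumptions.

```python
def to_preprocessed(x, y, cutout_origin, H, reduction):
    """Map a camera-image coordinate to its position in the preprocessed image."""
    # Step S303: cutting out shifts the origin to the top-left corner of the cutout area.
    cx, cy = x - cutout_origin[0], y - cutout_origin[1]
    # Step S304: projective transformation with the 3x3 matrix H.
    w = H[2][0] * cx + H[2][1] * cy + H[2][2]
    u = (H[0][0] * cx + H[0][1] * cy + H[0][2]) / w
    v = (H[1][0] * cx + H[1][1] * cy + H[1][2]) / w
    # Step S305: reduction by the horizontal and vertical ratios.
    return u * reduction[0], v * reduction[1]

# Hypothetical parameters: cutout starting at (300, 200), a matrix that doubles the
# width of the far (top) edge of the cutout, and a reduction ratio of 0.5 on both axes.
H_example = [[2.0, 1.0, -100.0], [0.0, 2.0, 0.0], [0.0, 0.01, 1.0]]
point = to_preprocessed(300, 300, (300, 200), H_example, (0.5, 0.5))
```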

(S4) People counting (steps S401 to S411)
FIG. 15 is a diagram showing the flow of the people counting process. When the preprocessing unit 32 sends the preprocessed image to the feature amount extraction unit 33, processing moves on to person detection and people counting. This involves feature extraction by the feature amount extraction unit 33, detection of human heads by the human head detection unit 34, and person position integration by the detection result integration unit 38.

In step S401, the feature amount extraction unit 33 extracts feature amounts from the image. With a CNN, image feature amounts are extracted by deep learning techniques such as convolution operations and pooling layers.

In step S402, the human head detection unit 34 uses the extracted feature amounts to determine whether human heads are present and to calculate the center coordinates and size of each head. The result of head detection is a list containing the coordinate information of the human heads.

However, the coordinate information generated here is expressed in post-preprocessing coordinates, which differ from the coordinates of the image input by the image input unit 31. In step S403, the detection result integration unit 38 therefore performs person position integration, which converts the coordinates back to those of the input image before preprocessing.

The person position integration of step S403 consists of steps S404 to S409.
In step S404, the preprocessing parameters 46 are read from the preprocessing parameter DB 18.

In step S405, the detection area information 45 is read from the detection area setting DB 17.
In step S406, the camera parameters 44 are read from the camera parameter DB 16.

In step S407, the human head coordinate information generated in step S402 is converted to pre-reduction coordinates using the image reduction ratios in the preprocessing parameters 46.
In step S408, inverse projective transformation is applied to convert the human head coordinate information to the coordinates before projective transformation. The transposed matrix of the projective transformation parameters (Equation 2) in the preprocessing parameters 46 may be used as the parameters of the inverse projective transformation.

In step S409, the coordinates are converted to the coordinates of the human head in the image before cutout, using the cutout area information in the preprocessing parameters 46.
This completes the person position integration, and the person position information, expressed in pre-preprocessing coordinates, is sent to the output control unit 40.
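Steps S407 to S409 undo the preprocessing in reverse order, which can be sketched for a single coordinate as follows. The sketch inverts the 3×3 matrix, which is the general way to undo a projective transformation; this is a choice made for the sketch and not a reading of the text, which names the transposed matrix. The function names, matrix layout, and sample values are likewise assumptions.

```python
def invert_3x3(m):
    """Inverse of a 3x3 matrix via the adjugate (cofactor transpose) formula."""
    (a, b, c), (d, e, f), (g, h, i) = m
    det = a * (e * i - f * h) - b * (d * i - f * g) + c * (d * h - e * g)
    if abs(det) < 1e-12:
        raise ValueError("matrix is not invertible")
    return [[(e * i - f * h) / det, (c * h - b * i) / det, (b * f - c * e) / det],
            [(f * g - d * i) / det, (a * i - c * g) / det, (c * d - a * f) / det],
            [(d * h - e * g) / det, (b * g - a * h) / det, (a * e - b * d) / det]]

def to_input_coords(u, v, cutout_origin, H, reduction):
    """Map a coordinate in the preprocessed image back to the camera image."""
    # Step S407: undo the reduction.
    x, y = u / reduction[0], v / reduction[1]
    # Step S408: undo the projective transformation.
    Hi = invert_3x3(H)
    w = Hi[2][0] * x + Hi[2][1] * y + Hi[2][2]
    px = (Hi[0][0] * x + Hi[0][1] * y + Hi[0][2]) / w
    py = (Hi[1][0] * x + Hi[1][1] * y + Hi[1][2]) / w
    # Step S409: undo the cutout by adding back the cutout origin.
    return px + cutout_origin[0], py + cutout_origin[1]

# Hypothetical parameters matching a cutout at (300, 200) and a reduction of 0.5.
H_example = [[2.0, 1.0, -100.0], [0.0, 2.0, 0.0], [0.0, 0.01, 1.0]]
head_in_camera_image = to_input_coords(0.0, 50.0, (300, 200), H_example, (0.5, 0.5))
```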

In step S410, the output control unit 40 outputs the person detection information to the video display device 7, which displays the person detection result.
FIG. 16 is a diagram showing a display example of the person detection result. The screen 70 of the video display device 7 shows the image from the surveillance camera 4 together with the configured detection area 73. Based on the current person detection result, a rectangular detection mark 90 is superimposed on each detected head position, and the detected-count field 91 shows the most recently detected number of people.

In step S411, the output control unit 40 outputs the person detection information to the people-count history storage device 8, which stores the person detection results.
FIG. 17 is a diagram showing the data structure of the person detection result. The person detection result 92 records the number of human heads, that is, the number of people, in the detection area 73 at each detection time. In addition to the count, the coordinates of each detected person or head may also be recorded.
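A record of this kind might look like the following sketch, in which the count is derived from an optional list of head positions. The class and field names, and the choice of (center x, center y, size) for each head, are assumptions made for illustration.

```python
from dataclasses import dataclass, field
from datetime import datetime
from typing import List, Tuple

@dataclass
class PersonDetectionRecord:
    detected_at: datetime                          # detection time
    heads: List[Tuple[float, float, float]] = field(default_factory=list)
    # Each head is (center_x, center_y, size) in camera-image coordinates.

    @property
    def count(self) -> int:
        """Number of people, taken as the number of detected heads."""
        return len(self.heads)

record = PersonDetectionRecord(
    detected_at=datetime(2018, 6, 18, 12, 0, 0),
    heads=[(310.0, 250.0, 18.0), (500.0, 380.0, 40.0)],
)
```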

The operation of the human head detection device 6 of this embodiment has been described above. When detecting persons in a given area, applying appropriate preprocessing that includes projective transformation of the captured image based on the camera parameters improves the accuracy of person detection, and hence of people counting, while limiting the increase in detection cost. That is, in this embodiment the projective transformation enlarges the image of a person deep in the detection area, as seen from the camera, by a larger ratio than the image in the foreground, which improves detection accuracy for persons at the back as well while keeping the processing load of detection low. In addition, reducing the input image to the recommended image size during preprocessing lowers the processing load of person detection by the convolutional neural network.

In the embodiment above, a person's head was used as the example of the object to be detected, but the object is not limited to this; any object can of course be detected by using a suitably trained detection model. There may also be a plurality of surveillance cameras 4, and the system may include a plurality of human head detection devices 6 and people-count history storage devices 8 that cooperate with one another to count people.

The present invention is not limited to the embodiment above and includes various modifications. For example, the embodiment has been described in detail to make the invention easy to understand, and the invention is not necessarily limited to a configuration having all of the described elements. Part of the configuration of the embodiment may also have other elements added, deleted, or substituted.

Each of the configurations, functions, processing units, and the like described above may be implemented partly or wholly in hardware, for example by designing them as an integrated circuit. The technical elements of the embodiment may be applied on their own or divided into multiple parts, such as program components and hardware components.

1: people counting system; 3, 72: person; 4: surveillance camera; 6: human head detection device; 7: video display device; 8: people-count history storage device; 11: CPU; 12: storage unit; 32: preprocessing unit; 33: feature amount extraction unit; 34: human head detection unit; 35: detection area setting unit; 36: on-screen head size calculation unit; 37: preprocessing parameter calculation unit; 38: detection result integration unit; 40: output control unit; 42: detection model; 43: detection model attribute; 44: camera parameters; 45: detection area information; 46: preprocessing parameters; 73: detection area; 80: on-screen head size list; 83: cutout image area; 85: detectable area; 90: detection mark; 91: detected-count field; 92: person detection result.

Claims

An object detection device that detects a predetermined object from an image captured by a camera, comprising:
a detection region setting unit that sets, for an input image, a detection region in which the object is to be detected;
a preprocessing unit that performs preprocessing including projective transformation on the input image;
a preprocessing parameter calculation unit that generates parameters used in the preprocessing; and
an object detection unit that detects the object from the preprocessed image using a predetermined detection model,
wherein the preprocessing parameter calculation unit generates the parameters of the preprocessing including the projective transformation based on the on-image size of the object, calculated from camera parameters at the time of imaging by the camera on the assumption that the object has a fixed size, and a recommended on-image size of the object that is detectable using the detection model.

The object detection device according to claim 1,
wherein the preprocessing parameter calculation unit calculates the parameters of the projective transformation such that the minimum size of the object in the detection region, calculated from the camera parameters, becomes equal to or greater than the recommended size of the object that is detectable using the detection model.

The object detection device according to claim 2,
wherein the preprocessing parameter calculation unit calculates the parameters of the projective transformation such that, within the detection region, the enlargement ratio of the image located at the back as viewed from the camera is larger than that of the image located in the foreground.

The object detection device according to claim 3,
wherein the preprocessing parameter calculation unit corrects the parameters of the projective transformation such that, when the image is enlarged by the projective transformation, distortion of the shape of the enlarged object does not exceed a threshold.

The object detection device according to claim 1,
wherein the preprocessing further includes cutting out an image from the input image based on the detection region and reducing the input image to a recommended image size that the detection model can accept.

The object detection device according to claim 5, further comprising:
a detection result integration unit that, for the result of object detection by the object detection unit, converts the position of each detected object into coordinates of the input image based on the parameters of the preprocessing and integrates the results; and
an output control unit that outputs the integrated object detection information to an external device.

An object detection method for detecting a predetermined object from an image captured by a camera, comprising the steps of:
setting, for an input image, a detection region in which the object is to be detected;
performing preprocessing including projective transformation on the input image;
calculating preprocessing parameters, in which parameters used in the preprocessing are generated; and
detecting the object from the preprocessed image using a predetermined detection model,
wherein, in the step of calculating preprocessing parameters, the on-image size of the object, calculated from camera parameters at the time of imaging by the camera on the assumption that the object has a fixed size, is compared with a recommended on-image size of the object that is detectable using the detection model, and the parameters of the preprocessing including the projective transformation are generated.

The object detection method according to claim 7,
wherein the step of performing preprocessing further includes cutting out an image from the input image based on the detection region and reducing the input image to a recommended image size that the detection model can accept, and
in the step of detecting the object, the object is detected by applying a convolutional neural network to the preprocessed image.