JP5971712B2 - Monitoring device and method

Info

Publication number: JP5971712B2
Application number: JP2012206773A
Authority: JP
Inventors: 廣大齊藤; 助川　寛; 寛助川
Original assignee: Toshiba Corp
Current assignee: Toshiba Corp
Priority date: 2012-09-20
Filing date: 2012-09-20
Publication date: 2016-08-17
Anticipated expiration: 2032-09-20
Also published as: JP2014064083A

Description

Embodiments of the present invention relate to a monitoring apparatus and method.

Conventionally, there are monitoring apparatuses in which image data captured by a camera is visually checked on a display device in order to find a specific person such as a criminal. For such apparatuses there is a demand for image data in which a person's face is easy to check visually, and some apparatuses detect a face image in the image data captured by the camera, then cut out and output the detected face image.

JP 2010-22601 A

However, with the conventional technique described above, when the shooting conditions are wide-angle, faces may appear small in the image captured by the camera, so an image in which a person's face is easy to check visually is not always obtained. Moreover, for multiple persons passing in front of the camera, it is not easy to adjust shooting conditions such as the camera's horizontal direction, vertical direction, rotation direction, and zoom (angle of view) so as to obtain image data in which each person's face is easy to check visually.

To solve the problem described above, a monitoring apparatus according to an embodiment includes: image input means for inputting image data captured by a camera; face detection means for detecting, from the input image data, a face region in which a person's face appears; detection result storage means for storing the detection results of the face detection means; calculation means for calculating, on the basis of the stored detection results and in accordance with equation (4), at least one shooting condition that maximizes an evaluation value given by an evaluation function that evaluates how easily the person's face can be checked based on the stored face regions, using as parameters the camera's vertical angle φ, horizontal angle θ, angle ψ of the shooting direction in the rotation direction, and angle of view r; and output means for outputting the calculated shooting condition.

Here, f(θ, φ, ψ, r) is the evaluation value. θ1 to θN are the angles between each person's flow line and the shooting direction of the camera when the flow lines of N persons are approximated by straight lines. ψ1 to ψN are the angles between the flow lines of the N persons and a horizontal straight line in the image data. A, B, C, and D (>= 0) are parameters that determine which of θ, φ, ψ, and r is emphasized.

A monitoring apparatus according to an embodiment also includes: image input means for inputting image data captured by a camera; face detection means for continuing to detect, from the input image data, a face region in which a person's face appears until the detection has been performed a preset number of iterations; shooting condition storage means for storing shooting conditions of the camera under which a person's face is easy to check; determination means for determining whether at least one shooting condition among the vertical direction, horizontal direction, shooting direction in the rotation direction, and angle of view of the camera with respect to the person's face, based on the detection results of the face detection means for that number of iterations, matches the stored shooting conditions; and output means for outputting the determination result of the determination means.

A method according to an embodiment is a method executed by a monitoring apparatus and includes the steps of: image input means inputting image data captured by a camera; face detection means detecting, from the input image data, a face region in which a person's face appears; detection result storage means storing the detection results for the face region; calculation means calculating, on the basis of the stored detection results and in accordance with equation (4), at least one shooting condition that maximizes an evaluation value given by an evaluation function that evaluates how easily the person's face can be checked based on the stored face regions, using as parameters the camera's vertical angle φ, horizontal angle θ, angle ψ of the shooting direction in the rotation direction, and angle of view r; and output means outputting the calculated shooting condition.

Here, f(θ, φ, ψ, r) is the evaluation value. θ1 to θN are the angles between each person's flow line and the shooting direction of the camera when the flow lines of N persons are approximated by straight lines. ψ1 to ψN are the angles between the flow lines of the N persons and a horizontal straight line in the image data. A, B, C, and D (>= 0) are parameters that determine which of θ, φ, ψ, and r is emphasized.

A method according to an embodiment is also a method executed by a monitoring apparatus that includes shooting condition storage means for storing shooting conditions of a camera under which a person's face is easy to check, the method including the steps of: image input means inputting image data captured by the camera; face detection means continuing to detect, from the input image data, a face region in which a person's face appears until the detection has been performed a preset number of iterations; determination means determining whether at least one shooting condition among the vertical direction, horizontal direction, shooting direction in the rotation direction, and angle of view of the camera with respect to the person's face, based on the face region detection results for that number of iterations, matches the stored shooting conditions; and output means outputting the determination result.

FIG. 1 is a block diagram illustrating the configuration of a monitoring apparatus according to the first embodiment.
FIG. 2 is a conceptual diagram illustrating the shooting direction of the camera.
FIG. 3 is a flowchart illustrating an example of the operation of the monitoring apparatus according to the first embodiment.
FIG. 4 is a conceptual diagram illustrating the relationship between flow lines and the shooting direction.
FIG. 5 is a conceptual diagram illustrating the relationship between flow lines and the shooting direction.
FIG. 6 is a conceptual diagram illustrating flow lines and the rotation of the image captured by the camera.
FIG. 7 is a conceptual diagram illustrating zooming of a captured image.
FIG. 8 is a block diagram illustrating the configuration of a monitoring apparatus according to a modification.
FIG. 9 is a block diagram illustrating the configuration of a monitoring apparatus according to the second embodiment.
FIG. 10 is a flowchart illustrating an example of the operation of the monitoring apparatus according to the second embodiment.
FIG. 11 is a block diagram illustrating the hardware configuration of the monitoring apparatuses according to the first and second embodiments.

Hereinafter, the monitoring apparatus and method of the embodiments are described in detail with reference to the accompanying drawings. The monitoring apparatus and method of the embodiments are intended for visually checking persons' faces in video from security cameras (hereinafter, cameras) installed on streets, in buildings, in public areas, and the like, and make it easy to adjust the camera's shooting conditions (horizontal direction, vertical direction, rotation, zoom) so that a person's face is easy to check visually. This embodiment describes a procedure that detects a face region as the person region and uses facial feature information. The same can also be achieved by detecting the whole-body person region instead of the face (Watanabe et al., "Co-occurrence Histograms of Oriented Gradients for Pedestrian Detection", in Proceedings of the 3rd Pacific-Rim Symposium on Image and Video Technology (PSIVT2009), pp. 37-47) and using its size. Clearly, any technique that detects a person region, or that measures feature information within the person region, may be used, and the content is not limited to the techniques described in this embodiment.

(First embodiment)
FIG. 1 is a block diagram illustrating the configuration of a monitoring apparatus 100 according to the first embodiment. As shown in FIG. 1, the monitoring apparatus 100 includes an input unit 101, a detection unit 102, a detection result management unit 103, a shooting condition calculation unit 104, and an output unit 105. A camera 150 captures images of a predetermined area. For example, the camera 150 is a surveillance camera or the like that captures the entrance/exit area of a passageway and generates moving image data as the capture result. The input unit 101 performs input processing on the moving image data from the camera 150 and outputs the input moving image data to the detection unit 102.

The camera 150 can be installed at one or more locations. The camera 150 inputs face images of persons present in the predetermined area and is, for example, an ITV (Industrial Television) camera. The camera 150 digitizes the optical information obtained through its lens with an A/D converter, generates frame image data at a predetermined frame rate, and outputs the data to the monitoring apparatus 100. The camera 150 is also a PTZ (pan/tilt/zoom) camera whose shooting conditions, such as horizontal direction, vertical direction, rotation, and zoom, are adjustable.

FIG. 2 is a conceptual diagram illustrating the shooting direction 24 of the camera 150. As shown in FIG. 2, the shooting direction 24 of the camera 150 can be adjusted in the horizontal direction 21 (pan, yaw direction), the vertical direction 22 (tilt, pitch direction), and the rotation direction 23 (roll direction) by adjusting a camera pedestal (not shown) on which the camera 150 is installed. The camera 150 also allows focus adjustment by adjusting the lens position. Shooting conditions such as horizontal direction, vertical direction, rotation, and zoom may be adjusted manually by the user or automatically by an actuator or the like.

Returning to FIG. 1, the detection unit 102 detects, from the input image data (input image), a face region in which a person's face appears. Specifically, the detection unit 102 uses the luminance information of the input image to obtain coordinates indicating the face region. This can be realized with the method in the literature (Takeshi Mita et al., "Joint Haar-like features based on co-occurrence suitable for face detection", IEICE Transactions (D), vol. J89-D, No. 8, pp. 1791-1801 (2006)), and the use of that method is assumed here. The information indicating the result detected according to face orientation and size may have any shape, but for simplicity this embodiment represents the face region as a rectangle and uses the coordinates of its corners as the detection result. Alternatively, the face region may be found by moving a template prepared in advance across the image, computing the correlation value at each position, and taking the position with the highest correlation value as the face region, or by a face extraction method based on the eigenspace method or the subspace method.
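The template-based alternative mentioned above can be illustrated with a minimal sketch. It assumes grey-level images stored as nested lists and uses normalized cross-correlation as the correlation value; the function names `ncc` and `best_match` are hypothetical, and this is not the co-occurrence Joint Haar-like detector that the embodiment assumes.

```python
import math

def ncc(patch, template):
    """Normalized cross-correlation of two equally sized 2-D grey-level lists."""
    a = [v for row in patch for v in row]
    b = [v for row in template for v in row]
    mean_a, mean_b = sum(a) / len(a), sum(b) / len(b)
    num = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    den = math.sqrt(sum((x - mean_a) ** 2 for x in a) *
                    sum((y - mean_b) ** 2 for y in b))
    return num / den if den > 0 else 0.0  # flat patches carry no correlation

def best_match(image, template):
    """Slide the template over the image; return (score, top, left) of the best position."""
    th, tw = len(template), len(template[0])
    best = (-2.0, 0, 0)  # any real score lies in [-1, 1]
    for top in range(len(image) - th + 1):
        for left in range(len(image[0]) - tw + 1):
            patch = [row[left:left + tw] for row in image[top:top + th]]
            score = ncc(patch, template)
            if score > best[0]:
                best = (score, top, left)
    return best
```

The rectangle starting at the returned `(top, left)` with the template's size would then play the role of the face rectangle whose corner coordinates are used as the detection result.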

In video captured by the camera 150, the face of the same detected person is expected to appear continuously over multiple frames as the person moves in front of the camera 150 (the movement trajectory). The detection unit 102 therefore needs to track each person's face so that detections of the same face can be associated with the same person. This can be realized by using optical flow to associate a detected face with its position in the next frame, or by using the technique of a patent publication (JP 2011-170711 A). When face parts are detected as feature points (described later), it is possible to select at least one image suitable for search from among the face-region images of the multiple frames associated with the same person, or to use any number of images up to the number of frames in which the face was detected.
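As a rough illustration of associating faces across frames, the sketch below swaps in a simple greedy overlap (intersection-over-union) matcher. This is an assumption made for brevity; it is neither the optical-flow approach nor the method of JP 2011-170711 A. The names `iou`, `associate`, and the threshold `min_iou` are hypothetical.

```python
def iou(a, b):
    """Intersection over union of two boxes given as (x1, y1, x2, y2)."""
    ix = max(0, min(a[2], b[2]) - max(a[0], b[0]))
    iy = max(0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = ix * iy
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union if union > 0 else 0.0

def associate(tracks, detections, min_iou=0.3):
    """Extend each track with its best-overlapping unused detection.

    tracks: dict mapping a person id to the list of face boxes seen so far.
    detections: face boxes found in the new frame.
    Detections that match no track start a new track (a new person).
    """
    unused = set(range(len(detections)))
    for boxes in tracks.values():
        best = max(unused, key=lambda i: iou(boxes[-1], detections[i]), default=None)
        if best is not None and iou(boxes[-1], detections[best]) >= min_iou:
            boxes.append(detections[best])
            unused.discard(best)
    next_id = max(tracks, default=-1) + 1
    for i in sorted(unused):
        tracks[next_id] = [detections[i]]
        next_id += 1
    return tracks
```

The sequence of boxes accumulated under one id is what the later steps treat as that person's flow line.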

Further, the positions of face parts such as the eyes and nose are detected as facial feature points within the detected face region. Specifically, this can be realized by a method such as that in the literature (Kazuhiro Fukui, Osamu Yamaguchi, "Facial feature point extraction by combining shape extraction and pattern matching", IEICE Transactions (D), vol. J80-D-II, No. 8, pp. 2170-2177 (1997)). In addition to the eye and nose detection above, the mouth region can easily be detected using the technique in the literature (Mayumi Yuasa, Akiko Nakajima, "Digital make system based on high-accuracy facial feature point detection", Proceedings of the 10th Image Sensing Symposium, pp. 219-224 (2004)). In either case, information that can be handled as a two-dimensional array image is acquired, and facial feature regions can be detected from it. To extract only one facial feature from a single image, it suffices to compute the correlation value with a template over the whole image and output the position and size at which it is maximal. To extract multiple facial features, the local maxima of the correlation value over the whole image are found, the candidate face positions are narrowed down by taking overlaps within the image into account, and finally the relationship with previously input consecutive images (the temporal transition) is also considered, so that multiple facial features can ultimately be found at the same time.

Face orientation can be estimated, as shown in JP 2003-141551 A ("Face orientation calculation method and apparatus", Toshiba; Yamada, Kazuhiro Fukui, Atsuto Maki, Akiko Nakajima), by using a rotation matrix of the face and multiple classes (subspaces) trained for each face orientation.

The detection result management unit 103 stores information on the face regions detected by the detection unit 102 (the detection results). Specifically, the detection result management unit 103 is a database that stores, for each person's face detected by the detection unit 102, the coordinates of the face region in the input image, the trajectory (flow line) of the face region detected over multiple frame images, the position information of the face parts (eyes, nose, and so on) detected in the face region, and the estimated face orientation.
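One possible in-memory layout for these records is sketched below. The class and field names (`FaceObservation`, `PersonRecord`, `yaw_deg`, `pitch_deg`) are hypothetical, chosen only to mirror the four stored items; the source does not specify a schema.

```python
from dataclasses import dataclass, field

@dataclass
class FaceObservation:
    frame: int                                   # frame index in the video
    box: tuple                                   # face rectangle (x1, y1, x2, y2)
    parts: dict = field(default_factory=dict)    # e.g. {"left_eye": (x, y)}
    yaw_deg: float = 0.0                         # estimated face orientation
    pitch_deg: float = 0.0

@dataclass
class PersonRecord:
    person_id: int
    observations: list = field(default_factory=list)

    def add(self, obs):
        self.observations.append(obs)

    def flow_line(self):
        """Centres of the face rectangles in frame order (the trajectory)."""
        ordered = sorted(self.observations, key=lambda o: o.frame)
        return [((o.box[0] + o.box[2]) / 2, (o.box[1] + o.box[3]) / 2)
                for o in ordered]

    def mean_pitch(self):
        """Face pitch averaged over the frames of this flow line."""
        return sum(o.pitch_deg for o in self.observations) / len(self.observations)
```

Keeping one record per tracked person makes it straightforward to count how many persons have been collected and to read out each flow line when shooting conditions are calculated.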

On the basis of the detection results stored in the detection result management unit 103, the shooting condition calculation unit 104 calculates at least one shooting condition of the camera 150, among the shooting direction in the vertical, horizontal, and rotation directions and the angle of view (zoom), that makes a person's face easy to check. For example, if the detection result management unit 103 holds detection results for persons 1 to N detected by the detection unit 102, shooting conditions that make the faces of these N persons easy to check are calculated. Specifically, the shooting condition calculation unit 104 calculates the shooting conditions that maximize the evaluation value of an evaluation function that evaluates how easily a person's face can be checked, based on the detection results stored in the detection result management unit 103 for the N persons.

FIG. 3 is a flowchart illustrating an example of the operation of the monitoring apparatus 100 according to the first embodiment, and more specifically the calculation of shooting conditions by the shooting condition calculation unit 104. As shown in FIG. 3, when processing starts, the shooting condition calculation unit 104 acquires the detection results (face detection and tracking results) stored in the detection result management unit 103 (S1). Next, the shooting condition calculation unit 104 determines whether the number of acquired detection results (the number of persons acquired) is equal to or greater than a preset number (N) (S2). If it is less than the preset number (S2: NO), processing returns to S1 and waits.

If the preset number of detection results or more are available (S2: YES), the shooting condition calculation unit 104 sets the range that the horizontal (pan) angle θ of the camera 150 can take, that is, the range of the horizontal parameter of the shooting conditions, from the flow lines of the N persons stored in the detection result management unit 103 (S3).

FIG. 4 is a conceptual diagram illustrating the relationship between the flow lines d1 to dN of N persons and the shooting direction 24, and more specifically a view of the shooting direction 24 from above. As shown in FIG. 4, the movement trajectories of the N persons are assumed to be given as flow lines d1 to dN. In S3, so that the shooting direction 24 of the camera 150 and the flow lines d1 to dN are as parallel as possible, the shooting condition calculation unit 104 sets the horizontal angle θ as in equation (1), where θ1, θ2, ..., θN are, in ascending order, the angles between the shooting direction (horizontal direction) and each flow line approximated by a straight line.
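The step of approximating each flow line by a straight line and ordering the resulting angles can be sketched as follows. Two things here are assumptions rather than statements of the source: the line is fitted with a principal-axis (total least squares) fit, and the range for θ is taken as the interval from θ1 to θN, which is one plausible reading of S3 since equation (1) is not reproduced in this text. The function names are hypothetical.

```python
import math

def line_direction_deg(points):
    """Orientation in degrees of the principal-axis line through 2-D points."""
    n = len(points)
    mx = sum(p[0] for p in points) / n
    my = sum(p[1] for p in points) / n
    sxx = sum((p[0] - mx) ** 2 for p in points)
    syy = sum((p[1] - my) ** 2 for p in points)
    sxy = sum((p[0] - mx) * (p[1] - my) for p in points)
    return math.degrees(0.5 * math.atan2(2 * sxy, sxx - syy))

def angle_to_shooting_dir(flow_line, shooting_dir_deg):
    """Signed angle between a flow line and the shooting direction, in [-90, 90)."""
    d = line_direction_deg(flow_line) - shooting_dir_deg
    return (d + 90.0) % 180.0 - 90.0  # a line has no heading, so fold by 180 degrees

def pan_range(flow_lines, shooting_dir_deg=0.0):
    """Sorted angles theta_1..theta_N and the ASSUMED range (theta_1, theta_N)."""
    thetas = sorted(angle_to_shooting_dir(f, shooting_dir_deg) for f in flow_lines)
    return thetas, (thetas[0], thetas[-1])
```

The same helper could be applied to flow lines in image coordinates to obtain the angles against the image horizontal used for the roll direction.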

Next, the shooting condition calculation unit 104 sets the range that the vertical (tilt) angle φ of the camera 150 can take, that is, the range of the vertical parameter of the shooting conditions, from the flow lines of the N persons stored in the detection result management unit 103 (S4).

FIG. 5 is a conceptual diagram illustrating the relationship between the flow lines d1 to dN of N persons and the shooting direction 24, and more specifically a view of the shooting direction 24 from the side. As shown in FIG. 5, the movement trajectories of the N persons are assumed to be given as flow lines d1 to dN. In S4, the shooting condition calculation unit 104 sets the vertical angle φ as in equation (2), where φ1, φ2, ..., φN are, in ascending order, the face orientation angles of each flow line averaged over its frame images.

Next, the shooting condition calculation unit 104 sets the range that the rotation (roll) angle ψ of the camera 150 can take, that is, the range of the rotation parameter of the shooting conditions, from the flow lines of the N persons stored in the detection result management unit 103 (S5).

FIG. 6 is a conceptual diagram illustrating the flow lines d1 to dN of N persons and the rotation of the captured image G of the camera 150. As shown in FIG. 6, the movement trajectories of the N persons are assumed to be given as flow lines d1 to dN. In S5, the shooting condition calculation unit 104 sets the rotation angle ψ as in equation (3), where ψ1, ψ2, ..., ψN are, in ascending order, the angles between each flow line in the captured image G and a horizontal straight line of the captured image G.

Next, the shooting condition calculation unit 104 sets the range that the zoom (angle of view) of the camera 150 can take, that is, the range of the angle-of-view parameter of the shooting conditions (S6). Specifically, k values r1, r2, ..., rk are set for the zoom parameter r within the range, preset in a memory or the like, that is attainable by adjusting the lens position of the camera 150. For θ, φ, and ψ as well, the upper and lower limits are kept within the operating range of the camera pedestal preset in the memory or the like.

Next, for the shooting conditions whose pan, tilt, roll, and zoom parameter ranges have been set, the pan, tilt, roll, and zoom values that maximize the evaluation value of the evaluation function described above are calculated. The evaluation value is calculated by equation (4), taking into account criteria that are ideal for visual face monitoring but hard to satisfy simultaneously, such as a large face size, a face oriented frontally to the camera 150, and a wide shooting range.

In equation (4), A, B, C, and D (>= 0) are parameters that determine which of the adjusted parameters θ, φ, ψ, and r is emphasized. For A, B, C, and D, one parameter is enabled (>= 0) so that one or more shooting conditions can be evaluated. g(r) is, for example, the number X of faces contained in the frame images of the flow lines that fall outside the angle of view (imaging area) as a result of zooming, minus the number Y of faces contained in the frame images of the flow lines whose size falls within a predetermined range, that is, at least a lower bound of S pixels and at most an upper bound of S pixels. Any other function may be used as long as it expresses the trade-off between face size and width of the angle of view. When the value of r is fixed, for example, the evaluation value can be maximized by linear programming, so the maximization can be realized by solving a linear programming problem for each value of r. Face size, angle of view, and the like can be calculated if three-dimensional information on the camera and the subject is available. The evaluation function (objective function) may be any function that adjusts face orientation and face size so that images are captured under the desired conditions, provided that a means of maximizing or minimizing it is available (in this embodiment it is maximized, since a higher evaluation value indicates that a person's face is easier to check).
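The search over the parameter ranges can be illustrated with the sketch below. Equation (4) is not reproduced in this text, so the form of `evaluate` is an assumption made purely for illustration: a weighted sum of absolute deviations from each person's angles plus D times g(r), negated so that larger is better. The zoom model in `zoom_term` (zooming by r about the image centre) is likewise assumed, and exhaustive grid search is swapped in for the linear programming mentioned above. All function names are hypothetical.

```python
import itertools
import math

def zoom_term(r, faces, width, height, s_min, s_max):
    """g(r) = X - Y under an ASSUMED zoom model.

    faces: (cx, cy, size) at zoom 1. Zooming by r keeps the central
    width/r x height/r window and multiplies face size by r.
    X counts faces pushed out of view, Y counts visible faces in the size range.
    """
    half_w, half_h = width / (2 * r), height / (2 * r)
    cx0, cy0 = width / 2, height / 2
    visible = [(cx, cy, s) for cx, cy, s in faces
               if abs(cx - cx0) <= half_w and abs(cy - cy0) <= half_h]
    x = len(faces) - len(visible)
    y = sum(1 for _, _, s in visible if s_min <= s * r <= s_max)
    return x - y

def grid(lo, hi, step):
    """Evenly spaced candidate values from lo to hi inclusive."""
    n = math.floor((hi - lo) / step + 1e-9)
    return [lo + i * step for i in range(n + 1)]

def evaluate(theta, phi, psi, r, thetas, phis, psis, g, weights):
    """ASSUMED evaluation value; not the patent's equation (4)."""
    a, b, c, d = weights
    return -(a * sum(abs(theta - t) for t in thetas)
             + b * sum(abs(phi - p) for p in phis)
             + c * sum(abs(psi - p) for p in psis)
             + d * g(r))

def best_condition(thetas, phis, psis, zooms, g, weights=(1, 1, 1, 1), step=1.0):
    """Exhaustive search over the ranges set in S3 to S6; returns (theta, phi, psi, r)."""
    candidates = itertools.product(
        grid(min(thetas), max(thetas), step),
        grid(min(phis), max(phis), step),
        grid(min(psis), max(psis), step),
        zooms,
    )
    return max(candidates,
               key=lambda c: evaluate(*c, thetas, phis, psis, g, weights))
```

Setting a weight to zero removes the corresponding parameter from consideration, which corresponds to choosing which of θ, φ, ψ, and r to emphasize.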

FIG. 7 is a conceptual diagram illustrating zooming of the captured image G. As shown in FIG. 7, the regions R1 and R2 in which faces appear frequently, obtained by examining the frequency of faces in the detection results, may be set as the limited imaging region R and captured at magnification. By performing the processing described above, the rotation angle that corrects the inclination of the flow lines as in FIG. 6 can also be calculated. If no detection result is output over a certain period, a warning may be output from the output unit 105 on the grounds that appropriate shooting conditions cannot be calculated.
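One simple way to examine face frequency is a grid histogram of face centres, sketched below. The cell size, the count threshold, and the choice to return the bounding rectangle of all frequent cells as the imaging region are assumptions for illustration; `frequent_region` is a hypothetical name.

```python
from collections import Counter

def frequent_region(centres, cell=50, min_count=2):
    """Bounding rectangle (x1, y1, x2, y2) of grid cells where faces appear often.

    centres: face-centre coordinates collected from the detection results.
    Returns None when no cell reaches min_count.
    """
    counts = Counter((int(x // cell), int(y // cell)) for x, y in centres)
    hot = [c for c, n in counts.items() if n >= min_count]
    if not hot:
        return None
    x1 = min(c[0] for c in hot) * cell
    y1 = min(c[1] for c in hot) * cell
    x2 = (max(c[0] for c in hot) + 1) * cell
    y2 = (max(c[1] for c in hot) + 1) * cell
    return (x1, y1, x2, y2)
```

A `None` result corresponds to the situation in which too few detections are available to propose a region.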

The output unit 105 is a display device or the like and displays the shooting conditions calculated by the shooting condition calculation unit 104. The output unit 105 also displays the input image from the input unit 101 and displays that input image with the face regions detected by the detection unit 102 superimposed on it. By checking the shooting conditions output from the output unit 105, the user can easily adjust the horizontal direction, vertical direction, rotation direction, zoom (angle of view), and so on of the camera 150 so that the faces of persons captured by the camera 150 are easy to check.

(Modification)
Next, a modification of the monitoring apparatus 100 is described. FIG. 8 is a block diagram illustrating the configuration of a monitoring apparatus 100a according to the modification. In the monitoring apparatus 100a, components identical to those of the monitoring apparatus 100 are given the same reference numerals and their description is omitted.

As shown in FIG. 8, the monitoring apparatus 100a includes an input unit 101, a detection unit 102, a detection result management unit 103, a shooting condition calculation unit 104, an output unit 105, a shooting condition control unit 106, a feature extraction unit 107, a person information management unit 108, and a recognition unit 109.

The shooting condition control unit 106 outputs drive signals to an actuator (not shown) of the camera 150 in accordance with the parameters calculated by the shooting condition calculation unit 104, thereby controlling the angle in the horizontal direction 21, the angle in the vertical direction 22, the angle in the roll direction 23, the screen resolution, and the screen zoom of the camera 150. With the shooting condition control unit 106, the camera 150 can be changed automatically, without manual changes by the user, to shooting conditions that make it easy to check the face of a person captured by the camera 150.

The feature extraction unit 107 outputs, as numerical values, feature information for identifying an individual from the information on the face region detected by the detection unit 102 (hereinafter, "face feature" refers to this feature information for identifying an individual). The input image obtained by the input unit 101 is used for feature extraction. The feature extraction unit 107 associates the input image from the input unit 101 with the person region detected by the detection unit 102, cuts out the image region needed for feature extraction (the face region, since this embodiment detects face regions), and uses its grayscale information as the feature quantity. To allow for the use of multiple frames, the image correction means may output multiple frames.

Here, the grayscale values of a region of m pixels × n pixels are used as they are, and the resulting m×n-dimensional information is used as the feature vector. With the technique known as the simple similarity method, the vectors are normalized so that each has length 1, and their inner product is computed to obtain a similarity indicating how alike the feature vectors are. In detail, this can be realized by using the subspace method as described in the literature (Erkki Oja, translated by Hidemitsu Ogawa and Makoto Sato, "Pattern Recognition and Subspace Method", Sangyo Tosho, 1986). A technique that raises accuracy may also be applied, in which a model is used to generate, from a single piece of person image information, images with the face orientation or state deliberately varied, as described in the literature (Toshiba (Tatsuo Kosakaya): "Image recognition apparatus, method and program", JP 2007-4767 A). When face features are obtained from a single image, face feature extraction is complete at this point.
Alternatively, more accurate recognition can be performed by computing over a moving image, using multiple consecutive images of the same person. Specifically, the description here uses the mutual subspace method found in the literature (Kazuhiro Fukui, Osamu Yamaguchi, Kenichi Maeda: "Face Recognition System Using Moving Images", IEICE Technical Report PRMU, vol. 97, no. 113, pp. 17-24 (1997); Kenichi Maeda, Sadakazu Watanabe: "Pattern Matching Method with Local Structure", IEICE Transactions (D), vol. J68-D, no. 3, pp. 345-352 (1985)). Images of m × n pixels are cut out of the images obtained consecutively from the input means, in the same way as in the feature extraction means. The correlation matrix (or covariance matrix) of the resulting feature vectors is computed, and its orthonormal vectors (eigenvectors) are obtained by K-L expansion, which gives a subspace representing the face features obtained from the consecutive images. The subspace is represented by the set of k eigenvectors selected in descending order of their eigenvalues. In this embodiment, the correlation matrix Cd is obtained from the feature vectors and diagonalized as Cd = Φd Λd Φd^T to obtain the eigenvector matrix Φ. This information is the subspace representing the face features of the person currently being recognized. Feature information such as a subspace output in this way is used as the individual's feature information for the face detected in the input image.
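The two computations in this passage can be illustrated with a minimal sketch. It assumes NumPy, non-zero grayscale patches of equal size, and function names (`simple_similarity`, `face_subspace`) that are hypothetical and do not appear in the specification.

```python
import numpy as np

def simple_similarity(a, b):
    """Inner product of two feature vectors after scaling each to length 1."""
    a = np.asarray(a, dtype=float).ravel()
    b = np.asarray(b, dtype=float).ravel()
    return float(np.dot(a / np.linalg.norm(a), b / np.linalg.norm(b)))

def face_subspace(patches, k):
    """Return an (m*n, k) matrix whose columns are the k leading eigenvectors
    of the correlation matrix of the flattened patches (K-L expansion).

    Assumption: every patch is an m x n grayscale array with non-zero norm.
    """
    x = np.stack([np.asarray(p, dtype=float).ravel() for p in patches])
    x /= np.linalg.norm(x, axis=1, keepdims=True)   # unit-length feature vectors
    c = x.T @ x / len(x)                            # correlation matrix Cd
    eigvals, eigvecs = np.linalg.eigh(c)            # eigenvalues in ascending order
    order = np.argsort(eigvals)[::-1][:k]           # indices of the k largest
    return eigvecs[:, order]
```

Because the correlation matrix is symmetric, `numpy.linalg.eigh` returns orthonormal eigenvectors, so the returned columns can be used directly as a subspace basis.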

The person information management unit 108 is the database searched by the recognition unit 109 described later. For each individual to be searched, it manages the face feature information extracted by the feature extraction unit 107 together with accompanying information about the person, such as attribute information that attribute determination means can determine (gender, age, height, and so on), associating all of it with the same person. The face feature information and attribute information actually managed may be the data output by the feature extraction unit 107 as is, and may be an m×n feature vector, a subspace, or the correlation matrix immediately before K-L expansion. Furthermore, by managing the feature information output by the feature extraction unit 107 together with the person image input from the input means at registration, the information can be used for searching for individuals and for displaying search results.

The recognition unit 109 computes the similarity between the face feature information of the input image obtained by the feature extraction unit 107 and the corresponding face feature information stored in the person information management unit 108, and returns results in descending order of similarity. As described above, it is also possible to narrow the search to predetermined attribute information and search only part of the person information management unit 108.
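A ranked lookup of this kind might look like the following sketch. The record layout (`person_id`, `feature`, `attrs`), the use of a normalized inner product as the similarity, and the function names are all assumptions made for illustration; the specification does not define a data format.

```python
import math

def cosine(a, b):
    """Normalized inner product of two feature vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

def search(query, records, top_n=3, **required_attrs):
    """Return (person_id, similarity) pairs, most similar first.

    Keyword arguments restrict the search to records whose attributes
    match, which corresponds to searching only part of the database.
    """
    candidates = [
        r for r in records
        if all(r["attrs"].get(key) == value for key, value in required_attrs.items())
    ]
    scored = [(r["person_id"], cosine(query, r["feature"])) for r in candidates]
    scored.sort(key=lambda item: item[1], reverse=True)
    return scored[:top_n]
```

Returning only the identifier and the similarity keeps the search result small; any other per-person information can be looked up afterwards by identifier.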

In this case, the search returns, in descending order of similarity, the person ID managed in the person information management unit 108 to identify each individual, together with the index of similarity obtained by the calculation. The information managed for each individual may be returned as well, but because it can be looked up through the identification ID, the search process itself does not need to exchange the attached information. The index of similarity is the similarity between the subspaces managed as face feature information. Methods such as the subspace method or the multiple similarity method may be used for the calculation. In this method, both the recognition data in the pre-stored registration information and the input data are expressed as subspaces computed from multiple images, and the "angle" formed by the two subspaces is defined as the similarity. The subspace obtained from the input is called the input subspace. A correlation matrix Cin is likewise obtained for the input data sequence and diagonalized as Cin = Φin Λin Φin^T to obtain the eigenvectors Φin.
The inter-subspace similarity (0.0 to 1.0) between the two subspaces represented by Φin and Φd is computed and used as the similarity for recognition. The specific calculation can be realized with the literature (Erkki Oja) introduced in the description of the feature extraction unit 107. Accuracy can also be improved by gathering in advance multiple person images known to show the same person and identifying whether a face belongs to that person by projection onto the subspace; the same processing can be performed with the literature (Fukui, Kosakaya). For fast search, search methods using a tree structure can also be used.
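One common way to turn the angle between two subspaces into a value between 0.0 and 1.0 is the squared cosine of the smallest canonical angle, obtained from the largest singular value of the product of the two bases. The sketch below assumes NumPy and that convention; the function names are hypothetical and the specification does not state which formula is used.

```python
import numpy as np

def subspace_basis(vectors, k):
    """k leading eigenvectors of the correlation matrix of the row vectors."""
    x = np.asarray(vectors, dtype=float)
    c = x.T @ x / len(x)                       # correlation matrix
    eigvals, eigvecs = np.linalg.eigh(c)       # eigenvalues in ascending order
    return eigvecs[:, np.argsort(eigvals)[::-1][:k]]

def subspace_similarity(phi_in, phi_d):
    """Squared cosine of the smallest canonical angle between two subspaces.

    Both arguments must have orthonormal columns. The result is 1.0 when
    the subspaces share a direction and 0.0 when they are orthogonal.
    """
    singular_values = np.linalg.svd(phi_in.T @ phi_d, compute_uv=False)
    return float(min(1.0, singular_values[0] ** 2))
```

Clamping at 1.0 only guards against rounding error; for exactly orthonormal bases the largest singular value never exceeds 1.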

The output unit 105 displays on screen the results obtained by the recognition unit 109 and the input image. Either or both of two displays can be incorporated: a real-time face search result display, which shows in real time those results found by the recognition unit 109 that meet specified conditions, and an offline face search result display, which saves the results found by the recognition unit 109 as a search history and later shows only the history entries that match conditions specified afterwards.

(Second Embodiment)
FIG. 9 is a block diagram illustrating the configuration of a monitoring device 100b according to the second embodiment. In the monitoring device 100b, components identical to those of the monitoring device 100 are denoted by the same reference numerals, and their description is omitted.

As shown in FIG. 9, the monitoring device 100b includes an input unit 101, a detection unit 102, an output unit 105, a shooting condition storage unit 110, and a shooting condition diagnosis unit 111.

The shooting condition storage unit 110 stores, as camera shooting conditions that make the face of a person captured by the camera 150 easy to confirm, predetermined recommended values such as face size, face orientation, and viewing angle, a background image, or information such as the pan or yaw direction (horizontal direction), tilt or pitch direction (vertical direction), roll direction (rotation direction), and zoom magnification of the camera 150. The stored values may be determined from the tracking history of persons moving in front of the camera 150, as in the first embodiment, or may be values set by the user in advance while visually checking the screen.

The shooting condition diagnosis unit 111 determines, based on the detection results of the detection unit 102, whether at least one shooting condition among the shooting direction of the camera 150 relative to the person's face in the vertical direction, horizontal direction, and rotation direction, and the angle of view, matches the shooting conditions stored in the shooting condition storage unit 110.

FIG. 10 is a flowchart illustrating an example of the operation of the monitoring device 100b according to the second embodiment; more specifically, it illustrates how the shooting condition diagnosis unit 111 evaluates the shooting conditions. As shown in FIG. 10, the shooting condition diagnosis unit 111 determines whether the processing of S11 to S17 has been performed a preset upper limit number of iterations (S10). If the upper limit has not been reached (S10: NO), the shooting condition diagnosis unit 111 continues the processing of S11 to S17. If the upper limit has been reached (S10: YES), the shooting condition diagnosis unit 111 advances the processing to S18.

In S11, the shooting condition diagnosis unit 111 acquires the detection results (face detection and tracking results) of the detection unit 102. Next, the shooting condition diagnosis unit 111 determines whether the number of persons in the detection results is zero (S12). If it is zero (S12: YES), the shooting condition diagnosis unit 111 stores the input image from the input unit 101 as a background image (S13) and increments the iteration count (S17). If it is not zero (S12: NO), the shooting condition diagnosis unit 111 stores the face size, face orientation, and flow line, which are detection results of the detection unit 102 (S14 to S16), and increments the iteration count (S17).

In S18, the shooting condition diagnosis unit 111 calculates an evaluation value for the consistency of the shooting conditions, based on the data stored in the processing of S12 to S17 and the shooting conditions stored in the shooting condition storage unit 110 (S18), and outputs the calculation result (the determination of whether the conditions match) to the output unit 105 (S19). The output unit 105 displays whether the shooting conditions of the camera 150 match the shooting conditions stored in advance in the shooting condition storage unit 110.
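The data-gathering loop of S10 to S17 can be sketched as follows. The `detect` callback, the dictionary keys, and the shape of a detection result are assumptions for illustration only; the evaluation and output steps (S18, S19) are left out of this sketch.

```python
from itertools import islice

def collect_observations(detect, frames, max_iterations):
    """Gather background frames and face observations over a fixed number of frames.

    `detect(frame)` is a hypothetical callback returning a list of dicts
    with the keys "size", "direction" and "flow_line" (empty if no face).
    """
    store = {"backgrounds": [], "face_sizes": [], "face_directions": [], "flow_lines": []}
    # S10: stop once the preset upper limit of iterations is reached.
    # S17: the iteration count advances once per processed frame.
    for frame in islice(frames, max_iterations):
        faces = detect(frame)                                   # S11
        if not faces:                                           # S12: YES
            store["backgrounds"].append(frame)                  # S13
            continue
        for face in faces:                                      # S12: NO
            store["face_sizes"].append(face["size"])            # S14
            store["face_directions"].append(face["direction"])  # S15
            store["flow_lines"].append(face["flow_line"])       # S16
    return store
```

Keeping the collection separate from the evaluation means the same stored data can be checked against different stored shooting conditions without re-running detection.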

Specifically, the shooting condition diagnosis unit 111 stores a background image when no person is detected in the input image; in S18, it calculates the difference from the stored background image and can obtain, for each pixel or for each block of adjacent pixels, the posterior probability of being background (Hiroaki Nakai, JP 09-81714 A). The face size in each face detection result can also be recorded, and consistency can be judged from the number X of faces whose size is not within the range from the number of pixels specified in the shooting condition storage unit 110 up to S pixels. Likewise, the horizontal angle, vertical angle, and in-plane rotation angle of the face orientation can be recorded, and consistency can be judged from the number Y of faces not within the face orientation range specified in the shooting condition storage unit 110. The face orientation can be calculated by the technique described above. Consistency can also be judged from the number Z of persons for whom the angle between the flow line, taken from the flow line information obtained by the detection unit 102, and a horizontal straight line in the image is not within the angle range specified in the shooting condition storage unit 110.
Further, with N denoting the number of persons counted through the person tracking performed by the detection unit 102, the angle of view can be judged inappropriate when N is smaller than the number of persons specified in the shooting condition storage unit 110.
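The counts X, Y, Z and N described here can be computed as in the sketch below. The `StoredConditions` fields, the per-face dictionary layout, and representing a flow line by its start and end points are all assumptions made for illustration; the specification does not fix these details.

```python
import math
from dataclasses import dataclass

@dataclass
class StoredConditions:
    """Hypothetical contents of the shooting condition storage unit 110."""
    face_px_range: tuple      # (min, max) face size in pixels
    direction_ranges: tuple   # ((lo, hi), (lo, hi), (lo, hi)) for yaw, pitch, roll in degrees
    flow_angle_range: tuple   # (lo, hi) angle to the image horizontal in degrees
    min_people: int           # fewer tracked persons suggests an unsuitable angle of view

def flow_line_angle(start, end):
    """Angle in degrees (0 to 90) between a flow line and the image's horizontal axis."""
    dx, dy = end[0] - start[0], end[1] - start[1]
    return math.degrees(math.atan2(abs(dy), abs(dx)))

def outside(value, bounds):
    return not (bounds[0] <= value <= bounds[1])

def diagnose(faces, cond):
    """Count observations that fall outside the stored ranges."""
    x = sum(outside(f["size"], cond.face_px_range) for f in faces)
    y = sum(
        any(outside(angle, bounds) for angle, bounds in zip(f["direction"], cond.direction_ranges))
        for f in faces
    )
    z = sum(outside(flow_line_angle(*f["flow_line"]), cond.flow_angle_range) for f in faces)
    n = len(faces)
    return {"X": x, "Y": y, "Z": z, "N": n, "too_few_people": n < cond.min_people}
```

How the counts are combined into a single match or mismatch decision (for example, by thresholds on X, Y and Z) is left open here, as it is in the text.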

A departure from the registered conditions can also be detected by taking the difference from the background image or by comparing color histograms. Local features such as edges and corners, or features based on the luminance distribution of local regions, can be used as well. For this determination, features may be computed over the whole screen with each coordinate value recorded individually, or the screen may be divided into small regions and the feature information computed and compared block by block.
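A block-by-block comparison against a stored background image can be sketched as follows. The sketch assumes grayscale images given as equally sized lists of rows, uses the mean absolute difference as the per-block feature, and picks an arbitrary threshold; none of these choices come from the specification.

```python
def block_mean_abs_diff(current, background, block):
    """Mean absolute grayscale difference for each block x block tile.

    Returns a dict keyed by (block_row, block_col). Tiles at the right and
    bottom edges may be smaller than `block`.
    """
    height, width = len(current), len(current[0])
    diffs = {}
    for top in range(0, height, block):
        for left in range(0, width, block):
            cells = [
                (r, c)
                for r in range(top, min(top + block, height))
                for c in range(left, min(left + block, width))
            ]
            total = sum(abs(current[r][c] - background[r][c]) for r, c in cells)
            diffs[(top // block, left // block)] = total / len(cells)
    return diffs

def changed_blocks(current, background, block=2, threshold=10.0):
    """Blocks whose mean difference from the background exceeds the threshold."""
    diffs = block_mean_abs_diff(current, background, block)
    return sorted(key for key, value in diffs.items() if value > threshold)
```

Comparing per block rather than per pixel makes the check less sensitive to sensor noise, at the cost of coarser localization of the change.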

FIG. 11 is a block diagram illustrating the hardware configuration of the monitoring devices 100, 100a, and 100b according to the first and second embodiments. As shown in FIG. 11, the monitoring devices 100, 100a, and 100b include a CPU 1101, a ROM (Read Only Memory) 1102, a RAM (Random Access Memory) 1103, a communication I/F 1104, an HDD 1105, a display device 1106, an input device 1107 such as a keyboard or mouse, and a bus 1108 connecting them, giving a hardware configuration that uses an ordinary computer.

The program executed by the monitoring devices 100, 100a, and 100b of this embodiment is provided as a file in an installable or executable format, recorded on a computer-readable recording medium such as a CD-ROM, flexible disk (FD), CD-R, or DVD (Digital Versatile Disk).

The program executed by the monitoring devices 100, 100a, and 100b of this embodiment may also be stored on a computer connected to a network such as the Internet and provided by being downloaded over the network. The program executed by the monitoring devices 100, 100a, and 100b of this embodiment may also be provided or distributed via a network such as the Internet. The program of this embodiment may also be provided by being incorporated in advance in a ROM or the like.

The program executed by the monitoring devices 100, 100a, and 100b of this embodiment has a module configuration that includes each of the components described above. In terms of actual hardware, the CPU 1101 reads the program from the storage medium and executes it, whereby each of the components is loaded onto and generated on the RAM 1103.

Although several embodiments of the present invention have been described, these embodiments are presented as examples and are not intended to limit the scope of the invention. These novel embodiments can be implemented in various other forms, and various omissions, replacements, and changes can be made without departing from the gist of the invention. These embodiments and their modifications are included in the scope and gist of the invention, and in the invention described in the claims and its equivalents.

DESCRIPTION OF SYMBOLS: 100, 100a, 100b ... monitoring device, 21 ... horizontal direction, 22 ... vertical direction, 23 ... roll direction, 24 ... shooting direction, 101 ... input unit, 102 ... detection unit, 103 ... detection result management unit, 104 ... shooting condition calculation unit, 105 ... output unit, 106 ... shooting condition control unit, 107 ... feature extraction unit, 109 ... recognition unit, 108 ... person information management unit, 110 ... shooting condition storage unit, 111 ... shooting condition diagnosis unit, 150 ... camera, 1101 ... CPU, 1102 ... ROM, 1103 ... RAM, 1104 ... communication I/F, 1105 ... HDD, 1106 ... display device, 1107 ... input device, 1108 ... bus, d1, d2, dN ... flow line, G ... captured image, R ... shooting region, R1, R2 ... region

Claims

Image input means for inputting image data captured by the camera;
Face detection means for detecting a face area representing a human face from the input image data;
Detection result storage means for storing the detection result of the face detection means;
Calculating means for calculating, based on the stored detection result and according to equation (1), at least one shooting condition that maximizes an evaluation value given by an evaluation function for evaluating ease of confirming the face of the person based on the stored face area, using the vertical angle φ of the camera, the horizontal angle θ, the angle ψ of the shooting direction in the rotation direction, and the angle of view r as parameters;
Output means for outputting the calculated photographing condition;
A monitoring device comprising:

The face detecting means detects a size of the face region;
The calculating means calculates a shooting condition based on the size of the stored face area, with the face area having a predetermined size;
The monitoring apparatus according to claim 1.

The face detecting means detects a face direction based on a face part included in the face region;
The calculation means calculates shooting conditions based on the stored face orientation, with the face orientation being a predetermined direction.
The monitoring apparatus according to claim 1.

The detection result storage means stores the face area detected by the face detection means over a plurality of frame images,
The calculation means calculates shooting conditions for shooting from a predetermined direction with respect to a trajectory of movement of the person based on the face area stored over the plurality of frame images.
The monitoring apparatus according to claim 1.

Control means for controlling the shooting direction and angle of view of the camera based on the calculated shooting conditions,
The monitoring apparatus according to any one of claims 1 to 4.

Image input means for inputting image data captured by the camera;
Face detection means for detecting, from the input image data, a face area in which a person's face is represented, the detection continuing until it has been performed a preset number of iterations;
Shooting condition storage means for storing shooting conditions of the camera for facilitating confirmation of the person's face;
Determining means for determining, based on the detection results of the face detection means for the number of iterations, whether at least one shooting condition among the shooting direction of the camera relative to the person's face in the vertical direction, horizontal direction, and rotation direction, and the angle of view, matches the stored shooting conditions;
Output means for outputting a determination result of the determination means;
A monitoring device comprising:

The photographing condition storage means stores the face size of the person when the person is photographed with the camera as the photographing condition,
The determination unit determines whether or not a face size based on the detection result matches the stored face size;
The monitoring device according to claim 6.

The photographing condition storage means stores the face direction of the person when the person is photographed by the camera as the photographing condition,
The determination means determines whether or not a face orientation based on the detection result matches the stored face orientation;
The monitoring device according to claim 6.

The shooting condition storage means stores, as the shooting condition, the shooting direction of the camera with respect to a trajectory of movement of the person when the person moves in front of the camera,
The determination unit determines whether or not the movement trajectory of the person matches the stored movement trajectory based on the face area detected by the face detection unit over a plurality of frame images;
The monitoring device according to claim 6.

The photographing condition storage means stores, as the photographing condition, a pixel value in the background of the person when the person is photographed by the camera.
The determination means determines whether or not a pixel value of an area other than the face area based on the detection result matches the stored pixel value.
The monitoring device according to claim 6.

The determination means determines, for all or a part of the image in the image data captured by the camera, whether the shooting condition based on the detection result of the face detection means matches the stored shooting condition,
The monitoring device according to any one of claims 6 to 10.

A method performed by a monitoring device, comprising:
A step in which image input means inputs image data captured by a camera;
A step in which face detection means detects, from the input image data, a face area in which a person's face is represented;
A step in which detection result storage means stores the detection result of the face area;
A step in which calculating means calculates, based on the stored detection result and according to equation (1), at least one shooting condition that maximizes an evaluation value given by an evaluation function for evaluating ease of confirming the face of the person based on the stored face area, using the vertical angle φ of the camera, the horizontal angle θ, the angle ψ of the shooting direction in the rotation direction, and the angle of view r as parameters; and
A step in which output means outputs the calculated shooting condition.

A method performed by a monitoring device,
The monitoring device including shooting condition storage means that stores shooting conditions of a camera that make a person's face easy to confirm,
The method comprising:
A step in which image input means inputs image data captured by the camera;
A step in which face detection means detects, from the input image data, a face area in which a person's face is represented, continuing until the detection has been performed a preset number of iterations;
A step in which determination means determines, based on the detection results of the face area for the number of iterations, whether at least one shooting condition among the shooting direction relative to the person's face in the vertical direction, horizontal direction, and rotation direction, and the angle of view, matches the stored shooting conditions; and
A step in which output means outputs the determination result.