JP7237520B2

JP7237520B2 - Information processing device, information processing method and program

Info

Publication number: JP7237520B2
Application number: JP2018201368A
Authority: JP
Inventors: 慧子中前; 光洋小野
Original assignee: Canon Inc
Current assignee: Canon Inc
Priority date: 2018-10-26
Filing date: 2018-10-26
Publication date: 2023-03-13
Anticipated expiration: 2038-10-26
Also published as: JP2020068495A

Description

The present invention relates to an information processing device, an information processing method, and a program.

In recent years, surveillance systems using network cameras have become widespread. Some of these monitoring systems have a camera linkage function that links a wide-area camera, such as an omnidirectional camera capable of imaging a wide area, with a PTZ (Pan Tilt Zoom) camera equipped with pan, tilt, and zoom mechanisms. The camera linkage function controls the imaging direction of the PTZ camera so that the PTZ camera captures an image using, as its gaze point, a point of interest that the user has selected in the image (video) captured by the omnidirectional camera, and it displays the image captured by the PTZ camera in enlarged form.

In a monitoring system that realizes such a camera linkage function, it is common to perform calibration before starting operation in order to determine the relative positional relationship of the multiple cameras and to align the images captured by each camera. Based on this positional relationship, the system calculates which position in the coordinate system of the PTZ camera corresponds to the point of interest in the coordinate system of the omnidirectional camera, and instructs the PTZ camera to change its imaging direction accordingly.
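As a rough illustration of this conversion, the sketch below assumes that calibration yields a rotation matrix `R` and a translation vector `t` mapping coordinates from the omnidirectional camera's frame into the PTZ camera's frame, and that the PTZ frame uses x forward, y left, z up. The function names and axis conventions are hypothetical, not taken from the patent. Note that the input is already a 3D point, which is exactly what a single 2D image does not provide.

```python
import math

def transform(R, t, p):
    """Map a 3D point p from omnidirectional-camera coordinates to PTZ-camera coordinates: q = R p + t."""
    return tuple(sum(R[i][j] * p[j] for j in range(3)) + t[i] for i in range(3))

def pan_tilt_for_point(R, t, p_a):
    """Return (pan, tilt) in degrees that aim the PTZ camera at p_a.

    Assumed PTZ frame: x forward, y left, z up.
    Pan is measured in the x-y plane from +x; tilt is the elevation above that plane.
    """
    x, y, z = transform(R, t, p_a)
    pan = math.degrees(math.atan2(y, x))
    tilt = math.degrees(math.atan2(z, math.hypot(x, y)))
    return pan, tilt
```

For example, with an identity `R` and zero `t`, the point (1, 1, 0) gives a pan of 45° and a tilt of 0°.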
However, since depth information cannot be obtained from a captured image, which is a two-dimensional plane, the three-dimensional position of a point of interest specified in the captured image cannot be calculated. For example, suppose an omnidirectional camera images a person standing in front of a door from the front of the door, and the person's face is selected as the point of interest in the captured image. A PTZ camera that images the scene from the side of the door then cannot determine whether the person's face or the door should be the gaze point. In addition, for a moving object such as a person, it is difficult to perform alignment before operation starts.
Patent Document 1 discloses, as methods of obtaining depth information by measuring the distance from a camera to the imaging target, a method using a distance sensor and a method using the focus position of the camera.

Patent Document 1: JP-A-2006-115470

However, obtaining depth information with a distance sensor, as in the technique described in Patent Document 1, is costly. In addition, obtaining depth information from the focus position of a camera is difficult to apply to omnidirectional lenses and the like, and its accuracy is poor.
Accordingly, an object of the present invention is to appropriately derive the three-dimensional position of a point of interest in a captured image at low cost.

In order to solve the above problem, one aspect of an information processing apparatus according to the present invention includes: a first acquisition means for acquiring area information for specifying the positions of a floor area and a wall area in an image captured by a first imaging device; a second acquisition means for acquiring position information indicating the position of a point of interest in the captured image; and a derivation means for determining, using the area information acquired by the first acquisition means and the position information acquired by the second acquisition means, whether the object of interest corresponding to the point of interest is an object present on a floor surface or an object present on a wall surface, and for deriving the three-dimensional position of the point of interest based on the result of that determination.
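A minimal sketch of the floor-or-wall determination, assuming the area information is stored as polygons in image coordinates labeled "floor" or "wall". The polygon representation, the labels, and the function names are illustrative assumptions; the patent does not specify a data format here. A standard even-odd (ray-casting) point-in-polygon test decides which region contains the point of interest.

```python
from typing import Dict, List, Optional, Tuple

Point = Tuple[float, float]

def point_in_polygon(pt: Point, poly: List[Point]) -> bool:
    """Even-odd ray-casting test: count edge crossings of a horizontal ray from pt to +x."""
    x, y = pt
    inside = False
    n = len(poly)
    for i in range(n):
        x1, y1 = poly[i]
        x2, y2 = poly[(i + 1) % n]
        # Does this edge straddle the horizontal line through pt?
        if (y1 > y) != (y2 > y):
            # x coordinate where the edge crosses that line
            x_cross = x1 + (y - y1) * (x2 - x1) / (y2 - y1)
            if x < x_cross:
                inside = not inside
    return inside

def classify_point(pt: Point, regions: Dict[str, List[List[Point]]]) -> Optional[str]:
    """Return the label (hypothetically "floor" or "wall") of the first region containing pt, else None."""
    for label, polygons in regions.items():
        if any(point_in_polygon(pt, poly) for poly in polygons):
            return label
    return None
```

With `regions = {"floor": [[(0, 0), (10, 0), (10, 5), (0, 5)]], "wall": [[(0, 5), (10, 5), (10, 10), (0, 10)]]}`, a point at (3, 2) is classified as floor and a point at (3, 7) as wall; points outside every region return `None`.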

According to the present invention, the three-dimensional position of a point of interest in a captured image can be appropriately derived at low cost.

A diagram showing an example system configuration of the imaging system in the first embodiment.
A hardware configuration example of an imaging device in the first embodiment.
A hardware configuration example of a client device in the first embodiment.
A schematic diagram of an environment in which the floor setting processing is performed in the first embodiment.
A flowchart of the floor setting processing procedure in the first embodiment.
An explanatory diagram of marker detection in the first embodiment.
An explanatory diagram of marker position acquisition in the first embodiment.
An example of a map table in the first embodiment.
A flowchart of the camera cooperation operation in the first embodiment.
A diagram explaining attribute determination of a gaze point in the first embodiment.
A diagram explaining a method of calculating the elevation angle Θ_MAX in the first embodiment.
A diagram explaining the gaze point calculation method in the first embodiment.
A flowchart of the camera cooperation operation in the second embodiment.
A diagram explaining the gaze point calculation method in the second embodiment.
An example of a UI screen in the second embodiment.
An example of a UI screen in the second embodiment.
An example of a UI screen in the second embodiment.
An example of a UI screen in the second embodiment.
Another example of a map table.
A diagram explaining another example of a method of calculating a gaze point.
A diagram explaining another example of the floor specification method.

BEST MODE FOR CARRYING OUT THE INVENTION
Hereinafter, embodiments for carrying out the present invention will be described in detail with reference to the accompanying drawings.
The embodiments described below are examples of means for realizing the present invention, and should be modified or changed as appropriate according to the configuration of the apparatus to which the present invention is applied and to various conditions. The present invention is not limited to the following embodiments.

In the present embodiment, a monitoring system using network cameras will be described as an imaging system that images an object by linking a plurality of imaging devices.
Network cameras are used in a wide range of fields, for example as surveillance cameras in large public institutions and mass retailers, and they are operated in various ways. Network cameras also come with various functional features to suit different modes of operation: for example, network cameras whose imaging direction can be changed freely by pan and tilt, network cameras whose imaging direction cannot be changed but which can perform high-magnification zoom imaging, and network cameras equipped with a fisheye lens that can monitor a wide range at once (omnidirectional cameras).
Among such network cameras, an omnidirectional camera has a very wide viewing angle, but few pixels are allocated to locations far from the camera, so the resolution there is low. Omnidirectional cameras are therefore not suited to capturing detailed information such as people's faces or vehicle license plates. To compensate for this drawback, there is a camera linkage function that links an omnidirectional camera with a camera having pan, tilt, and zoom functions (a PTZ camera). That is, the omnidirectional camera monitors a wide range at once, the PTZ camera images a point of particular interest (a point of interest) in that video, and the point of interest is displayed enlarged to obtain detailed information.

In this embodiment, in an imaging system having a plurality of imaging devices, when a point of interest is specified in an image captured by the omnidirectional camera, the PTZ camera is instructed to capture an image with that point of interest as the gaze point. A planar area such as the floor surface in the image captured by the omnidirectional camera, and the height of the point of interest above the real-space plane corresponding to that planar area, are determined in advance, and the three-dimensional position of the point of interest is derived from this information. This makes it possible, in the camera linkage function, to instruct the PTZ camera linked with the omnidirectional camera to image an appropriate position.
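Fixing depth with a known plane and a height above it amounts to a ray-plane intersection. The sketch below assumes the omnidirectional camera is ceiling-mounted looking straight down at height `cam_height` above the floor, with Θ measured from the downward optical axis and Φ as the azimuth on the floor (matching the (Θ, Φ) description later in this section), and that the target lies at `target_height` above the floor, for example an estimated face height. These conventions and the function name are assumptions for illustration.

```python
import math

def point_on_plane_offset(theta_deg, phi_deg, cam_height, target_height=0.0):
    """Return (x, y, z) in floor-based coordinates where the viewing ray meets z = target_height.

    Assumes the camera sits at (0, 0, cam_height) with its optical axis pointing straight down,
    theta measured from that axis and phi as the azimuth in the floor plane.
    """
    if not 0.0 <= theta_deg < 90.0:
        raise ValueError("ray must point below the horizon")
    drop = cam_height - target_height  # vertical distance the ray travels
    if drop <= 0:
        raise ValueError("target height must be below the camera")
    theta = math.radians(theta_deg)
    phi = math.radians(phi_deg)
    horizontal = drop * math.tan(theta)  # distance from the point directly below the camera
    return (horizontal * math.cos(phi), horizontal * math.sin(phi), target_height)
```

With the camera at 3.0 m and Θ = 45°, the same image point maps to about 3.0 m from the camera's foot if it lies on the floor, but to about 1.4 m at a face height of 1.6 m. This is the ambiguity described in the background, and the assumed height resolves it.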

(First embodiment)
FIG. 1 is a diagram showing a system configuration example of an imaging system 1000 according to this embodiment. The imaging system 1000 according to the present embodiment is a camera cooperation system that images an object of interest to be monitored by linking two imaging devices 100A and 100B.
In the imaging system 1000, calibration is performed before operation starts in order to determine the positional relationship between the imaging devices 100A and 100B. In this embodiment, the imaging system 1000 performs floor setting processing using markers during this calibration, and sets planar areas such as the floor surface (ground) and wall surfaces in the image captured by the imaging device 100A. After operation starts, the imaging system 1000 derives the three-dimensional position of a point of interest designated in the image captured by the imaging device 100A, and causes the imaging device 100B to capture an image with the derived point of interest as the gaze point. Specifically, when the point of interest is a person's face, the imaging system 1000 calculates, as the point of interest (gaze point), a point whose height above the floor equals the estimated height of the person, and performs imaging direction instruction processing that controls the imaging direction of the imaging device 100B.
Note that the point of interest is not limited to a person's face. It may be a part of the person other than the face, or the person's whole body. The point of interest may also be an object other than a person (for example, a vehicle or a window).

The imaging system 1000 includes an imaging device 100A, an imaging device 100B, and a client device 200. The imaging devices 100A and 100B and the client device 200 are connected via a network 300 so as to be able to communicate with each other. The network 300 may use any communication standard, scale, and configuration as long as the cameras 100A and 100B and the client device 200 can communicate with each other. The physical connection to the network 300 may be wired or wireless.

The imaging devices 100A and 100B are network cameras (hereinafter simply referred to as "cameras"). In this embodiment, the camera 100A is an omnidirectional camera and the camera 100B is a PTZ camera. The cameras 100A and 100B can be installed on a wall, a ceiling, or the like, and capture video including one or more images. The cameras 100A and 100B may support PoE (Power over Ethernet (registered trademark)), or may be supplied with power via a LAN cable or the like. Although FIG. 1 shows two cameras 100A and 100B connected to the network 300, it suffices that two or more cameras are connected to the network 300; the number of connected cameras is not limited to that shown in FIG. 1.
In this embodiment, a case will be described in which the camera 100A operates as the information processing device that executes the floor setting processing and the imaging direction instruction processing described above. However, the client device 200 or the imaging device 100B may operate as the information processing device instead.

FIG. 2 is a hardware configuration example of the camera 100B, which is a PTZ camera.
The camera 100B includes an imaging optical unit 101, an imaging element unit 102, a CPU 103, a ROM 104, a RAM 105, an imaging system control unit 106, a PT control unit 107, an image processing unit 108, an encoder unit 109, and a communication unit 110.
The imaging optical unit 101 includes an objective lens, a zoom lens, a focus lens, an optical diaphragm, and the like, and focuses light from the subject onto the imaging element unit 102.
The imaging element unit 102 includes an imaging element such as a CCD or CMOS sensor that converts the light collected by the imaging optical unit 101 into an electric signal, and acquires color information in combination with a color filter or the like. The imaging element unit 102 may also use an image sensor in which an arbitrary exposure time and gain can be set for every pixel.

The CPU 103 comprehensively controls the operation of the camera 100B. The ROM 104 is a non-volatile memory that stores control programs and the like necessary for the CPU 103 to execute processing. The program may instead be stored in an external memory (not shown) or a removable storage medium. The RAM 105 functions as the main memory, work area, and the like of the CPU 103. The ROM 104 and the RAM 105 can store setting values such as image quality adjustment parameters and network settings, so that after a restart the camera can start up using the previously set values.
The CPU 103 loads a necessary program or the like from the ROM 104 to the RAM 105 when executing processing, and executes the program or the like to realize various functional operations. For example, the CPU 103 can instruct the imaging system control unit 106 to perform AE (Automatic Exposure) control for controlling exposure, AF (Autofocus) control for controlling focus, and the like.

In accordance with instructions from the CPU 103, the imaging system control unit 106 controls the imaging optical unit 101: it drives the focus lens to adjust the focus, adjusts the diaphragm for exposure control, and drives the zoom lens to change the zoom magnification.
The PT control unit 107 controls panning and tilting of the camera 100B by controlling a camera attitude driving unit (not shown). The PT control unit 107 can also control panning and tilting of the camera 100B according to camera operation commands transmitted from the client device 200 and other cameras such as the camera 100A.

The image processing unit 108 receives the image signal output from the imaging element unit 102, performs image processing on it, and generates a luminance signal and color difference signals.
The encoder unit 109 converts the image data processed by the image processing unit 108 into a predetermined format such as JPEG or H.264.
The communication unit 110 handles communication with other cameras such as the camera 100A and with the client device 200. In this embodiment, the communication unit 110 distributes the image data converted by the encoder unit 109 to the client device 200 via the network 300. The communication unit 110 also receives various commands from the client device 200 and the camera 100A, and transmits responses to the received commands and other necessary data besides image data.

Some of the functions of the elements of the camera 100B shown in FIG. 1 may also be implemented by the CPU 103 executing a program.
Also, the camera 100A, which is an omnidirectional camera, is a pan-focus camera without a focus lens. Therefore, the hardware configuration of the camera 100A is the same as that of the camera 100B shown in FIG. 2, except that the zoom lens and the PT control unit 107 are omitted.

The client device 200 can be, for example, a display device that includes a general-purpose computer such as a personal computer (PC) or mobile terminal, and a monitor capable of displaying images.
FIG. 3 is a hardware configuration example of the client device 200.
The client device 200 includes a CPU 201, a ROM 202, a RAM 203, an HDD 204, an operation input unit 205, a communication unit 206, and a display unit 207.
The CPU 201 comprehensively controls the operation of the client device 200. The ROM 202 is a non-volatile memory that stores control programs and the like necessary for the CPU 201 to execute processing. The program may instead be stored in an external memory (not shown) or a removable storage medium. The RAM 203 functions as the main memory, work area, and the like of the CPU 201. When executing processing, the CPU 201 loads necessary programs and the like from the ROM 202 into the RAM 203 and executes them to realize various functional operations.

The HDD 204 can store various data and information that the CPU 201 needs when performing processing using programs, as well as various data and information obtained as a result of that processing.
The operation input unit 205 includes a power button and operation devices such as a keyboard and a mouse, and accepts instruction input from the user. The communication unit 206 receives images transmitted from the cameras 100A and 100B via the network 300. The communication unit 206 also transmits various commands to the cameras 100A and 100B, and receives their responses and other necessary data besides image data. The display unit 207 has a monitor such as a liquid crystal display (LCD), and can display a graphical user interface (GUI) for entering various control parameters of the cameras 100A and 100B.
Some functions of the elements of the client device 200 shown in FIG. 1 can also be implemented by the CPU 201 executing a program.

The floor setting processing executed by the information processing device is described in detail below. In this embodiment, the camera 100A executes the floor setting processing, but the client device 200 or the camera 100B may acquire the necessary information from the camera 100A and execute it instead.
The floor setting processing in this embodiment sets the floor area in a captured image using markers. It is assumed that the relative positional relationship between the cameras 100A and 100B has been measured by calibration at installation, and that the height of each camera above the floor is known from design drawings of the installation environment or the like.

FIG. 4 is a schematic diagram of a camera installation environment in which the floor setting processing is performed.
As shown in FIG. 4, two different types of markers are prepared: a floor marker 401 and a wall marker 402. The floor marker 401 is placed horizontally on the floor surface, and the wall marker 402 is placed horizontally on a wall surface. Each marker bears a known pattern of easily detectable feature points, and the marker type (floor or wall) can be identified from differences in shape, pattern design, color, and the like.
Information can also be embedded in each of the markers 401 and 402 using a two-dimensional barcode (for example, a QR code (registered trademark)). In this case, if type information indicating the marker type is embedded in the barcode, the marker type can be determined by decoding the barcode. Each of the markers 401 and 402 may also carry a pattern of known length; the distance between the camera 100A and the marker 401 or 402 can then be calculated from the length of the detected pattern. In this way, information indicating the marker type and information indicating the marker's installation position can be embedded in each of the markers 401 and 402 as marker information.
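The distance calculation from a pattern of known length can be approximated with a pinhole model, in which apparent size falls off inversely with distance. The omnidirectional lens of the camera 100A is not a pinhole lens, so this is only a local approximation near the marker's image position; the function name and the pinhole assumption are illustrative and not taken from the patent.

```python
def distance_from_pattern(known_length_m: float, observed_length_px: float, focal_length_px: float) -> float:
    """Pinhole approximation of camera-to-marker distance: d = f * L / l.

    known_length_m:     real length L of the marker pattern (metres)
    observed_length_px: length l of the pattern as detected in the image (pixels)
    focal_length_px:    focal length f expressed in pixels (assumed known from calibration)
    """
    if observed_length_px <= 0:
        raise ValueError("observed length must be positive")
    return focal_length_px * known_length_m / observed_length_px
```

For example, a 0.2 m pattern that appears 50 px long with f = 1000 px is estimated to be 4.0 m away. For a fisheye lens, a more faithful variant would convert each end of the pattern to a viewing angle through the lens projection model first.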

The camera 100A images the environment in which the floor marker 401 and the wall marker 402 are installed, and detects the markers in the captured image. The camera 100A then extracts the marker information embedded in the detected markers and, based on it, acquires area information indicating the floor area and area information indicating the wall area in the captured image. This area information includes information indicating the position in the captured image of a planar area such as the floor area or the wall area, information indicating the type of that planar area, and information indicating the installation position of the marker (such as its distance from the camera 100A).
In other words, the floor marker 401 and the wall marker 402 serve as identification information for identifying the floor area and the wall area in the captured image. Although this embodiment uses markers as the identification information, the identification information is not limited to markers; any information that allows a planar area to be set in the captured image and its area information to be acquired may be used.
After obtaining the area information of the plane area from the captured image, the camera 100A sets the position of the floor surface and the position of the wall surface in the coordinate system of the camera 100A based on the obtained area information.

FIG. 5 is a flowchart of the floor setting processing executed by the camera 100A. The processing in FIG. 5 is executed after calibration and before the imaging system 1000 starts operating, although its start timing is not limited to this. The camera 100A implements each step shown in FIG. 5 by having its CPU read and execute the necessary program. Hereinafter, the letter S denotes a step in a flowchart.

First, in S1, the camera 100A images the markers in a camera installation environment such as that shown in FIG. 4. Because the camera 100A is an omnidirectional camera with a wide angle of view, it can capture all the markers in the installation environment within its angle of view at once. FIG. 6(a) shows an example of an image captured by the camera 100A in S1. As shown there, the camera 100A can capture all of the installed floor markers 401 and wall markers 402.
When a camera with a narrow angle of view such as a PTZ camera executes the floor surface setting process, the angle of view may be changed as necessary so that all markers can be captured. Here, the angle of view may be changed manually by the user.

In S2, the camera 100A extracts the markers from the image captured in S1; that is, it extracts the known floor markers 401 and wall markers 402 from a captured image such as that shown in FIG. 6(a). For example, the camera 100A can detect a marker by pattern matching against multiple possible orientations of the marker learned in advance. However, the marker detection method is not limited to this, and any detection method may be used.
After extracting a marker from the captured image, the camera 100A detects the marker's position in the captured image. For simplicity, FIG. 6(b) shows a captured image containing only one marker. As shown in FIG. 6(b), the position of the floor marker 401 can be expressed as coordinates (r, Φ) with the image center Oc as the origin. The center Oc of the captured image coincides with the center point of the lens of the camera 100A. The camera 100A detects and stores the position (r, Φ) of the marker in the captured image.
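Converting a detected marker's pixel location into the polar form (r, Φ) about the image center Oc is a small coordinate change. The sketch assumes pixel coordinates with u increasing to the right and v increasing downward, an image center (cu, cv) known from calibration, and Φ measured counterclockwise on screen from the +u direction; these conventions and the function name are assumptions for illustration.

```python
import math

def to_polar(u, v, cu, cv):
    """Return (r, phi_deg): image height r in pixels and azimuth phi in [0, 360) degrees."""
    dx = u - cu
    dy = cv - v  # flip v so that positive dy points up on screen
    r = math.hypot(dx, dy)
    phi = math.degrees(math.atan2(dy, dx)) % 360.0
    return r, phi
```

For a 1280×960 image centered at (640, 480), a marker detected at pixel (670, 440) lies at r = 50 px and Φ ≈ 53.13°.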

Next, in S3, the camera 100A acquires the marker information embedded in the marker extracted in S2. The camera 100A then determines whether the extracted marker is a floor marker 401 or a wall marker 402, based on the marker type information included in the marker information.
Next, in S4, the camera 100A updates the map table. The map table stores information indicating the position of the floor surface in the coordinate system of the camera 100A.

First, a method of updating the map table when the floor marker 401 is detected in S3 will be described.
In this case, in S4, the camera 100A sets the floor area based on the detected positional information of the floor marker 401 in the captured image, and updates the map table.
As shown in FIG. 7(a), the position of the floor marker 401 in the coordinate system of the camera 100A can be defined by the elevation angle Θ₁, measured with the position corresponding to the image center Oc described above as the origin, and the angle Φ₁ on the floor surface. The elevation angle Θ₁ can be calculated from the image height r detected in S2 and the projection method of the lens. In this embodiment, the imaging optical unit 101 uses stereographic projection as its lens projection method. Therefore, the elevation angle Θ₁ is obtained from the image height r by the following equation (1).
$$\Theta_1 = 2\arctan\left(\frac{r}{2f}\right) \tag{1}$$
Here, f is the focal length of the imaging optical unit 101.
In this way, the position information (Θ₁, Φ₁) of the floor marker 401 in the coordinate system of the camera 100A can be uniquely obtained from the position information of the floor marker 401 in the captured image.
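Equation (1) inverts the stereographic projection r = 2f·tan(Θ/2). A minimal sketch, assuming r and f are expressed in the same unit (for example pixels) and that the result is wanted in degrees; the function name is hypothetical.

```python
import math

def stereographic_elevation_deg(r, f):
    """Invert the stereographic projection r = 2 f tan(theta / 2), i.e. equation (1).

    r and f must be in the same unit (e.g. pixels). Returns the angle in degrees
    measured from the optical axis (the image center Oc).
    """
    return math.degrees(2.0 * math.atan(r / (2.0 * f)))

# A point imaged at r = 2f lies 90 degrees off the optical axis.
print(round(stereographic_elevation_deg(2.0, 1.0), 6))  # 90.0
```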

FIG. 8 is an example of a map table. As shown in FIG. 8, the map table associates an angle parameter Φ with an elevation angle parameter Θ, and the elevation angle parameter Θ stores the elevation angle range set as the floor area. That is, the map table describes, for the full 360° around the camera 100A, the range of elevation angles in which the floor lies.
When a floor marker 401 is detected and its position in the coordinate system of the camera 100A is (Θ₁, Φ₁), at least the elevation angle range from 0° to Θ₁ at the angle Φ₁ can be judged to be floor area. In this case, therefore, the elevation angle range 0 to Θ₁ is stored in the elevation angle parameter Θ corresponding to the angle parameter Φ = Φ₁ in the map table.
If an elevation angle parameter Θ is already stored for that angle, the maximum value of the stored elevation angle parameter Θ is compared with the elevation angle Θ₁, and if Θ₁ is larger, the maximum value of the elevation angle parameter Θ in the map table is updated to Θ₁.
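The floor-marker update rule can be sketched with a dictionary standing in for the map table of FIG. 8. Representing the table as a dict keyed by Φ quantized to 1° bins, and storing only the maximum of each 0 to Θ range, are assumptions made for illustration; the function name is hypothetical.

```python
def update_with_floor_marker(map_table, phi_deg, theta_deg, bin_deg=1.0):
    """Apply the floor-marker rule to a map table.

    map_table: dict mapping a quantized angle Phi (degrees) to the maximum floor
    elevation angle Theta stored for it; the floor range is implicitly [0, max].
    The 1-degree quantization of Phi is an assumption for illustration.
    """
    key = round(phi_deg / bin_deg) * bin_deg % 360.0
    current = map_table.get(key)
    # Only widen the floor range: keep the larger of the stored and new maxima.
    if current is None or theta_deg > current:
        map_table[key] = theta_deg
    return map_table

table = {}
update_with_floor_marker(table, 45.2, 30.0)
update_with_floor_marker(table, 44.9, 25.0)  # smaller, so ignored
update_with_floor_marker(table, 45.0, 52.0)  # larger, so stored
print(table)  # {45.0: 52.0}
```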

Next, a method for updating the map table when the wall marker 402 is detected in S3 will be described.
In this case, in S4, the camera 100A sets the floor area based on the position information of the detected wall marker 402 in the captured image, and updates the map table. First, the camera 100A determines the actual distance W between the camera 100A and the wall marker 402. As noted above, a marker can include a pattern of known length as marker information. Therefore, the distance W can be calculated by the following equation (2), using the focal length f of the imaging optical unit 101 and the length d of the pattern in the captured image, where p is the known length of the pattern.
$$W = \frac{p \times f}{d} \tag{2}$$
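Equation (2) is the similar-triangles relation between a pattern's real length and its imaged length. A minimal sketch follows; the function name and example values are hypothetical, and, like the text, it applies the relation without any lens-distortion correction.

```python
def marker_distance(p, f, d):
    """Equation (2): distance W from the camera to a marker whose pattern has a known
    real length p, given focal length f and the pattern's imaged length d.

    f and d must share a unit (e.g. pixels); W comes out in the unit of p.
    """
    if d <= 0:
        raise ValueError("imaged pattern length d must be positive")
    return p * f / d

# A 0.5 m pattern imaged at 100 px with f = 800 px lies 4 m away.
print(marker_distance(0.5, 800.0, 100.0))  # 4.0
```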

Next, the camera 100A obtains the horizontal distance D between the camera 100A and the wall marker 402. As shown in FIG. 7(b), the distance D is the distance between the point where a perpendicular dropped from the camera 100A meets the floor and the point where a perpendicular dropped from the wall marker 402 meets the floor.
$$D = W \times \sin\Theta_2 \tag{3}$$
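In equation (3), Θ₂ is the elevation angle to the wall marker 402. Like the elevation angle Θ₁ to the floor marker 401 shown in FIG. 7(a), it can be calculated from the image height r detected in S2 and the projection method of the lens.
Finally, the camera 100A calculates the elevation angle Θ₃ to the boundary between the wall and the floor, based on the horizontal distance D between the camera 100A and the wall marker 402 and the height h of the camera 100A above the floor surface.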
$$\Theta_3 = \arctan\left(\frac{D}{h}\right) \tag{4}$$
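Equations (3) and (4) chain together: the line-of-sight distance W is projected onto the floor to get D, and D together with the camera height h gives the boundary angle Θ₃. A sketch, assuming angles in degrees measured from the downward optical axis; the function name and example values are hypothetical.

```python
import math

def wall_floor_boundary_deg(W, theta2_deg, h):
    """Equations (3) and (4): elevation angle Theta_3 of the wall/floor boundary.

    W: camera-to-wall-marker distance from equation (2).
    theta2_deg: elevation angle of the wall marker from the optical axis (degrees).
    h: height of the camera above the floor (same unit as W).
    """
    D = W * math.sin(math.radians(theta2_deg))  # (3) horizontal distance to the wall
    return math.degrees(math.atan(D / h))       # (4) angle to the wall/floor boundary

print(round(wall_floor_boundary_deg(4.0, 30.0, 2.0), 6))  # 45.0
```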

The elevation angle Θ₃ calculated by equation (4) indicates the boundary position between the floor surface and the wall surface, and corresponds to the maximum value of the elevation angle parameter Θ that indicates the floor area in the map table shown in FIG. 8. The calculated elevation angle Θ₃ is therefore used to update the elevation angle parameter Θ in the map table. That is, when a wall marker 402 is detected and its position in the coordinate system of the camera 100A is (Θ₂, Φ₂), the range up to the elevation angle Θ₃ at the angle Φ₂ can be judged to be floor area. In this case, therefore, the maximum value of the elevation angle parameter Θ corresponding to the angle parameter Φ = Φ₂ in the map table is updated to Θ₃.
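Applying the wall-marker result to the map table can be sketched as below, using a hypothetical dictionary representation (angle Φ quantized to 1° bins, value = maximum floor elevation angle). The text states that the calculated Θ₃ is used to update the table; whether Θ₃ should overwrite the stored maximum or be merged with it is not spelled out, so overwriting is an assumption.

```python
def update_with_wall_marker(map_table, phi_deg, theta3_deg, bin_deg=1.0):
    """Apply the wall-marker rule: at the wall marker's angle Phi_2, the floor extends
    up to the boundary elevation Theta_3 from equation (4).

    Overwriting (rather than taking a maximum) is an assumption here: a measured
    wall/floor boundary is treated as authoritative for that angle.
    """
    key = round(phi_deg / bin_deg) * bin_deg % 360.0
    map_table[key] = theta3_deg
    return map_table

table = {90.0: 30.0}  # earlier floor marker seen at Phi = 90 deg
update_with_wall_marker(table, 90.3, 45.0)
print(table)  # {90.0: 45.0}
```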
In this embodiment, the map table describes, for the full 360° around the camera 100A, the range of elevation angles in which the floor lies; however, the map table is not limited to this. For example, the map table may describe the range of elevation angles in which the wall surfaces lie, over the 360° around the camera 100A. In that case, the minimum value of the elevation angle parameter Θ indicates the boundary position between the floor surface and the wall surface. There may also be a plurality of map tables. For example, a map table may be created for each type of surface, such as floors and walls, or for each region, such as the right end of a wall.

Returning to FIG. 5, in S5 the camera 100A determines whether to end the floor surface setting process. The camera 100A may decide to end the process upon receiving an end instruction from the user, or when processing has been completed for all markers within the imageable range of the camera 100A. If the camera 100A decides to continue the floor surface setting process, the process returns to S1; if it decides to end it, the floor surface setting process of FIG. 5 ends.
In this embodiment, the relative positional relationship between the camera 100A and the camera 100B is measured by calibration at installation time, but the method of measuring this relationship is not limited to this. For example, the positional relationship can also be measured by placing a plurality of markers at predetermined distances from each of the cameras 100A and 100B and imaging the markers with the cameras 100A and 100B.

Next, the imaging direction instruction process executed by the information processing apparatus will be described. In this embodiment, the camera 100A executes the imaging direction instruction process; alternatively, the client device 200 or the camera 100B may acquire the necessary information from the camera 100A and execute it.
The imaging direction instruction process in this embodiment calculates the three-dimensional position of a point of interest specified in the image captured by the camera 100A, and issues an instruction that controls the imaging direction of the camera 100B so that the camera 100B captures that point of interest as its gaze point.

FIG. 9 is a flowchart of the imaging direction instruction process executed by the camera 100A. The process of FIG. 9 is executed when the user designates a point of interest in the image captured by the camera 100A, although its start timing is not limited to this. The camera 100A implements each step shown in FIG. 9 by having its CPU read and execute the necessary program.
First, in S11, the camera 100A acquires the position information of the point of interest designated by the user in the image captured by the camera 100A (position-of-interest information). The camera 100A transmits the captured image to the client device 200, thereby performing display control so that the image is shown on the display unit 207 of the client device 200, and accepts the designation of a point of interest in the displayed image. When the user designates a point of interest on the captured image displayed on the display unit 207, the camera 100A acquires the position information of the designated point in the captured image.

FIGS. 10(a) and 10(b) show examples of images captured by the camera 100A and displayed on the display unit 207 of the client device 200.
The captured image shown in FIG. 10(a) contains a person 501 standing on the floor. When the user designates the face of the person 501 as the point of interest, the camera 100A acquires the coordinates (r', Φ') as the position information of the point of interest. Likewise, as shown in FIG. 10(b), when the user designates a window 502 on a wall as the point of interest, the camera 100A acquires the coordinates (r'', Φ'') as its position information.

Returning to FIG. 9, in S12 the camera 100A determines the attribute of the gaze point. The attribute distinguishes two cases: the object of interest at the point of interest is either an object on the floor surface or some other object. In this embodiment, the two cases are a person standing on the floor surface and an object on a wall surface. The camera 100A determines the attribute using the position-of-interest information and the map table created in the floor surface setting process. The determination method is described below.

First, the camera 100A calculates the coordinates (Φ', Θ') of the point of interest in the coordinate system of the camera 100A from the position-of-interest information. Next, the camera 100A searches the map table for the angle parameter Φ closest to the calculated angle Φ' and obtains the maximum value of the corresponding elevation angle parameter Θ.
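The nearest-angle lookup can be sketched as follows. The dictionary representation of the map table and the wrap-around distance (treating 359° and 1° as 2° apart) are assumptions; the text only says that the closest angle parameter is searched. The function name is hypothetical.

```python
def floor_max_for_angle(map_table, phi_deg):
    """Return the stored maximum floor elevation for the map-table angle closest to phi_deg.

    Angular distance wraps around 360 degrees (an assumption; the text only says
    "closest"), so 359 deg is treated as 2 deg away from 1 deg.
    """
    def angular_gap(a, b):
        d = abs(a - b) % 360.0
        return min(d, 360.0 - d)

    nearest = min(map_table, key=lambda key: angular_gap(key, phi_deg))
    return map_table[nearest]

table = {0.0: 40.0, 90.0: 55.0, 180.0: 35.0, 270.0: 50.0}
print(floor_max_for_angle(table, 350.0))  # 40.0 (0 deg is only 10 deg away)
```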
Next, the camera 100A calculates an elevation angle Θ_MAX by adding, to the maximum value of the elevation angle parameter Θ obtained from the map table, an amount corresponding to a predetermined height.
FIG. 11 illustrates the method of calculating the elevation angle Θ_MAX. When the point of interest is a person's face, the maximum possible elevation angle of the gaze point equals the elevation angle obtained when the face of a person standing against the wall is designated as the point of interest. The face position 600 of a person of height h₀ standing against the wall lies at the height h₀ directly above the floor point that is the distance D away from the foot of the perpendicular dropped from the camera 100A to the floor.

The elevation angle Θ_MAX corresponding to the gaze point position 600 shown in FIG. 11 can be calculated by the following equation.
$$\Theta_{MAX} = \arctan\left(\frac{h \times \tan\Theta_3}{h - h_0}\right) \tag{5}$$
where Θ₃ is the maximum value of the elevation angle parameter Θ stored in the map table, and h is the height of the camera 100A above the floor.
Therefore, if the coordinate value Θ' of the point of interest is smaller than the elevation angle Θ_MAX, the attribute of the gaze point can be determined to be a person on the floor. Conversely, if Θ' is larger than Θ_MAX, the attribute of the gaze point can be determined to be an object other than a person on the floor, that is, an object on the wall surface.
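Equation (5) and the S12 threshold test can be combined into one small function. A sketch, assuming angles in degrees measured from the downward optical axis and h > h₀; the function name and the returned labels are hypothetical, and a tie (Θ' equal to Θ_MAX), which the text does not address, is treated here as a wall object.

```python
import math

def classify_gaze_point(theta_prime_deg, theta3_deg, h, h0):
    """Equation (5) followed by the S12 attribute test.

    theta3_deg: map-table floor maximum at the point's angle (wall/floor boundary).
    h: camera height above the floor; h0: assumed height of the point of interest
    (e.g. estimated face height). Requires h > h0.
    Returns "floor_person" or "wall_object" (ties go to "wall_object").
    """
    if h <= h0:
        raise ValueError("camera must be mounted above the assumed face height")
    D = h * math.tan(math.radians(theta3_deg))          # horizontal distance to the wall
    theta_max = math.degrees(math.atan(D / (h - h0)))   # (5)
    return "floor_person" if theta_prime_deg < theta_max else "wall_object"

# Camera at 3 m, boundary at 45 deg (wall 3 m away), face height 1.5 m:
# Theta_MAX = atan(3 / 1.5), about 63.4 deg.
print(classify_gaze_point(60.0, 45.0, 3.0, 1.5))  # floor_person
print(classify_gaze_point(70.0, 45.0, 3.0, 1.5))  # wall_object
```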

Returning to FIG. 9, in S13 the camera 100A calculates the three-dimensional position of the gaze point (gaze point position). As described above, the user designates a single point of interest on the two-dimensional image captured by the camera 100A. In real space, designating this point only specifies a straight line on which the point of interest may lie. In S13, the camera 100A therefore calculates the three-dimensional position of the gaze point according to the attribute of the gaze point determined in S12.
FIG. 12(a) illustrates how the gaze point position is calculated when the attribute of the gaze point is determined to be a person on the floor. Seen from a viewpoint other than that of the camera 100A, the point designated as the point of interest in the image captured by the camera 100A corresponds to the straight line 601 in FIG. 12(a). The straight line 601 is obtained from the imaging conditions of the camera 100A and the coordinates (Φ', Θ') of the point of interest in the coordinate system of the camera 100A. Here, the imaging conditions include the imaging direction and the imaging angle of view of the camera 100A.

When the estimated height of the person of interest is set in advance to h₀, the point 603 on the straight line 601 that lies at the height h₀ above the floor surface 602 is estimated to be the appropriate gaze point. The camera 100A therefore calculates the three-dimensional position of this point 603 as the gaze point.
The estimated height h₀ of the person may be a preset fixed value, may be obtained from the marker information embedded in a marker, or may be obtained by accepting a designation from the user. When the point of interest is something other than the person's face, the height information h₀ indicates the height above the floor, in real space, of that point of interest.
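For the floor case, the gaze point is where the viewing ray reaches the height h₀. The sketch below assumes a frame that the text does not define: origin at the foot of the perpendicular from the camera 100A to the floor, z axis up, camera at height h with its optical axis pointing straight down, Θ measured from that axis and Φ measured in the floor plane. The function name is hypothetical.

```python
import math

def floor_person_gaze_point(phi_deg, theta_deg, h, h0):
    """Point on the viewing ray (Phi', Theta') at height h0 above the floor (S13, floor case).

    Assumed frame: origin at the foot of the perpendicular from the camera to the
    floor, z up, camera at (0, 0, h) looking straight down, Theta measured from the
    downward optical axis, Phi measured in the floor plane from the +x axis.
    """
    if not 0.0 <= h0 < h:
        raise ValueError("assumed height h0 must lie between the floor and the camera")
    phi, theta = math.radians(phi_deg), math.radians(theta_deg)
    if not 0.0 <= theta < math.pi / 2:
        raise ValueError("ray must point below the horizontal to reach height h0")
    horizontal = (h - h0) * math.tan(theta)  # floor distance covered while descending h - h0
    return (horizontal * math.cos(phi), horizontal * math.sin(phi), h0)

x, y, z = floor_person_gaze_point(0.0, 45.0, 3.0, 1.6)
print(round(x, 6), round(y, 6), z)  # 1.4 0.0 1.6
```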

FIG. 12(b) illustrates how the gaze point position is calculated when the attribute of the gaze point is determined to be an object on a wall surface. Seen from a viewpoint other than that of the camera 100A, the point designated as the point of interest in the image captured by the camera 100A corresponds to the straight line 611 in FIG. 12(b). Like the straight line 601 in FIG. 12(a), the straight line 611 is obtained from the imaging conditions of the camera 100A and the coordinates (Φ', Θ') of the point of interest in the coordinate system of the camera 100A.
Here, the object of interest is not on the floor surface but on the wall surface. Therefore, the point 613 on the straight line 611 that lies directly above the floor position at the distance D from the foot of the perpendicular dropped from the camera 100A to the floor surface 602 is estimated to be the appropriate gaze point. The camera 100A therefore calculates the three-dimensional position of this point 613 as the gaze point. The point 613 is the intersection of the straight line 611 and the wall surface, that is, the point on the straight line 611 whose height from the wall surface is zero.

Here, the distance D from the foot of the perpendicular dropped from the camera 100A to the floor surface 602 to the wall can be calculated by equation (3) above. The height h₁ of the gaze point 613 above the floor surface 602 can be calculated from the coordinate value Θ' of the point of interest by the following equation.
$$h_1 = h - \frac{D}{\tan\Theta'} \tag{6}$$
In this way, the camera 100A derives a straight line in real space on which the point of interest may lie, based on the imaging conditions of the camera 100A and the coordinates (Φ', Θ') of the point of interest calculated from its position information in the captured image. The camera 100A then derives, as the three-dimensional position of the point of interest, the point on the derived line whose height from the plane on which the object of interest lies (the floor surface or a wall surface) equals a preset height.
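For the wall case, the gaze point is where the viewing ray reaches the horizontal distance D of the wall. The sketch assumes a frame not defined in the text: origin at the foot of the perpendicular from the camera 100A to the floor, z axis up, camera at height h looking straight down. The height is computed as h − D / tan Θ', because a ray at angle Θ' from the vertical descends D / tan Θ' while covering the horizontal distance D; this is the same right-triangle relation used in equation (4). The function name is hypothetical.

```python
import math

def wall_object_gaze_point(phi_deg, theta_deg, h, D):
    """Point on the viewing ray (Phi', Theta') at horizontal distance D (S13, wall case).

    Assumed frame: origin at the foot of the perpendicular from the camera to the
    floor, z up, camera at (0, 0, h) looking straight down, Theta measured from the
    downward optical axis, Phi measured in the floor plane from the +x axis.
    """
    phi, theta = math.radians(phi_deg), math.radians(theta_deg)
    if not 0.0 < theta < math.pi / 2:
        raise ValueError("ray must point obliquely downward to meet the wall")
    h1 = h - D / math.tan(theta)  # the ray descends D / tan(Theta') over distance D
    return (D * math.cos(phi), D * math.sin(phi), h1)

x, y, z = wall_object_gaze_point(90.0, 60.0, 3.0, 2.0)
print(round(x, 6), round(y, 6), round(z, 6))  # 0.0 2.0 1.845299
```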

Returning to FIG. 9, in S14 the camera 100A transforms the gaze point position calculated in S13 from the coordinate system of the camera 100A into the coordinate system of the camera 100B.
Next, in S15, the camera 100A determines the imaging direction in which the camera 100B should capture the gaze point position transformed in S14. The camera 100A then transmits a command (parameters) for controlling the imaging direction of the camera 100B to the camera 100B via the network 300, and causes the camera 100B to start imaging.
In S15, the camera 100A may also determine the imaging angle of view of the camera 100B in addition to its imaging direction, and instruct the camera 100B accordingly. In this case, the camera 100A calculates a zoom magnification such that the object of interest fits within the imaging angle of view of the camera 100B.
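S14 and S15 can be sketched as a rigid transform followed by a pan/tilt computation. The form of the calibration result (a rotation matrix R and translation t with p_B = R·p_A + t), the axis conventions of the camera 100B (z up, pan measured from +x), the function names and all numeric values are assumptions for illustration; an actual PTZ camera would also need its own pan/tilt zero offsets and command format, which are not covered here.

```python
import math

def to_camera_b(point_a, R, t):
    """S14 sketch: p_B = R * p_A + t, with R (3x3, row-major) and t from calibration (assumed form)."""
    return tuple(sum(R[i][j] * point_a[j] for j in range(3)) + t[i] for i in range(3))

def pan_tilt_deg(point_b):
    """S15 sketch: pan about camera B's vertical axis and tilt from its horizontal plane.

    Assumes camera B's frame has z up and pan measured from its +x axis.
    """
    x, y, z = point_b
    pan = math.degrees(math.atan2(y, x))
    tilt = math.degrees(math.atan2(z, math.hypot(x, y)))
    return pan, tilt

# Hypothetical calibration: camera B shares camera A's frame orientation but sits
# 4 m along +x and 2.5 m up, so t is minus B's position expressed in A's frame.
R = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
t = (-4.0, 0.0, -2.5)
p_b = to_camera_b((0.0, 0.0, 1.5), R, t)
print(p_b)                # (-4.0, 0.0, -1.0)
print(pan_tilt_deg(p_b))  # pan 180 deg, tilt about -14.04 deg (looking down)
```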

As described above, the information processing apparatus according to the present embodiment acquires area information indicating plane areas in the image captured by the camera 100A and position information indicating the position of the point of interest in that image. In this embodiment, the plane areas include a floor area and a wall area. The information processing apparatus then derives the three-dimensional position of the point of interest based on the area information of the plane areas in the captured image, the position information of the point of interest in the captured image, and preset height information. The information processing apparatus also determines the imaging direction of the camera 100B, which is different from the camera 100A, for capturing the point of interest as the gaze point, and instructs the camera 100B to use that imaging direction. The information processing apparatus may further determine the imaging angle of view of the camera 100B and instruct the camera 100B accordingly.

Here, the point of interest can be, for example, a person's face. In this case, the user designates a person's face in the image captured by the camera 100A as the gaze point, and the information processing apparatus sets the designated face as the gaze point and can transmit the three-dimensional position information of the gaze point to the camera 100B. The camera 100B can thus control its imaging direction to capture the gaze point appropriately. Consequently, even a person who was not present when the cameras were calibrated can be captured appropriately by the cameras 100A and 100B working together.

When deriving the three-dimensional position of the gaze point (point of interest), the information processing apparatus can derive a straight line in real space on which the point of interest may lie, based on the imaging conditions of the camera 100A (imaging direction and imaging angle of view) and the position information of the point of interest in the captured image. The information processing apparatus can then derive, as the three-dimensional position of the point of interest, the point on that line whose height from the real-space plane corresponding to the plane area equals a preset height.
Specifically, when the point of interest is the face of a person on the floor, the information processing apparatus can derive, as the three-dimensional position of the point of interest, the point on the derived line whose height above the floor corresponds to the estimated height of the person. When the point of interest is an object on a wall surface, the information processing apparatus can derive, as the three-dimensional position of the point of interest, the point on the derived line whose height from the wall surface is 0.

That is, the information processing apparatus determines whether the object of interest at the point of interest is an object on the floor surface (for example, a person) or an object on a wall surface, and derives the three-dimensional position of the point of interest based on the result of this determination. The three-dimensional position of the point of interest can therefore be derived appropriately.
In this way, the information processing apparatus can uniquely calculate the three-dimensional position of a point of interest specified two-dimensionally in the image captured by a single camera 100A, by using the floor surface and the height of the gaze point above the floor (for example, the estimated height of a person) as constraints. There is therefore no need to provide a distance sensor to obtain the three-dimensional position of the point of interest, which reduces cost accordingly. Furthermore, the three-dimensional position of the gaze point can be calculated appropriately even when the distance from the camera to the gaze point cannot be obtained from the focus position, as is the case with an omnidirectional camera.

The information processing apparatus also detects, in the captured image, markers serving as identification information for identifying plane areas, and acquires the area information of the plane areas based on the marker information embedded in the markers. For example, when type information indicating the type of plane area is embedded in a marker, the information processing apparatus can determine whether the detected marker is located on the floor surface or on a wall surface by analyzing the embedded marker information. When information indicating the installation position of a marker is embedded in it, the information processing apparatus can determine the distance between the detected marker and the camera 100A by analyzing the embedded marker information. The information processing apparatus can thus appropriately grasp the positions of plane areas such as the floor and walls in the captured image, and in turn the positions of planes such as the floor and walls in the coordinate system of the camera 100A.

In addition, the information processing apparatus performs display control to display the image captured by the camera 100A on the client device 200, which serves as a display device, accepts the user's setting of a point of interest in the captured image, and acquires the position information of that point of interest. The information processing apparatus can therefore appropriately calculate the three-dimensional position of the gaze point the user wants and instruct the camera 100B to capture it.
In the imaging system of this embodiment, the camera 100A can be an omnidirectional camera and the camera 100B a PTZ camera. In this imaging system, the omnidirectional camera monitors a wide area, and when there is an object the user wants to watch closely, the user specifies a gaze point in the omnidirectional image and the PTZ camera captures that gaze point. This makes it possible to present the user with a detailed image (for example, an enlarged image) of the gaze point, so the user can easily check details such as a person's face or a vehicle's license plate.

Next, a second embodiment of the invention will be described.
In the first embodiment described above, the case where the point of interest is the face of a person has been described. In the second embodiment, it is possible to select a viewpoint mode such as whether the point of interest is the person's face, whether the point of interest is the center of the person and the whole body is being gazed at, or the point of interest is the feet of the person. A case will be described where the gaze point is set at the desired viewpoint.
The system configuration in this second embodiment is the same as in the first embodiment described above. Also, the floor surface setting process executed by the information processing apparatus in this embodiment is the same as in the first embodiment described above.
In this embodiment, the imaging direction instruction process executed by the information processing apparatus is different from that in the above-described first embodiment. Therefore, the following description will focus on the portions of processing that are different from those of the first embodiment.

FIG. 13 is a flowchart showing the imaging direction instruction process executed by the information processing apparatus in this embodiment. Although this embodiment describes the case where the camera 100A executes the imaging direction instruction process, the client device 200 or the camera 100B may instead acquire the necessary information from the camera 100A and execute the process.
The process of FIG. 13 is executed when the user designates a point of interest in the image captured by the camera 100A, although its start timing is not limited to this. The camera 100A implements each step shown in FIG. 13 by having its CPU read and execute the required program.

First, in S21, the camera 100A acquires information indicating the viewpoint mode and the estimated height. The camera 100A performs display control to display a GUI on the display unit 207 of the client device 200 and accepts the designation of the viewpoint mode and the estimated height through that GUI. When the user specifies the viewpoint mode and the estimated height on the GUI, the camera 100A acquires information indicating the specified viewpoint mode and the specified estimated height.
In this embodiment, the user can select from three viewpoint modes: a "face close-up" mode, a "whole body" mode, and a "feet" mode. As in the first embodiment, the "face close-up" mode watches the person's face when the target object is a person on the floor. The "whole body" mode watches not only the face but the whole body down to the feet. The "feet" mode watches the person's feet; its point of interest is the point at height 0 above the floor.
The GUI for entering the viewpoint mode and the estimated height is described later.

The processing of S22 and S23 is the same as in the first embodiment (S11 and S12 in FIG. 9), so its description is omitted.
In S24, the three-dimensional position of the gaze point (gaze point position) is calculated taking into account the viewpoint mode determined in S21.
FIG. 14 illustrates how the gaze point position is calculated in each viewpoint mode when the gaze attribute determination in S23 has determined that the attribute of the gaze point is a person on the floor. A point designated as the point of interest in the image captured by the camera 100A corresponds, when seen from any viewpoint other than that of the camera 100A, to the straight line 621 in FIG. 14. The straight line 621 is obtained from the imaging conditions of the camera 100A and the coordinates (Φ', Θ') of the point of interest in the coordinate system of the camera 100A.
When the viewpoint mode is the "face close-up" mode, the estimated height h₀ acquired in S21 is taken as the height of the point of interest above the floor, and, of the points on the straight line 621, the point 623 at height h₀ above the floor surface 622 is presumed to be the appropriate gaze point. The camera 100A therefore calculates the three-dimensional position using the point 623 as the gaze point.

When the viewpoint mode is the "whole body" mode, on the other hand, the point of interest is the center of the person, so the point 624 on the straight line 621 at half the estimated height (h₀/2) above the floor surface 622 is presumed to be the appropriate gaze point. The camera 100A therefore calculates the three-dimensional position using the point 624 as the gaze point.
When the viewpoint mode is the "feet" mode, the point 625 on the straight line 621 at height 0 above the floor surface 622 is presumed to be the appropriate gaze point, and the camera 100A calculates the three-dimensional position using the point 625 as the gaze point.
If the gaze attribute determination in S23 determines that the attribute of the gaze point is an object other than a person on the floor, the gaze point position is calculated in S24 in the same way as in the first embodiment, so its description is omitted.
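The per-mode choice of points 623, 624 and 625 amounts to intersecting line 621 with a horizontal plane whose height depends on the viewpoint mode. The sketch below illustrates this under stated assumptions that the description does not fix: a floor-based frame with Z pointing up, the camera 100A at `origin`, and Θ' measured from straight down with Φ' as the azimuth. The names `ViewpointMode`, `gaze_height`, `ray_direction` and `point_on_ray_at_height` are hypothetical.

```python
import math
from enum import Enum


class ViewpointMode(Enum):
    FACE_CLOSE_UP = "face close-up"
    WHOLE_BODY = "whole body"
    FEET = "feet"


def gaze_height(mode, estimated_height):
    """Height of the gaze point above the floor for each viewpoint mode."""
    if mode is ViewpointMode.FACE_CLOSE_UP:
        return estimated_height        # point 623: h0
    if mode is ViewpointMode.WHOLE_BODY:
        return estimated_height / 2.0  # point 624: h0 / 2
    return 0.0                         # point 625: the feet, height 0


def ray_direction(phi, theta):
    """Unit direction of line 621, assuming theta is measured from straight down
    (nadir) and phi is the azimuth of a downward-facing omnidirectional camera."""
    return (math.sin(theta) * math.cos(phi),
            math.sin(theta) * math.sin(phi),
            -math.cos(theta))


def point_on_ray_at_height(origin, direction, z):
    """Intersect origin + t * direction (t > 0) with the horizontal plane Z = z."""
    ox, oy, oz = origin
    dx, dy, dz = direction
    if abs(dz) < 1e-12:
        raise ValueError("ray is parallel to the floor")
    t = (z - oz) / dz
    if t <= 0:
        raise ValueError("the requested height is not in front of the camera")
    return (ox + t * dx, oy + t * dy, oz + t * dz)
```

For example, with the camera 3 m above the floor, Φ' = 0, Θ' = 45° and h₀ = 1.7 m, the "face close-up" gaze point comes out at roughly (1.3, 0.0, 1.7) and the "feet" gaze point at roughly (3.0, 0.0, 0.0).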

The processing of S25 is the same as in the first embodiment (S14 in FIG. 9), so its description is omitted.
In S26, the camera 100A determines the imaging direction in which the camera 100B should capture the gaze point position that was coordinate-transformed in S25. The camera 100A also places that gaze point position at the center of the imaging angle of view of the camera 100B and calculates a zoom magnification such that the region to be captured fits within the angle of view. In other words, the imaging angle of view is chosen so that the face appears large in the "face close-up" mode, the whole body is in frame in the "whole body" mode, and the feet appear large in the "feet" mode. The camera 100A then sends commands (parameters) controlling the imaging direction and imaging angle of view of the camera 100B to the camera 100B over the network 300, and causes the camera 100B to start capturing.
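As a rough illustration of S26, the sketch below computes a pan/tilt pair that centers the gaze point and a zoom factor that makes an object of a given vertical size fill the angle of view. Everything here is an assumption for illustration: the gaze point is given in a frame of the camera 100B with X forward at pan 0, Y to the left and Z up; zoom is modeled as a focal-length ratio to the widest field of view; and the per-mode sizes in `target_extent` are hypothetical values, not figures from the description.

```python
import math


def target_extent(mode, estimated_height):
    """Hypothetical vertical size (metres) to frame in each viewpoint mode."""
    return {"face close-up": 0.3,
            "whole body": estimated_height,
            "feet": 0.5}[mode]


def pan_tilt_to(point):
    """Pan/tilt (radians) that centre `point`, given in camera 100B's frame."""
    x, y, z = point
    pan = math.atan2(y, x)                  # rotation about the vertical axis
    tilt = math.atan2(z, math.hypot(x, y))  # positive means upward
    return pan, tilt


def zoom_for_extent(point, extent, wide_vfov, max_zoom, margin=1.2):
    """Zoom factor so an object of vertical size `extent` at `point` fills the
    vertical angle of view (with a margin), clamped to [1, max_zoom]."""
    distance = math.sqrt(sum(c * c for c in point))
    half_needed = math.atan(extent * margin / 2.0 / distance)
    # Zoom is proportional to focal length, i.e. inversely to tan(half angle).
    zoom = math.tan(wide_vfov / 2.0) / math.tan(half_needed)
    return min(max(zoom, 1.0), max_zoom)
```

Clamping keeps the request within the optical range of the PTZ camera; a real command would then be expressed in whatever units the camera's control protocol uses.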

An example of a GUI for entering the viewpoint mode and the estimated height is described below with reference to FIGS. 15 to 18.
The GUI 700 shown in FIGS. 15 to 17 is displayed on the display unit 207 of the client device 200 and operated through the operation input unit 205. The GUI 700 can include an image display unit 701 that displays video distributed from the camera 100A and an image display unit 702 that displays video distributed from the camera 100B. The GUI 700 can also include a mode setting input form 703 for changing the viewpoint mode and an estimated height input form 704 for setting the estimated height.
The mode setting input form 703 is, for example, a pull-down menu, from which the user selects a viewpoint mode with an operation device, such as a mouse, of the operation input unit 205. The estimated height input form 704 is likewise, for example, a pull-down menu from which the user selects an estimated height with the operation device. The way the viewpoint mode and estimated height are specified is not limited to this. For the estimated height, for example, the user may type the value directly, or may use the operation device to pick a value within a preset range from a pop-up 710 such as the one shown in FIG. 18.

Using the operation device, the user moves the pointer 705 on the display screen to the point to be watched in the video shown in the image display unit 701 and designates it as the point of interest, for example with a mouse click. Triggered by this designation, the camera 100A executes the imaging direction instruction process of FIG. 13 and instructs the camera 100B of the imaging direction and imaging angle of view. The camera 100B changes its imaging direction and imaging angle of view according to that instruction. As a result, the image display unit 702 displays the video captured by the camera 100B with the user-specified point of interest as the gaze point.

When the user selects the "face close-up" mode as the viewpoint mode, the image display unit 702 displays video of the area around the face of the person the user wants to watch, as shown in FIG. 15. When the user selects the "whole body" mode, the image display unit 702 displays video showing the whole of that person, as shown in FIG. 16. When the user selects the "feet" mode, the image display unit 702 displays video of that person's feet, as shown in FIG. 17.
In the "feet" mode, the height of the gaze point above the floor is fixed at 0, so the estimated height input form 704 may be configured not to accept numerical input from the user. In that case, the estimated height the user specified before switching viewpoint modes may be stored and automatically selected again when the user switches back from the "feet" mode.
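The lock-and-restore behavior of the estimated height input form 704 can be sketched as a small state model. The class and method names are hypothetical, and blanking the form while in "feet" mode (`estimated_height = None`) is an assumption made so that the automatic restore has something to do; the description only says input may be refused and the earlier value re-selected.

```python
class ViewpointSettings:
    """Hypothetical model behind the mode pull-down (703) and height form (704)."""

    MODES = ("face close-up", "whole body", "feet")

    def __init__(self, mode="face close-up", estimated_height=1.7):
        self.mode = mode
        self.estimated_height = estimated_height  # value shown in form 704
        self._saved_height = estimated_height

    @property
    def height_input_enabled(self):
        # In "feet" mode the gaze point is fixed at height 0, so input is locked.
        return self.mode != "feet"

    def set_mode(self, mode):
        if mode not in self.MODES:
            raise ValueError(f"unknown viewpoint mode: {mode}")
        if mode == "feet" and self.mode != "feet":
            self._saved_height = self.estimated_height  # remember the user's value
            self.estimated_height = None                # form 704 shows nothing
        elif mode != "feet" and self.mode == "feet":
            self.estimated_height = self._saved_height  # restore it automatically
        self.mode = mode

    def set_height(self, value):
        if not self.height_input_enabled:
            raise RuntimeError("estimated height cannot be edited in feet mode")
        self.estimated_height = value

    def gaze_height(self):
        """Height of the gaze point above the floor for the current mode."""
        if self.mode == "feet":
            return 0.0
        if self.mode == "whole body":
            return self.estimated_height / 2.0
        return self.estimated_height
```

Usage: after `set_height(1.6)`, switching to "feet" locks the form and returns a gaze height of 0; switching to "whole body" restores 1.6 and yields a gaze height of 0.8.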

As described above, the information processing apparatus of this embodiment accepts the user's setting of the viewpoint mode and calculates the three-dimensional position of the point of interest according to that mode. Specifically, depending on the viewpoint mode, the apparatus sets the point of interest to the person's face, the person's center, or the person's feet. If the point of interest is the face, the point at the estimated height h₀ above the floor is set as the gaze point; if it is the center of the person, the point at height h₀/2 above the floor is set as the gaze point; and if it is the feet, the point at height 0 above the floor is set as the gaze point.
In other words, by accepting the user's viewpoint mode setting, the information processing apparatus accepts a setting of height information indicating the height from the floor surface to the gaze point, and uses it to calculate the three-dimensional position of the point of interest. The apparatus can thus appropriately calculate the three-dimensional position of the gaze point the user wants and instruct the linked camera of the imaging direction. Because it can also determine and instruct the imaging angle of view of the camera 100B according to the viewpoint mode, it can present the user with video in which the object the user wants to watch is appropriately captured.

(Modification)
The embodiments above assume that the floor surface is at a constant height, but the information processing apparatus can also perform the floor surface setting process and the imaging direction instruction process when the floor has steps, such as stairs. In that case, the floor surface setting process is performed on floor surfaces whose height differs from that of the floor directly below the camera 100A. The floor directly below the camera 100A is therefore taken as a reference plane, information indicating height relative to that reference plane is embedded in the marker information of the floor marker 401 as height correction information, and the camera 100A is thereby notified of the difference in floor height.
That is, in the floor surface setting process shown in FIG. 5, the camera 100A also acquires this height correction information when it extracts the marker information in S3, and stores it in the map table when it updates the map table in S4. An example of the map table is shown in FIG. 19.
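The contents of the map table of FIG. 19 are not reproduced here, so the sketch below only illustrates the idea with hypothetical fields: each marker entry records whether it belongs to the floor or the wall, where it was detected in the image, and its height correction h_c. The nearest-floor-marker rule in `height_correction_at` is an assumed lookup, not one stated in the description.

```python
import math
from dataclasses import dataclass


@dataclass
class MapEntry:
    marker_id: int
    area: str                       # "floor" or "wall", from the marker information
    position: tuple                 # hypothetical: marker position in image pixels (u, v)
    height_correction: float = 0.0  # h_c: height above the reference plane, metres


def update_map_table(table, entry):
    """S4 sketch: insert or overwrite the entry for this marker."""
    table[entry.marker_id] = entry


def height_correction_at(table, u, v):
    """Hypothetical lookup: h_c of the floor marker nearest to image point (u, v)."""
    floor = [e for e in table.values() if e.area == "floor"]
    if not floor:
        return 0.0  # no floor markers: assume the reference plane
    nearest = min(floor, key=lambda e: math.hypot(e.position[0] - u, e.position[1] - v))
    return nearest.height_correction
```

Wall markers are ignored by the lookup, since h_c only describes floor surfaces.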

In the imaging direction instruction process shown in FIG. 9, the camera 100A determines the attribute of the gaze point in S12 using a map table storing height correction information, as shown in FIG. 19. The method of determining the attribute of the gaze point is described below.
In the first embodiment, the maximum elevation angle Θ_MAX was calculated from equation (5) using the preset height information (estimated height) h₀. When differences in floor height are taken into account, the height correction value h_c is added, and the maximum elevation angle Θ_MAX' is calculated as:
$$\Theta_{\mathrm{MAX}}' = \arctan\left(\frac{h\tan\Theta_3}{h-(h_0+h_c)}\right) \qquad (7)$$
If the coordinate value Θ' of the point of interest is smaller than the elevation angle Θ_MAX', the attribute of the gaze point can be determined to be a person on the floor. If Θ' is larger than Θ_MAX', the attribute of the gaze point can be determined to be an object other than a person on the floor, that is, an object on a wall surface.
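Equation (7) and the comparison that follows it can be written as two small functions. The symbols h and Θ₃ come from the first embodiment's equation (5), which is not reproduced here; this sketch assumes, from the form of the equation, that h is the camera height above the reference plane and that h·tan Θ₃ is the horizontal distance to the floor/wall boundary. The function names are hypothetical, and because the description leaves the case Θ' = Θ_MAX' open, the sketch treats it as "wall".

```python
import math


def theta_max_corrected(h, theta3, h0, hc=0.0):
    """Equation (7): Θ_MAX' = arctan(h·tan Θ3 / (h − (h0 + hc)))."""
    vertical = h - (h0 + hc)
    if vertical <= 0:
        raise ValueError("the assumed head height must be below the camera")
    return math.atan((h * math.tan(theta3)) / vertical)


def gaze_attribute(theta_prime, h, theta3, h0, hc=0.0):
    """Person on the floor if Θ' < Θ_MAX', otherwise an object on the wall."""
    if theta_prime < theta_max_corrected(h, theta3, h0, hc):
        return "person on floor"
    return "object on wall"
```

With h = 3 m, Θ₃ = 45° and h₀ = 1.5 m, the threshold is arctan 2 (about 63.4°); raising the floor by h_c = 0.5 m moves it to arctan 3 (about 71.6°), so more of the image is classified as floor.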

Further, in S13 of FIG. 9, the camera 100A calculates the gaze point position taking the height correction value h_c into account.
FIG. 20 illustrates how the gaze point position is calculated when the attribute of the gaze point is determined to be a person on the floor. A point designated as the point of interest in the image captured by the camera 100A corresponds, when seen from any viewpoint other than that of the camera 100A, to the straight line 631 in FIG. 20. If the estimated height of the person of interest is preset to h₀, the point 633 on the straight line 631 at height (h₀ + h_c) above the floor surface 632 is presumed to be the appropriate gaze point. The camera 100A therefore calculates the three-dimensional position using the point 633 as the gaze point. When the attribute of the gaze point is determined to be an object other than a person on the floor, the gaze point position is calculated in the same way as in the first embodiment.
With this configuration, multiple cameras can cooperate to appropriately capture persons standing on floor surfaces of different heights.

In each of the above embodiments, the floor surface area is set using the floor markers 401 and the wall markers 402, but the user may instead specify the floor surface area through the GUI. In that case, as shown in FIGS. 21(a) and 21(b), the user operates the pointer 705 with an operation device such as a mouse and specifies the floor surface area on the image captured by the camera 100A and shown on the display unit 207. The user may specify the extent of the floor freehand, as shown in FIG. 21(a), or by rectangle selection, as shown in FIG. 21(b). The camera 100A accepts the setting of the floor surface area in the captured image and acquires its area information.
In this way, the information processing apparatus can also perform display control to display the image captured by the camera 100A on the client device 200, which is a display device, accept the setting of a planar area in the captured image, and acquire the area information of that planar area. This allows the floor surface area and the wall surface area to be set more appropriately.
The method of setting a planar area in the captured image is not limited to using markers or having the user specify the area with a mouse on the displayed image. For example, moving objects may be detected in the captured image and the floor surface area estimated from their movement paths (flow lines).
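If a user-drawn floor surface area is stored as a polygon in image coordinates, whether a designated point lies on the floor can be checked with a standard even-odd (ray casting) point-in-polygon test. The sketch below is one possible way to use such area information; the polygon representation and the function names are assumptions, and a rectangle selection is simply converted to a four-vertex polygon.

```python
def point_in_polygon(u, v, polygon):
    """Even-odd (ray casting) test: is image point (u, v) inside the polygon?
    `polygon` is a list of (u, v) vertices, e.g. a freehand outline."""
    inside = False
    n = len(polygon)
    for i in range(n):
        x1, y1 = polygon[i]
        x2, y2 = polygon[(i + 1) % n]
        if (y1 > v) != (y2 > v):  # edge straddles the horizontal line through v
            x_cross = x1 + (v - y1) * (x2 - x1) / (y2 - y1)
            if u < x_cross:       # crossing lies to the right of the point
                inside = not inside
    return inside


def rectangle_to_polygon(u0, v0, u1, v1):
    """Convert a rectangle selection into a polygon."""
    return [(u0, v0), (u1, v0), (u1, v1), (u0, v1)]
```

For an omnidirectional image the same test works on the raw (distorted) pixel coordinates, as long as the outline was drawn on that same image.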

In each of the above embodiments, the camera 100A accepts from the user the setting of a point of interest in the captured image displayed on the client device 200 and acquires position information indicating the position of that point. However, the camera 100A may instead set the point of interest from the captured image by automatic recognition and acquire its position information. For example, the camera 100A may perform person detection processing to detect a person in the captured image, set the position of the point of interest to, for example, the person's face based on the detection result, and acquire the position information of that point.
Because the information processing apparatus then sets the position of the point of interest from the result of the person detection processing, the user is spared the effort of monitoring the image from the camera 100A and indicating the point of interest.
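The description does not specify a detector or its output format, so the sketch below assumes a hypothetical detection record with a body box, an optional face box and a confidence score, in an image where "up" points toward the head (for example a dewarped view). It picks the most confident detection and returns the face center, falling back to an assumed heuristic (the center of the top eighth of the body box) when no face box is available.

```python
from dataclasses import dataclass
from typing import Optional, Tuple

Box = Tuple[float, float, float, float]  # (x, y, width, height); y grows downward


@dataclass
class Detection:  # hypothetical detector output
    body: Box
    face: Optional[Box] = None
    score: float = 1.0


def attention_point(detections, min_score=0.5):
    """Return an (x, y) point of interest on the most confident person, or None."""
    candidates = [d for d in detections if d.score >= min_score]
    if not candidates:
        return None
    best = max(candidates, key=lambda d: d.score)
    if best.face is not None:
        x, y, w, h = best.face
        return (x + w / 2.0, y + h / 2.0)  # centre of the face box
    x, y, w, h = best.body
    return (x + w / 2.0, y + h / 16.0)     # assumed: centre of the top eighth of the body
```

The returned image point would then take the place of the user's mouse click as the input to the imaging direction instruction process.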

Furthermore, in each of the above embodiments, the camera 100A sends the camera 100B a command specifying the imaging direction through the imaging direction instruction process. However, the camera 100A may instead send the camera 100B the three-dimensional position of the gaze point in the coordinate system of the camera 100B. Alternatively, the camera 100A may send the camera 100B the three-dimensional position in the coordinate system of the camera 100A, and the camera 100B may transform the received position into its own coordinate system. Even when the three-dimensional position of the gaze point is sent in this way, the camera 100B can control its imaging direction appropriately and capture images with the point of interest specified in the image from the camera 100A as the gaze point.
Also, although each of the above embodiments describes a camera linkage system in which multiple imaging devices cooperate to capture an object, the present invention is applicable to systems other than camera linkage systems. That is, the three-dimensional position of the point of interest in the image captured by the camera 100A, as derived by the information processing apparatus, can also be used for processing other than camera linkage processing.
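When the camera 100B receives a position expressed in the coordinate system of the camera 100A, the conversion is a rigid transform p_B = R·p_A + t, where the rotation R and translation t would come from the prior calibration of the two cameras. The sketch below uses hypothetical helper names and a yaw-only rotation purely for illustration; a real calibration would generally give a full 3×3 rotation.

```python
import math


def rotation_z(yaw):
    """Rotation about the vertical axis (illustrative; real calibrations are general)."""
    c, s = math.cos(yaw), math.sin(yaw)
    return [[c, -s, 0.0], [s, c, 0.0], [0.0, 0.0, 1.0]]


def transform_point(R, t, p):
    """Camera 100A frame -> camera 100B frame: p_B = R · p_A + t."""
    return tuple(sum(R[i][j] * p[j] for j in range(3)) + t[i] for i in range(3))


def inverse_transform(R, t, p):
    """Camera 100B frame -> camera 100A frame: p_A = Rᵀ · (p_B − t)."""
    q = [p[i] - t[i] for i in range(3)]
    return tuple(sum(R[j][i] * q[j] for j in range(3)) for i in range(3))
```

Because R is a rotation, its transpose is its inverse, so the same calibration serves both directions.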

(Other embodiments)
The present invention can also be implemented by supplying a program that implements one or more functions of the above embodiments to a system or apparatus via a network or a storage medium, and having one or more processors in a computer of that system or apparatus read and execute the program. It can also be implemented by a circuit (for example, an ASIC) that implements one or more of the functions.

Description of Symbols
100A, 100B... Imaging device; 200... Client device; 300... Network; 401... Floor marker; 402... Wall marker; 1000... Imaging system

Claims

1. An information processing apparatus comprising:
a first acquisition means for acquiring area information for specifying the position of a floor surface area and the position of a wall surface area in a captured image captured by a first imaging device;
a second acquisition means for acquiring position information indicating the position of a point of interest in the captured image; and
a deriving means for determining, using the area information acquired by the first acquisition means and the position information acquired by the second acquisition means, whether a target object corresponding to the point of interest is an object existing on the floor surface or an object existing on a wall surface, and for deriving the three-dimensional position of the point of interest based on the result of the determination.

2. The information processing apparatus according to claim 1, further comprising an instruction means for determining an imaging direction of a second imaging device, different from the first imaging device, for imaging the point of interest derived by the deriving means as a gaze point, and for instructing the second imaging device of the imaging direction.

3. The information processing apparatus according to claim 2, wherein said instruction means further determines an imaging angle of view of said second imaging device and instructs said second imaging device of said imaging angle of view.

4. The information processing apparatus according to any one of claims 1 to 3, wherein the deriving means:
derives a straight line on which the point of interest may exist in real space, based on the imaging conditions of the first imaging device and the position information acquired by the second acquisition means; and,
when it is determined that the target object corresponding to the point of interest is an object existing on the floor surface, derives, as the three-dimensional position of the point of interest, the point on the derived straight line whose height from the floor surface in real space is based on preset height information.

5. The information processing apparatus according to any one of claims 1 to 4, wherein the first acquisition means extracts identification information from the captured image to set the floor surface area, and acquires the area information.

6. The information processing apparatus according to any one of claims 1 to 4, wherein the first acquisition means detects markers in the captured image, sets the floor surface area based on the detection result, and acquires the area information based on marker information embedded in the markers.

7. The information processing apparatus according to claim 6, wherein the marker information includes at least one of information indicating whether the marker belongs to the floor surface area or the wall surface area, and information about the installation position of the marker.

8. The information processing apparatus according to any one of claims 1 to 4, wherein the first acquisition means acquires the area information by displaying the captured image on a display device and receiving a setting of the floor surface area in the displayed captured image.

9. The information processing apparatus according to any one of claims 1 to 8, further comprising a person detection means for detecting a person in the captured image, wherein
the second acquisition means sets the position of the point of interest in the captured image based on the detection result of the person detection means, and acquires the position information.

10. The information processing apparatus according to any one of claims 1 to 8, wherein the second acquisition means acquires the position information by displaying the captured image on a display device and receiving a setting of the point of interest in the displayed captured image.

11. The information processing apparatus according to claim 4, wherein the height information indicates the height from the floor surface area in real space to the point of interest.

12. The information processing apparatus according to claim 4, further comprising a third acquisition means for receiving a setting of the height information and acquiring the height information.

13. The information processing apparatus according to any one of claims 1 to 12, wherein the point of interest is a person or a part of a person.

14. An information processing method comprising:
obtaining area information for specifying the position of a floor surface area and the position of a wall surface area in a captured image captured by a first imaging device;
obtaining position information indicating the position of a point of interest in the captured image; and
determining, using the area information and the position information, whether a target object corresponding to the point of interest is an object existing in the floor surface area or an object existing in the wall surface area, and deriving the three-dimensional position of the point of interest based on the result of the determination.

15. An imaging system comprising:
a first imaging device;
a second imaging device different from the first imaging device;
a first acquisition means for acquiring area information for specifying the position of a floor surface area and the position of a wall surface area in a captured image captured by the first imaging device;
a second acquisition means for acquiring position information indicating the position of a point of interest in the captured image;
a deriving means for determining, using the area information acquired by the first acquisition means and the position information acquired by the second acquisition means, whether a target object corresponding to the point of interest is an object existing in the floor surface area, and for deriving the three-dimensional position of the point of interest based on the result of the determination; and
an instruction means for determining an imaging direction of the second imaging device for imaging the point of interest derived by the deriving means as a gaze point, and for instructing the second imaging device of the imaging direction.

16. A program for causing a computer to function as each means of the information processing apparatus according to any one of claims 1 to 13.