JP2018139058A

JP2018139058A - Information processor and information processing method and program

Info

Publication number: JP2018139058A
Application number: JP2017033639A
Authority: JP
Inventors: Kazunari Iwamoto (岩本 和成)
Original assignee: Canon Inc
Current assignee: Canon Inc
Priority date: 2017-02-24
Filing date: 2017-02-24
Publication date: 2018-09-06

Abstract

PROBLEM TO BE SOLVED: To provide an information processor, an information processing method, and a program that determine mis-detection of an object more accurately. SOLUTION: An object is detected from a photographic image taken by photographic means. A feature quantity of the detected object and a direction of the object with respect to the photographic means are obtained, and a prediction value of the feature of the object is calculated based on them. At the next detection timing, the prediction value is compared with the feature quantity of the result of detection of the object by detection means, and whether the detection result is a mis-detection is evaluated. SELECTED DRAWING: Figure 4

Description

The present invention relates to an information processing apparatus, an information processing method, and a program.

Techniques for detecting a specific object, such as a person or a car, or a partial region of such an object, from an image are known. Techniques for reducing erroneous detection when detecting a region in which a specific object appears are also known. For example, Patent Document 1 discloses a method that, when regions in which objects have been detected overlap, determines from the overlapping state whether each detection result is a false detection. Patent Document 2 discloses a method that invalidates one of the detection results when another region in which a head has been detected exists within a predetermined distance of a region in which a head has been detected.

Patent Document 1: JP 2013-61802 A
Patent Document 2: JP 2012-212968 A

However, the technique disclosed in Patent Document 1 cannot determine whether a region in which an object has been detected is a false detection unless two or more such regions overlap. Consequently, when regions do not overlap, a detection cannot be judged to be false even if the region contains no object. The technique disclosed in Patent Document 2 treats any detected head region within a predetermined range of the detected head region of interest as a false detection, so a correctly detected head region within that range may be judged to be a false detection. As described above, conventional techniques cannot always determine erroneous detection of an object accurately.
An object of the present invention is to make it possible to determine erroneous detection of an object more accurately.

The information processing apparatus of the present invention includes detection means for detecting an object from a captured image captured by imaging means, and evaluation means for evaluating a result of detection of the object by the detection means, based on a feature amount of the object detected by the detection means and a direction of the object with respect to the imaging means.

According to the present invention, erroneous detection of an object can be determined more accurately.

A diagram showing an example of the system configuration of the detection system.
A diagram showing an example of the hardware configuration of the imaging device and the like.
A diagram showing an example of the functional configuration of the imaging device and the like.
A flowchart showing an example of the detection processing.
A diagram showing an example of an imaging environment.
A diagram showing an example of a captured image.
A diagram explaining an example of the relationship between the coordinate systems.
A diagram showing an example of a detection result.
A diagram showing an example of an imaging environment.
A diagram showing an example of a captured image.
A diagram showing an example of the positional relationship between an object and the imaging device.

Preferred embodiments of the present invention are described in detail below with reference to the drawings.

<Embodiment 1>
In the present embodiment, the detection system performs the following processing. The detection system predicts, from the direction of an object (the subject) as seen from the imaging device, what value the feature amount obtained from a detected region will take when an object appearing in that direction is detected. It then determines whether the feature amount of each actually detected region deviates from the predicted value and, based on that determination, decides whether each detected region is a false detection. In the present embodiment, the size of the detected region is used as the feature amount. However, the feature amount is not limited to this, and feature amounts other than size may be used.
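
The predict-then-compare idea above can be sketched as follows. This is only an illustrative sketch: the text says a detection is judged by whether its feature amount "deviates from" the prediction but does not give the deviation rule, so the relative tolerance used here, and the names `Detection`, `is_false_detection`, `keep_plausible`, and `predict_size`, are assumptions.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class Detection:
    phi: float    # azimuth of the region centre as seen from the camera (radians)
    theta: float  # polar angle of the region centre (radians)
    size: float   # feature amount; here, the size of the detected region


def is_false_detection(det: Detection,
                       predict_size: Callable[[float, float], float],
                       tolerance: float = 0.3) -> bool:
    """Flag a detection whose size deviates too far from the predicted size.

    The relative-tolerance rule is an assumption made for illustration.
    """
    predicted = predict_size(det.phi, det.theta)
    return abs(det.size - predicted) > tolerance * predicted


def keep_plausible(detections: List[Detection],
                   predict_size: Callable[[float, float], float],
                   tolerance: float = 0.3) -> List[Detection]:
    """Return only the detections that are not judged to be false detections."""
    return [d for d in detections
            if not is_false_detection(d, predict_size, tolerance)]
```

Here `predict_size` stands in for whatever model maps a direction to an expected feature amount; passing it in as a callable keeps the sketch independent of how that prediction is actually obtained.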
FIG. 1 is a diagram illustrating an example of a system configuration of a detection system according to the present embodiment.
The detection system includes an imaging device 110 and a client device 120. The imaging device 110 and the client device 120 are connected via a network 150 so that they can communicate with each other. The client device 120 is connected to the input device 130 and the display device 140.
The imaging device 110 is an imaging device such as a network camera that performs imaging. The client device 120 is an information processing device such as a personal computer, a server device, or a tablet device that performs driving of the imaging device 110, acquisition of a captured image, detection of an object with respect to the acquired image, analysis, and the like. The input device 130 is an input device including a mouse, a keyboard, and the like. The display device 140 is a display device such as a monitor that displays an image output from the client device 120. In the present embodiment, the client device 120, the input device 130, and the display device 140 are independent devices. However, for example, the client device 120 and the display device 140 may be integrated, or the input device 130 and the display device 140 may be integrated. In addition, the client device 120, the input device 130, and the display device 140 may be integrated.
The network 150 is a network that connects the imaging device 110 and the client device 120. The network 150 includes a plurality of routers, switches, cables, and the like that satisfy a communication standard such as Ethernet (registered trademark). In the present embodiment, the network 150 only needs to be able to perform communication between the imaging device 110 and the client device 120, and the communication standard, scale, and configuration are not limited. For example, the network 150 may be configured by the Internet, a wired LAN (Local Area Network), a wireless LAN (Wireless LAN), a WAN (Wide Area Network), or the like.

FIG. 2A is a diagram illustrating an example of a hardware configuration of the imaging device 110.
The imaging device 110 includes a CPU 211, a main storage device 212, an auxiliary storage device 213, a drive unit 214, an imaging unit 215, and a network I/F 216. These elements are connected via a system bus 217 so that they can communicate with each other.
The CPU 211 is a central processing unit that controls the operation of the imaging device 110. The main storage device 212 is a storage device such as a random access memory (RAM) that functions as a work area for the CPU 211 and a temporary storage location for data. The auxiliary storage device 213 is a storage device such as a hard disk drive (HDD), a read only memory (ROM), or a solid state drive (SSD) that stores various programs, various setting data, and the like.
The drive unit 214 drives the imaging device 110 to change its orientation and the like, thereby changing the shooting direction and the angle of view of the imaging unit 215. The imaging unit 215 includes an image sensor and an optical system, and forms an image of a subject on the image sensor with the intersection of the optical axis of the optical system and the image sensor as the imaging center. Examples of the image sensor include a CMOS (complementary metal-oxide-semiconductor) sensor and a CCD (charge-coupled device) sensor. The network I/F 216 is an interface used for communication with an external device such as the client device 120 via the network 150.
The CPU 211 executes processing based on a program stored in the auxiliary storage device 213, thereby realizing the functions of the imaging device 110 described later with reference to FIG. 3 and the processing of the imaging device 110.

FIG. 2B is a diagram illustrating an example of a hardware configuration of the client device 120.
The client device 120 includes a CPU 221, a main storage device 222, an auxiliary storage device 223, an input I/F 224, an output I/F 225, and a network I/F 226. These elements are connected via a system bus 227 so that they can communicate with each other.
The CPU 221 is a central processing unit that controls the operation of the client device 120. The main storage device 222 is a storage device such as a RAM that functions as a work area for the CPU 221 and a temporary storage location for data. The auxiliary storage device 223 is a storage device such as an HDD, ROM, or SSD that stores various programs, various setting data, and the like.
The input I/F 224 is an interface used to receive input from the input device 130 or the like. The output I/F 225 is an interface used to output information to the display device 140 or the like. The network I/F 226 is an interface used for communication with an external device such as the imaging device 110 via the network 150.
The CPU 221 executes processing based on a program stored in the auxiliary storage device 223, thereby realizing the functions of the client device 120 described later with reference to FIG. 3 and the processing of the client device 120, such as the processing of the flowchart described later with reference to FIG. 4.

FIG. 3A is a diagram illustrating an example of a functional configuration of the imaging device 110.
The imaging device 110 includes an imaging control unit 311, a signal processing unit 312, a drive control unit 313, and a communication control unit 314.
The imaging control unit 311 captures images of the surrounding environment via the imaging unit 215. The signal processing unit 312 processes the images captured by the imaging control unit 311; for example, it encodes them. For a still image, the signal processing unit 312 encodes the image using an encoding method such as JPEG (Joint Photographic Experts Group). For a moving image, the signal processing unit 312 encodes the image using an encoding method such as H.264/MPEG-4 AVC (hereinafter, H.264) or HEVC (High Efficiency Video Coding). The signal processing unit 312 may also encode the image using an encoding method selected by the user, for example via an operation unit of the imaging device 110, from among a plurality of preset encoding methods.
The drive control unit 313 performs control to change the shooting direction and the angle of view of the imaging control unit 311 via the drive unit 214. However, the drive control unit 313 may change only one of the shooting direction and the angle of view, and the shooting direction and the angle of view of the imaging control unit 311 may also be fixed. The communication control unit 314 transmits the image captured by the imaging control unit 311 and processed by the signal processing unit 312 to the client device 120 via the network I/F 216. The communication control unit 314 also receives control commands for the imaging device 110 from the client device 120 via the network I/F 216.

FIG. 3B is a diagram illustrating an example of a functional configuration of the client device 120.
The client device 120 includes an input information acquisition unit 321, a communication control unit 322, an image acquisition unit 323, a detection unit 324, a prediction unit 325, an evaluation unit 326, and a display control unit 327.
The input information acquisition unit 321 receives input from the user via the input device 130. The communication control unit 322 receives the image transmitted from the imaging device 110 via the network 150. Further, the communication control unit 322 transmits a control command to the imaging device 110 via the network 150. The image acquisition unit 323 acquires an image photographed by the imaging device 110 from the imaging device 110 as an image that is a target of object detection processing via the communication control unit 322. The image acquisition unit 323 may acquire the image stored in the auxiliary storage device 223 as an image that is a target of the object detection process.
The detection unit 324 performs detection processing for the detection target object on the image acquired by the image acquisition unit 323. The prediction unit 325 predicts the feature amount of an object that can be detected in the image. The evaluation unit 326 determines whether a detection result of the detection unit 324 is a false detection, based on the feature amount predicted by the prediction unit 325 for an object that can be detected in the image and the feature amount of the object detected by the detection unit 324. The display control unit 327 outputs an image to the display device 140 in accordance with an instruction from the CPU 221. For example, the display control unit 327 outputs to the display device 140 an image showing the objects detected by the detection unit 324, excluding those determined to be false detections by the evaluation unit 326.

FIG. 4 is a flowchart illustrating an example of object detection processing. With reference to FIG. 4, the following describes processing in which the client device 120 acquires an image from the imaging device 110, detects objects in the acquired image, and determines, for each detected region, whether the detection result is a false detection.
In S400, the image acquisition unit 323 acquires, via the communication control unit 322, an image captured by the imaging device 110 from the imaging device 110 as the target image for object processing. Hereinafter, the image captured by the imaging device 110 and acquired in S400 is referred to as the captured image. In the present embodiment, the detection target object is assumed to be a human body.
FIG. 5 is a diagram illustrating an example of an imaging environment in which the imaging apparatus 110 captures images according to the present embodiment. In FIG. 5, a range surrounded by a solid line extending from the imaging device 110 indicates a visible range of the imaging device 110. Objects 501 to 503 represent human bodies that are detection target objects in the present embodiment. In the present embodiment, the detection system detects a human body, but may detect an object other than the human body such as an automobile, a luggage, a drone, or an animal.

FIG. 6 is a diagram illustrating an example of a captured image captured by the imaging device 110 in the imaging environment illustrated in FIG. 5. An image 600 is a captured image of the imaging environment of FIG. 5 captured by the imaging device 110. The objects 501 to 503 in FIG. 6 are the same objects as the objects 501 to 503 in FIG. 5.
In the present embodiment, the client device 120 receives live video captured in real time from the imaging device 110, and performs the process of FIG. 4 on each frame of the received live video (moving image). However, the client device 120 may perform the process of FIG. 4 on each frame of a still image or a moving image stored in the auxiliary storage device 213 in the imaging device 110, for example. Further, the client device 120 may perform the process of FIG. 4 on each frame of a still image or a moving image stored in the auxiliary storage device 223 in the client device 120. Further, the client device 120 may access an external recording server and perform the processing shown in FIG. 4 on each frame of a still image or a moving image stored in the recording server.
In the present embodiment, before performing the processing of FIG. 4, the client device 120 determines in advance, for arbitrary coordinates (x, y) of a captured image captured by the imaging device 110, in which direction from the imaging device 110 those coordinates lie.

Assume, with the viewpoint of the imaging device 110 (the point from which the imaging unit 215 views the environment to be captured) as the origin, a Wx axis and a Wy axis that pass through the origin and are orthogonal to each other on a horizontal plane passing through the origin, and a Wz axis that passes through the origin and extends in the vertical direction. The three-dimensional coordinate system specified by the Wx, Wy, and Wz axes is referred to as the world coordinate system.
Coordinates in the world coordinate system can be written as the positions along the three axes, for example (x, y, z). They can also be expressed in spherical coordinates, for example (φ, θ, r). Here, φ is the angle between the Wx axis and the straight line passing through the origin and the point obtained by projecting the coordinates onto the plane containing the Wx axis and the Wy axis. θ is the angle between the Wz axis and the straight line passing through the origin and the coordinates. r is the distance from the origin to the coordinates.
In the spherical representation of world coordinates, φ and θ indicate in which direction the coordinates lie with respect to the imaging device 110. In the present embodiment, the client device 120 specifies the direction of an object with respect to the imaging device 110 by obtaining θ and φ of the world coordinates from the screen coordinates corresponding to the position of the object in the captured image.
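
As an illustration of the spherical representation described above, the conversion from (x, y, z) to (φ, θ, r) might be written as below. The function name `to_spherical` is hypothetical, and treating φ as a signed angle in (-π, π] measured from the Wx axis toward the Wy axis is an assumption; the text defines φ only as the angle between the Wx axis and the projected line.

```python
import math
from typing import Tuple


def to_spherical(x: float, y: float, z: float) -> Tuple[float, float, float]:
    """Convert world coordinates (x, y, z) to spherical coordinates (phi, theta, r).

    phi   : angle from the Wx axis to the projection onto the Wx-Wy plane
            (signed, via atan2; this sign convention is an assumption)
    theta : angle from the Wz axis to the line through the origin and the point
    r     : distance from the origin to the point
    """
    r = math.sqrt(x * x + y * y + z * z)
    if r == 0.0:
        raise ValueError("the direction of the origin itself is undefined")
    phi = math.atan2(y, x)
    # Clamp to guard against rounding pushing the ratio just outside [-1, 1].
    theta = math.acos(max(-1.0, min(1.0, z / r)))
    return phi, theta, r
```

Only φ and θ are needed to express a direction; r is returned for completeness.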

Also assume, with the viewpoint of the imaging device 110 as the origin, a Vz axis that passes through the origin and extends in the shooting direction (optical axis direction) of the imaging device 110, a Vx axis that passes through the origin and extends in the horizontal direction of the image captured by the imaging device 110, and a Vy axis that passes through the origin and extends in the vertical direction of that image. The three-dimensional coordinate system specified by the Vx, Vy, and Vz axes is referred to as the view coordinate system.
Further assume, with the center of the image captured by the imaging device 110 as the origin, an Sx axis that passes through the origin and extends in the horizontal direction of the image and an Sy axis that passes through the origin and extends in the vertical direction of the image. The two-dimensional coordinate system specified by the Sx and Sy axes is referred to as the screen coordinate system.

FIG. 7 is a diagram illustrating an example of the relationship among the world coordinate system, the view coordinate system, and the screen coordinate system. The rectangle touching Sx and Sy represents the image (screen) captured by the imaging device 110. Sw in FIG. 7 is half the width of the screen (half the width of the captured image). d in FIG. 7 is the distance from the origin of the view coordinate system and the world coordinate system to the screen. (x, y) in FIG. 7 is a point in screen coordinates, and (φ, θ, r) is that point expressed in spherical coordinates.
That is, the client device 120 performs the following processing to obtain the direction (φ, θ) from arbitrary coordinates (x, y) in the screen coordinate system of the image captured by the imaging device 110. That is, the client device 120 may convert a point on the world coordinate system into a spherical coordinate representation after converting from the screen coordinate system to the view coordinate system and from the view coordinate system to the world coordinate system.
A method of converting from the screen coordinate system to the world coordinate system will be described. First, the client device 120 obtains the distance d from the view coordinate system origin to the screen based on the following formula (1).
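
The expression for formula (1) is not reproduced in this text. From the quantities defined for FIG. 7 (Sw is half the screen width, d is the distance from the origin to the screen) and the horizontal angle of view φmax, a geometrically consistent reconstruction, offered here as an assumption rather than as the published formula, would be:

```latex
d = \frac{S_w}{\tan\left(\varphi_{\max} / 2\right)} \qquad (1)
```

This follows from the right triangle formed by the optical axis, the half-width of the screen, and the ray to the screen edge, whose angle at the origin is half the horizontal angle of view.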

φmax in formula (1) is the horizontal angle of view of the imaging device 110. The client device 120 can acquire information on shooting conditions, such as the angle of view of the imaging device 110, from the imaging device 110. In response to a request from the client device 120, the imaging device 110 transmits to the client device 120 the shooting-condition information, such as its angle of view, stored for example in the auxiliary storage device 213.
An arbitrary point (x, y) in the screen coordinate system can be represented as (x, y, d) in the view coordinate system. By applying a transformation matrix from the view coordinate system to the world coordinate system to the point (x, y, d), the client device 120 can convert the point from the view coordinate system to the world coordinate system. In the present embodiment, the client device 120 performs the conversion from the screen coordinate system to the world coordinate system using a quaternion-based transformation matrix. However, the client device 120 may instead use, for example, rotation matrices about the x, y, and z axes.
In order to convert from the view coordinate system to the world coordinate system, information on the optical axis direction of the imaging device 110 is necessary. In the present embodiment, the client device 120 obtains the attitude of the imaging device 110 based on information output from a gyro sensor in the imaging device 110. Further, the imaging apparatus 110 may obtain the attitude of the imaging apparatus 110 based on information output from a gyro sensor in the imaging apparatus 110 and transmit the obtained attitude information to the client apparatus 120. In addition, the client device 120 may acquire the posture information of the imaging device 110 by receiving input of posture information of the imaging device 110 based on a user operation via the input device 130, for example. In addition, the client device 120 may acquire information on pan and tilt driving performed by the imaging device 110 and may acquire information on the posture of the imaging device 110 from the acquired information on pan and tilt driving. Further, the client device 120 may perform camera calibration of the imaging device 110 based on a test chart placed on the floor.
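
The chain of conversions described above (screen to view, view to world, world to spherical) could be sketched as follows. This is a minimal sketch under stated assumptions: d is taken as Sw / tan(φmax / 2), which is inferred from the geometry and not quoted from the source; the camera attitude is supplied as a unit quaternion `view_to_world` in (w, x, y, z) order; and the quaternion is applied directly to the vector instead of being expanded into a matrix. All function and parameter names are hypothetical.

```python
import math
from typing import Tuple

Vec3 = Tuple[float, float, float]
Quat = Tuple[float, float, float, float]  # (w, x, y, z); assumed to be a unit quaternion


def cross(a: Vec3, b: Vec3) -> Vec3:
    """Cross product of two 3-vectors."""
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])


def rotate(q: Quat, v: Vec3) -> Vec3:
    """Rotate v by unit quaternion q using v' = v + 2w(u x v) + 2u x (u x v)."""
    w, u = q[0], (q[1], q[2], q[3])
    uv = cross(u, v)
    uuv = cross(u, uv)
    return (v[0] + 2.0 * (w * uv[0] + uuv[0]),
            v[1] + 2.0 * (w * uv[1] + uuv[1]),
            v[2] + 2.0 * (w * uv[2] + uuv[2]))


def screen_to_direction(x: float, y: float, sw: float, phi_max: float,
                        view_to_world: Quat) -> Tuple[float, float]:
    """Return the direction (phi, theta) of screen point (x, y) in world coordinates.

    sw      : half the screen width, in the same units as x and y
    phi_max : horizontal angle of view in radians
    """
    d = sw / math.tan(phi_max / 2.0)                # assumed form of formula (1)
    wx, wy, wz = rotate(view_to_world, (x, y, d))   # view -> world
    r = math.sqrt(wx * wx + wy * wy + wz * wz)
    phi = math.atan2(wy, wx)
    theta = math.acos(max(-1.0, min(1.0, wz / r)))
    return phi, theta
```

For example, the quaternion (0.5, 0.5, 0.5, 0.5) maps Vx to Wy, Vy to Wz, and Vz to Wx, which corresponds to a level camera looking along the Wx axis. With that attitude the screen center maps to φ = 0, θ = π/2, and the right edge of the screen maps to φ = φmax / 2.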

In S401, the detection unit 324 performs detection processing that detects a human body, which is the object set as the detection target, from the captured image acquired in S400.
In the present embodiment, the detection unit 324 first scales the captured image acquired in S400 to various sizes, thereby obtaining a plurality of enlarged and reduced versions of the captured image (hereinafter, scale images). By performing object detection processing on these scale images, the detection unit 324 can detect objects of various sizes.
Next, the detection unit 324 raster-scans each scale image of the captured image with a detection window of a specific size. The auxiliary storage device 223 stores in advance the feature amount of the detection target object calculated from learning data. When the error between the feature amount calculated from within the detection window during the scan and the stored feature amount based on the learning data is smaller than a set threshold, the detection unit 324 determines that the region of the detection window in the captured image is an object. Information on this threshold is stored in advance in the auxiliary storage device 223. The detection unit 324 also obtains the direction of the detected object relative to the imaging device 110 from the center coordinates of the detected object's region. More specifically, the detection unit 324 obtains the direction (φ, θ) of the detected object in the world coordinate system from those center coordinates.
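The multi-scale raster scan of S401 can be sketched as follows. This is only an illustration under stated assumptions: the nearest-neighbour resampler, the window size, the step, the scale factors, and the caller-supplied feature function are all hypothetical, since the text does not specify the feature amount or the scaling method.

```python
import numpy as np

def rescale(img, factor):
    """Nearest-neighbour resize of a 2-D array (stand-in for a real resampler)."""
    h, w = img.shape
    nh, nw = max(1, int(round(h * factor))), max(1, int(round(w * factor)))
    rows = np.minimum((np.arange(nh) / factor).astype(int), h - 1)
    cols = np.minimum((np.arange(nw) / factor).astype(int), w - 1)
    return img[np.ix_(rows, cols)]

def detect(img, template_feature, feature_fn, win=8, step=4,
           factors=(1.0, 0.5), threshold=0.1):
    """Return (cx, cy, size) in original-image pixels for each accepted window."""
    hits = []
    for f in factors:                          # one pass per scale image
        scaled = rescale(img, f)
        h, w = scaled.shape
        for top in range(0, h - win + 1, step):        # raster scan
            for left in range(0, w - win + 1, step):
                feat = feature_fn(scaled[top:top + win, left:left + win])
                # Accept when the error to the learned feature is below the threshold.
                if np.linalg.norm(feat - template_feature) < threshold:
                    hits.append(((left + win / 2) / f, (top + win / 2) / f, win / f))
    return hits
```

Because the window size is fixed, a detection in a reduced scale image corresponds to a larger region in the original image, which is why the window center and size are divided by the scale factor.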

FIG. 8 is a diagram illustrating an example of the result of detection processing performed by the detection unit 324 on a captured image. In FIG. 8, regions 801 to 803 are the regions in which objects 501 to 503 were detected, respectively. Regions 804 and 805 are not objects but falsely detected regions. Region 804 is a false detection that arose because it contains a portion whose features are close to those of the detection target object. Region 805 is a false detection that arose because a region containing several densely packed objects has features close to those of the detection target object.
Hereinafter, the N (1 ≤ N) regions detected in S401 are referred to as detection regions n (n = 1 to N).

In S402, the prediction unit 325 determines whether to update the predicted values of the size of the detection region of an object appearing in each part of the captured image. In the present embodiment, when the processing of FIG. 4 is executed for the first time, or when a certain period has elapsed since the previous execution of the processing of FIG. 4, the prediction unit 325 determines that the predicted values are to be updated and proceeds to S403. When the processing of FIG. 4 is not being executed for the first time and the certain period has not elapsed since the previous execution, the prediction unit 325 determines that the predicted values are not to be updated and proceeds to S404.
In S403, the prediction unit 325 updates, based on the detection regions, the predicted value of the object size in the direction of each detection region. In the present embodiment, the prediction unit 325 performs interpolation using spherical harmonic functions based on the directions and sizes of the detection regions detected in S401, and thereby predicts the size of the detection region of an object appearing in each direction. To this end, the prediction unit 325 calculates the coefficients c of the spherical harmonic basis functions based on the following equation (2).

In equation (2), φn and θn represent the direction of detection region n (n = 1 to N) detected in S401. Y represents a basis function of the spherical harmonics, whose first argument is the degree l, whose second argument is the order m, and whose third and fourth arguments are the direction of the detection region. sn represents the size of detection region n. Using equation (2), the prediction unit 325 can obtain the coefficient c(l, m) of the spherical harmonic function Y(l, m, φ, θ). The prediction unit 325 then calculates the predicted value s(φ, θ) of the size of the detection region in an arbitrary direction (φ, θ) based on the following equation (3).
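Equations (2) and (3) themselves do not appear in this text. As a hedged sketch only, one form consistent with the symbols defined above is a spherical harmonic expansion of the predicted size, with coefficients chosen by a least-squares fit to the N detections. The maximum degree L and the least-squares criterion are assumptions and may differ from the equations of the original publication.

```latex
% Assumed form of equation (3): expansion of the predicted size
s(\phi, \theta) = \sum_{l=0}^{L} \sum_{m=-l}^{l} c(l, m)\, Y(l, m, \phi, \theta)

% Assumed fitting criterion behind equation (2): least squares over the N detections
\min_{c} \sum_{n=1}^{N} \Bigl( s_n - \sum_{l=0}^{L} \sum_{m=-l}^{l} c(l, m)\, Y(l, m, \phi_n, \theta_n) \Bigr)^{2}
```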

In the present embodiment, the degree l is kept as small as possible. As a result, even if the spherical harmonic coefficients are calculated using detection regions whose size differs markedly from their surroundings, such as regions 804 and 805, the interpolation suppresses high-frequency components, so that such regions can be treated as outliers.
In addition, when the order m is 0, each spherical harmonic basis function depends only on the value of θ and not on the value of φ. Therefore, in the present embodiment, the size of an object's region is assumed to depend only on the depression angle from the imaging device 110 to the object in the world coordinate system, and not on the azimuth angle, and the order m is set to 0.
In this way, the prediction unit 325 obtains the predicted value of the size of the detection region of an object using spherical harmonic functions. By using spherical harmonic functions, the prediction unit 325 can obtain a predicted value that depends on the direction from the imaging device 110 in the world coordinate system.
For example, suppose that pan/tilt driving of the imaging device 110 occurs and the field of view of the imaging device 110 changes. The positional relationship between the imaging device 110 and the object in the world coordinate system nevertheless does not change. The client device 120 can therefore use the predicted values of the detection region size obtained with the spherical harmonic functions in the world coordinate system in the same way both before and after the change in the field of view. Consequently, by using spherical harmonic functions, the prediction unit 325 need not recalculate the predicted detection region size for each direction every time the field of view of the imaging device 110 changes, which reduces the computational load. The prediction unit 325 can obtain the same effect by using a function other than spherical harmonics, such as a radial basis function, that can approximate the detection region size based on the direction from the imaging device 110 in the world coordinate system.
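The size prediction with the order m fixed to 0 can be sketched as follows. With m = 0, the spherical harmonics reduce to normalized Legendre polynomials in cos θ, so the fit becomes an ordinary least-squares problem. The function names, the maximum degree of 2, and the treatment of θ as a polar angle are assumptions for illustration, and the least-squares fit is a stand-in for equation (2), which is not shown in this text.

```python
import numpy as np
from numpy.polynomial import legendre

def zonal_basis(theta, max_degree):
    """Columns are Y(l, 0, ., theta) for l = 0..max_degree (real, orthonormal)."""
    x = np.cos(np.asarray(theta, dtype=float))
    P = legendre.legvander(x, max_degree)          # P_l(cos theta), one column per l
    norm = np.sqrt((2 * np.arange(max_degree + 1) + 1) / (4 * np.pi))
    return P * norm

def fit_size_model(theta, sizes, max_degree=2):
    """Least-squares coefficients c(l, 0) from the observed detection sizes."""
    A = zonal_basis(theta, max_degree)
    c, *_ = np.linalg.lstsq(A, np.asarray(sizes, dtype=float), rcond=None)
    return c

def predict_size(c, theta):
    """Predicted size s(theta) for one or more angles theta."""
    return zonal_basis(theta, len(c) - 1) @ c
```

Keeping `max_degree` small leaves the fit with only a few smooth basis functions, so a single detection whose size differs sharply from its neighbors cannot be reproduced exactly and shows up as a large residual.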

In S404, the evaluation unit 326 creates a list of the detection regions n and stores it in the auxiliary storage device 223 or the like. The evaluation unit 326 stores the size, center coordinates, and direction of each detection region in the corresponding node of the list. In the example of FIG. 8, the evaluation unit 326 stores in the list, for each of regions 801 to 805, the size of its rectangle, its center coordinates, and the direction calculated from the center coordinates.
Next, the evaluation unit 326 scans the list created in S404 and determines whether each detection region in the list is a false detection. The flow of this determination is as follows.
In S405, the evaluation unit 326 selects the first detection region from the list created in S404. Hereinafter, the currently selected detection region is referred to as detection region i.

In S406, the evaluation unit 326 performs the following processing based on the size, which is the feature amount of the detection region selected in S405 or in S410 described later, and on the direction of that detection region. That is, the evaluation unit 326 determines whether the detection region selected in S405 or S410 is a false detection. For example, the evaluation unit 326 evaluates the detection result for detection region i by comparing the size s'(φi, θi) of detection region i with the predicted value s(φi, θi) of the detection region size corresponding to the direction (φi, θi) of detection region i.
In the present embodiment, the evaluation unit 326 evaluates the detection result for detection region i by deciding whether the detection of the object for detection region i is a false detection. However, the evaluation unit 326 may instead evaluate the detection result by, for example, obtaining a likelihood indicating how probable it is that detection region i is an object.
The evaluation unit 326 evaluates the detection result for detection region i based on the degree to which the size s'(φi, θi) of detection region i deviates from the predicted value s(φi, θi) that the prediction unit 325 predicted from the direction (φi, θi) of detection region i. In the present embodiment, when the size s'(φi, θi) satisfies the condition shown in the following equation (4), the evaluation unit 326 decides that the detection result for detection region i is not a false detection. When the size s'(φi, θi) does not satisfy the condition of equation (4), the evaluation unit 326 decides that the detection result for detection region i is a false detection.
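Equation (4) does not appear in this text. Given that it bounds the deviation of s' from s using the two quantities α and β, one plausible form is the interval condition below. Which of α and β applies to the lower bound and which to the upper bound is an assumption.

```latex
% Assumed form of equation (4): the observed size lies in a band around the prediction
s(\phi_i, \theta_i) - \alpha(\phi_i, \theta_i) \;\le\; s'(\phi_i, \theta_i) \;\le\; s(\phi_i, \theta_i) + \beta(\phi_i, \theta_i)
```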

In the present embodiment, the values of α(φi, θi) and β(φi, θi) in equation (4) are both set to 10% of s(φi, θi). However, the evaluation unit 326 may instead obtain the values of α and β using, for example, spherical harmonic functions. In that case, to calculate the coefficients for α and β, the evaluation unit 326 may vary the coefficients randomly, apply equation (4) to some detection region data for which it is already known whether each region is a false detection, and adopt the coefficient values for which the largest number of detection regions are correctly judged to be false detections.
In this way, the evaluation unit 326 decides whether a detection is a false detection based on the degree to which the size of detection region i deviates from its predicted value.
In the present embodiment, the prediction unit 325 generates the predicted values using the detection regions obtained as a result of object detection on the captured image. However, if, for example, the number of objects detected in the captured image is smaller than a set threshold, the prediction unit 325 may use detection results obtained for past captured images. The prediction unit 325 can use past captured images by storing, in the auxiliary storage device 223 or the like, the size, center coordinates, and direction from the imaging device 110 of each detection region detected in those images.
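The false-detection decision and the random search for the tolerances can be sketched as follows. Several points here are assumptions: equation (4) is taken to be an interval around the predicted size, α is taken as the lower tolerance and β as the upper tolerance, and the search scores all correct decisions rather than only correctly flagged false detections (scoring only the latter would favor flagging everything). All names and numeric ranges are hypothetical.

```python
import random

def is_false_detection(observed, predicted, lower_ratio=0.10, upper_ratio=0.10):
    """True when the observed size falls outside the band around the prediction."""
    alpha = lower_ratio * predicted    # allowed shortfall (assumed role of alpha)
    beta = upper_ratio * predicted     # allowed excess (assumed role of beta)
    return not (predicted - alpha <= observed <= predicted + beta)

def tune_ratios(samples, trials=200, seed=0):
    """Random search over the ratios using labeled data.

    samples: list of (observed, predicted, is_false) with known labels.
    """
    rng = random.Random(seed)
    best = (0.10, 0.10)
    best_correct = sum(is_false_detection(o, p, *best) == y for o, p, y in samples)
    for _ in range(trials):
        cand = (rng.uniform(0.0, 0.5), rng.uniform(0.0, 0.5))
        correct = sum(is_false_detection(o, p, *cand) == y for o, p, y in samples)
        if correct > best_correct:     # keep the candidate with the most correct decisions
            best, best_correct = cand, correct
    return best
```

With the default 10% tolerances, an observed size of 105 against a prediction of 100 is accepted, while 80 or 130 is flagged as a false detection.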

In S407, the evaluation unit 326 stores, in the list created in S404, information indicating that detection region i is a false detection.
In S408, the evaluation unit 326 stores, in the list created in S404, information indicating that detection region i is not a false detection.
In S409, the evaluation unit 326 determines whether detection region i is the last detection region in the list created in S404. If it is, the evaluation unit 326 proceeds to S411. If it is not, the evaluation unit 326 proceeds to S410.

In S410, the evaluation unit 326 selects the detection region following the current detection region i in the list created in S404 as the new detection region i, and proceeds to S406. In this way, the evaluation unit 326 decides in turn, for each detection region registered in the list, whether it is a false detection.
In S411, the display control unit 327 outputs the detection result by displaying the image 400, which is the captured image, on the display device 140 together with the detection result for the detection target. For example, the display control unit 327 superimposes on the captured image a frame indicating each detection region that the evaluation unit 326 decided is not a false detection. For example, the display control unit 327 displays the screen of FIG. 8 with the rectangular frames of regions 804 and 805 removed. The display control unit 327 may display, as the frame indicating a detection region, a frame of a shape other than a rectangle, such as an ellipse.
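The list traversal of S404 to S411 can be sketched as follows. The `Detection` record, the `predict_size` callback, and the symmetric 10% tolerance are hypothetical stand-ins for the list nodes, the predicted value s(φ, θ), and equation (4), respectively.

```python
from dataclasses import dataclass

@dataclass
class Detection:
    size: float
    center: tuple       # (x, y) in image pixels
    direction: tuple    # (phi, theta) in the world coordinate system
    false_detection: bool = False

def evaluate_all(detections, predict_size, tolerance=0.10):
    """Flag each list entry, then return the entries to be drawn as frames."""
    for det in detections:                         # S405, S409, S410: walk the list
        expected = predict_size(*det.direction)    # predicted size for this direction
        deviation = abs(det.size - expected)
        # S406, then S407 or S408: record the decision on the list node.
        det.false_detection = deviation > tolerance * expected
    return [d for d in detections if not d.false_detection]   # shown in S411
```

The returned list contains only the regions whose frames would be superimposed on the captured image; flagged entries stay in the original list with their decision recorded.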

As described above, in the present embodiment, the client device 120 detects objects from an image captured by the imaging device 110 and performs the following processing for each detection region. That is, the client device 120 decides whether the detection region is a false detection based on the size of the detection region in the image and the predicted value of the detection region size set according to the direction of the detection region relative to the imaging device 110.
The detection system can thereby reduce the likelihood of false detections. In addition, even when a plurality of objects are densely packed, the detection system does not treat a detection region as a false detection merely because it lies near another, correctly detected region. In this way, the detection system can determine false detections of objects more accurately.

In the present embodiment, the detection system uses the size of the detection region as the feature amount of the object. However, the detection system may instead use, for example, the average brightness of the detection region as the feature amount. For example, suppose that the environment captured by the imaging device 110 includes a shaded region and an illuminated region. In that case, when the detection target object is in the shaded region, the average brightness of its detection region in the image can be expected to be lower than when the object is in the illuminated region.
Accordingly, when the direction of a detection region points toward the shaded region, the evaluation unit 326 may decide that the detection is a false detection if the average brightness of the detection region is equal to or greater than a set threshold, and that it is not a false detection if the average brightness is less than the threshold. Likewise, when the direction of a detection region points toward the illuminated region, the evaluation unit 326 may decide that the detection is a false detection if the average brightness of the detection region is less than a set threshold, and that it is not a false detection if the average brightness is equal to or greater than the threshold.
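The brightness-based variant can be sketched as follows. The function name, the boolean `in_shade` flag (standing in for "the direction points toward the shaded region"), and the threshold of 128 on an assumed 0 to 255 brightness scale are all hypothetical.

```python
def is_false_by_brightness(mean_brightness, in_shade, threshold=128.0):
    """Decide false detection from the average brightness of a detection region."""
    if in_shade:
        # A region in the shade is expected to be dark; a bright one is suspect.
        return mean_brightness >= threshold
    # A region under illumination is expected to be bright; a dark one is suspect.
    return mean_brightness < threshold
```

In practice the two cases could use different thresholds, since the text only says "a set threshold" for each; a single value is used here to keep the sketch short.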

<Embodiment 2>
In the first embodiment, the prediction unit 325 updated the predicted values of the size of the detection region of an object appearing in an arbitrary direction when the processing of FIG. 4 was executed for the first time, or when a certain period had elapsed since the previous update. The prediction unit 325 also assumed that the size of an object's detection region depends only on the depression angle from the imaging device 110 in the world coordinate system and not on the azimuth angle, and restricted the order m of the spherical harmonic functions used for the prediction to 0. The present embodiment describes an example in which the size of an object's detection region also depends on the azimuth angle from the imaging device 110 in the world coordinate system. It also describes processing that updates the predicted values of the detection region size as soon as a dependence of the detection region size on the azimuth angle is confirmed.
The system configuration of the detection system of the present embodiment is the same as that of the first embodiment. The hardware configurations and functional configurations of the imaging device 110 and the client device 120 are also the same as those in the first embodiment.
The processing of the client device 120 in the present embodiment is, as in the first embodiment, the processing shown in FIG. 4.

Examples of a shooting environment in which the size of an object's detection region depends on the azimuth angle from the imaging device 110 in the world coordinate system include a case in which a staircase appears in the captured image and a person is using the staircase, and a case in which a person is crouching on the floor. Such cases will be described with reference to FIGS. 9 and 10.
FIG. 9 is a diagram illustrating an example of a shooting environment in which a plurality of objects are present in directions of the same depression angle from the imaging device 110. In the example of FIG. 9, there are three objects 901 to 903. Object 901 is crouching on the floor. Object 902 is standing upright at the bottom of the stairs. Object 903 is standing upright on the stairs. In this situation, the imaging device 110 captures the image 1000 shown in FIG. 10.

FIG. 10 is a diagram illustrating an example of an image of the shooting environment of FIG. 9 captured by the imaging device 110. Objects 901 to 903 in FIG. 10 correspond to objects 901 to 903 in the shooting environment of FIG. 9. FIG. 10 shows that the objects vary in size even though they are all present in directions of the same depression angle from the imaging device 110. In this case, the processing of the first embodiment may regard one of the objects 901 to 903 as a false detection.
Therefore, in the present embodiment, the client device 120 performs the following processing.
The client device 120 repeats the processing of FIG. 4, and when false detections occur frequently in a set part of the captured image and the variation in size of the falsely detected regions is lower than a set threshold, it performs the following processing. That is, the client device 120 determines that, as in FIG. 9, the shooting environment is one in which the detection region size also depends on the azimuth angle, and regenerates the predicted values of the detection region size of an object appearing in an arbitrary direction.
For example, while repeating the processing of FIG. 4, the evaluation unit 326 divides the captured image into an arbitrary number of regions, counts for each region the number of consecutive frames in which a false detection occurred, and stores the count in the auxiliary storage device 223. The evaluation unit 326 also stores information on the sizes of the falsely detected regions in the auxiliary storage device 223. Then, in S402, the prediction unit 325 determines whether to update the predicted values of the detection region size based on the count stored in the auxiliary storage device 223 by the evaluation unit 326 and on the stored size information. For example, when the count is equal to or greater than a set threshold and the variance of the sizes of the detection regions involved in the consecutive false detections is equal to or less than a set threshold, the prediction unit 325 determines that the predicted values are to be updated. Otherwise, the prediction unit 325 determines that the predicted values are not to be updated. When updating, the prediction unit 325 also uses spherical harmonic basis functions whose order m is not 0, lifting the restriction on the order m that was imposed in the first embodiment. As a result, in the present embodiment, the prediction unit 325 can generate predicted values of the detection region size for an object appearing in an arbitrary direction that reflect local differences in size.
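The update decision described above can be sketched as follows. The class, the per-region bookkeeping, and the two threshold values (30 consecutive frames, a size variance of 25) are hypothetical; the text only states that both thresholds are set values.

```python
from statistics import pvariance

class RegionStats:
    """Per-image-region record of consecutive false detections (hypothetical helper)."""

    def __init__(self):
        self.consecutive_frames = 0
        self.sizes = []

    def observe(self, false_sizes):
        """false_sizes: sizes of the false detections in this region for one frame."""
        if false_sizes:
            self.consecutive_frames += 1
            self.sizes.extend(false_sizes)
        else:                          # streak broken: start over
            self.consecutive_frames = 0
            self.sizes = []

def should_refit(stats, min_frames=30, max_variance=25.0):
    """True when false detections persist and their sizes are consistent."""
    if stats.consecutive_frames < min_frames or len(stats.sizes) < 2:
        return False
    return pvariance(stats.sizes) <= max_variance
```

Persistent false detections of nearly constant size suggest a real object class that the m = 0 model cannot represent, such as people on a staircase, whereas widely varying sizes are more likely to be genuine false detections.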

As described above, in the present embodiment, the client device 120 judges the validity of the assumption it has been making, based on the positions of false detections that occurred in the past and the sizes of the falsely detected regions. If the assumption is wrong, the client device 120 regenerates, based on a different assumption, the predicted values of the detection region size of an object in the captured image, and evaluates detected objects based on the regenerated predicted values. That is, the client device 120 regenerates the predicted values of the detection region size based on past evaluation results from the evaluation unit 326. The client device 120 can thereby further improve object detection accuracy compared with the first embodiment.

<Embodiment 3>
In the first and second embodiments, the client device 120 performed object detection on the image captured by the imaging device 110 and, based on the detection result, generated the predicted values of the detection region size of an object appearing in an arbitrary direction in the captured image. The present embodiment describes processing in which the client device 120 generates those predicted values based on the installation position of the imaging device 110.
The system configuration of the detection system of the present embodiment is the same as that of the first embodiment. The hardware configurations and functional configurations of the imaging device 110 and the client device 120 are also the same as those in the first embodiment.
The processing of the client device 120 in the present embodiment is, as in the first embodiment, the processing shown in FIG. 4.

In the present embodiment, the prediction unit 325 uses information on the installation position of the imaging device 110 when generating the predicted values of the detection region size in S403.
FIG. 11 is a diagram illustrating an example of the positional relationship between an object and the imaging device. FIG. 11 shows a shooting environment in which an object 1101 is present in the space where the imaging device 110 is installed. In FIG. 11, wp1 (x1, y1, z1) denotes the world coordinates of the top of the object 1101, and wp2 (x2, y2, z2) denotes the world coordinates of the bottom of the object 1101. Here, z2 is obtained from the installation height of the imaging device 110, and z1 is obtained from the height of the object. For example, the client device 120 converts the two points wp1 and wp2 from the world coordinate system to coordinates vp1 and vp2 in the view coordinate system, and then converts them from the view coordinate system to points sp1 and sp2 in the screen coordinate system. The client device 120 can then calculate the distance between the points sp1 and sp2 on the captured image and, based on that distance, calculate the size at which a subject located at the world point wp2 would be detected.
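The world-to-view-to-screen chain for wp1 and wp2 can be sketched as follows. The sketch assumes a pinhole camera located at the world origin with the z axis pointing up, so that the floor lies at z = -camera_height; the rotation matrix `R_world_to_view`, the focal length in pixels, and the function names are hypothetical, since the text does not give the projection model.

```python
import numpy as np

def world_to_screen(p_world, R_world_to_view, focal_px):
    """Project a world point (camera at the origin) onto the screen."""
    x, y, d = R_world_to_view @ np.asarray(p_world, dtype=float)   # view coordinates
    if d <= 0:
        raise ValueError("point is behind the camera")
    return np.array([focal_px * x / d, focal_px * y / d])          # screen coordinates

def predicted_height_px(foot_xy, camera_height, object_height,
                        R_world_to_view, focal_px):
    """Screen distance between the bottom (wp2) and top (wp1) of an upright object."""
    x, y = foot_xy
    wp2 = (x, y, -camera_height)                     # bottom of the object, on the floor
    wp1 = (x, y, -camera_height + object_height)     # top of the object
    sp1 = world_to_screen(wp1, R_world_to_view, focal_px)
    sp2 = world_to_screen(wp2, R_world_to_view, focal_px)
    return float(np.linalg.norm(sp1 - sp2))
```

Evaluating `predicted_height_px` over a grid of floor positions gives size samples per direction without running detection first, which matches the purpose described for this embodiment.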

In the present embodiment, the prediction unit 325 performs the following processing when the processing of FIG. 4 is performed for the first time. The imaging device 110 captures the shooting environment of FIG. 11 while the location of the object 1101 is changed, so that a plurality of captured images of that environment are prepared. The prediction unit 325 obtains the size of the object 1101 in each of these captured images. Then, based on the obtained sizes, the prediction unit 325 generates a predicted value of the size of the detection area of an object appearing in an arbitrary direction in the captured image. In this way, the prediction unit 325 can generate predicted values of the size of the detection area before the object detection processing is executed.
In the present embodiment, the prediction unit 325 generates the predicted values only when the processing of FIG. 2 is performed for the first time, but it may instead update the predicted values dynamically.
Also, in the present embodiment, as in the first embodiment, the prediction unit 325 restricts the spherical harmonic basis functions it uses to those of order m = 0. However, as in the second embodiment, the prediction unit 325 may lift this restriction dynamically.
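As a rough illustration of generating predicted sizes from a handful of measured sizes with the basis restricted to order m = 0, the sketch below fits zonal spherical harmonics, which depend only on a polar angle theta, by least squares. It is not the method of the embodiments: the choice of theta as the angle between the optical axis and the object direction, the maximum degree, the least-squares fit, the tolerance rule, and the names `zonal_basis`, `fit_size_model`, `predict_size` and `is_false_detection` are all assumptions made for this example.

```python
import math

def zonal_basis(theta, max_degree):
    """Return [Y_0^0, ..., Y_L^0] at polar angle theta (m = 0 harmonics only)."""
    x = math.cos(theta)
    p = [1.0, x]                       # Legendre polynomials P_0, P_1
    for l in range(1, max_degree):     # Bonnet recurrence for P_{l+1}
        p.append(((2 * l + 1) * x * p[l] - l * p[l - 1]) / (l + 1))
    return [math.sqrt((2 * l + 1) / (4 * math.pi)) * p[l]
            for l in range(max_degree + 1)]

def _solve(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        s = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - s) / m[i][i]
    return x

def fit_size_model(samples, max_degree=2):
    """Least-squares fit of (theta, size) samples; returns the coefficients."""
    rows = [zonal_basis(theta, max_degree) for theta, _ in samples]
    k = max_degree + 1
    ata = [[sum(r[i] * r[j] for r in rows) for j in range(k)] for i in range(k)]
    atb = [sum(r[i] * size for r, (_, size) in zip(rows, samples))
           for i in range(k)]
    return _solve(ata, atb)            # normal equations (A^T A) c = A^T b

def predict_size(coeffs, theta):
    """Predicted detection-area size for an object seen at angle theta."""
    basis = zonal_basis(theta, len(coeffs) - 1)
    return sum(c * b for c, b in zip(coeffs, basis))

def is_false_detection(measured, predicted, tolerance=0.5):
    """Hypothetical rule: flag sizes that deviate too far from the prediction."""
    return abs(measured - predicted) > tolerance * predicted
```

A typical use would be to call `fit_size_model` once on the sizes measured in the prepared images, and then compare each new detection against `predict_size` for its direction. The fit needs at least `max_degree + 1` samples at distinct angles for the normal equations to be solvable.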

As described above, in the present embodiment, the prediction unit 325 generates a predicted value of the size of the detection area of an object in an image captured by the imaging device 110 based on the installation position of the imaging device 110. Accordingly, the client device 120 can generate the predicted value of the size of the detection area from the position of the imaging device 110 rather than from the direction of the object with respect to the imaging device 110.

<Other embodiments>
The present invention can also be realized by processing in which a program that implements one or more functions of the above-described embodiments is supplied to a system or apparatus via a network or a storage medium, and one or more processors in a computer of the system or apparatus read and execute the program. It can also be realized by a circuit (for example, an ASIC) that implements one or more functions.

For example, part or all of the functional configuration of the above-described detection system may be implemented as hardware in the imaging device 110 or the client device 120.
Although preferred embodiments of the present invention have been described in detail above, the present invention is not limited to these specific embodiments. The embodiments described above may be combined in any manner.

110 Imaging device
120 Client device
221 CPU

Claims

An information processing apparatus comprising:
detecting means for detecting an object from a captured image captured by imaging means; and
evaluation means for evaluating a detection result of the object by the detecting means based on a feature amount of the object detected by the detecting means and a direction of the object with respect to the imaging means.

The information processing apparatus according to claim 1, wherein the evaluation means evaluates the detection result of the object and determines whether or not the detection of the object by the detecting means is a false detection.

The information processing apparatus according to claim 2, further comprising output means for outputting information indicating those detection results of the detecting means that the evaluation means has determined not to be false detections.

The information processing apparatus, wherein the evaluation means evaluates the detection result of the object by the detecting means based on the feature amount and a predicted value of the feature amount corresponding to the direction.

The information processing apparatus according to claim 4, further comprising generating means for generating the predicted value,
wherein the evaluation means evaluates the detection result of the object by the detecting means based on the feature amount and the predicted value, generated by the generating means, that corresponds to the direction.

The information processing apparatus according to claim 5, wherein the generating means generates the predicted value based on the direction.

The information processing apparatus according to claim 5, wherein the generating means regenerates the predicted value based on a result of the evaluation by the evaluation means.

The information processing apparatus according to claim 5, wherein the generating means generates the predicted value based on an installation position of the imaging means.

The information processing apparatus according to claim 1, wherein the feature amount is a size of an area that can be detected as the object.

An information processing method executed by an information processing apparatus, the method including:
a detection step of detecting an object from a captured image captured by imaging means; and
an evaluation step of evaluating a detection result of the object in the detection step based on a feature amount of the object detected in the detection step and a direction of the object with respect to the imaging means.

A program for causing a computer to function as each means of the information processing apparatus according to any one of claims 1 to 9.