JP7182893B2 - Image processing device, imaging device, image processing method, and program

Info

Publication number: JP7182893B2
Application number: JP2018068068A
Authority: JP
Inventors: 孝志安達
Original assignee: Canon Inc
Current assignee: Canon Inc
Priority date: 2018-03-30
Filing date: 2018-03-30
Publication date: 2022-12-05
Anticipated expiration: 2038-03-30
Also published as: JP2019180017A

Description

本発明は、画像処理技術に関するものである。 The present invention relates to image processing technology.

近年、撮影装置が撮影した画像に存在する人を解析して人の属性情報（性別や年齢など）を推定する属性推定の技術が注目されている。特許文献１には、画像から人の属性を推定するための技術について記載されている。 In recent years, attribute estimation techniques, which analyze a person present in an image captured by an imaging device and estimate that person's attribute information (gender, age, and so on), have attracted attention. Patent Literature 1 describes a technique for estimating the attributes of a person from an image.

特許第４８８８２１７号 Japanese Patent No. 4888217

しかし、上述の特許文献１では、撮影装置が撮影した画像から人の検出はできている状況で、画像に映る人のサイズが小さいために属性推定を行えない場合について考慮していない。一方、撮影範囲に含まれる複数の人について属性推定する場合に、属性推定を行えるよう撮影装置の画角を変更すると、撮影装置の画角から外れてしまった人について、ユーザーが撮影装置の画角を制御し直して属性推定する必要があった。 However, Patent Literature 1 does not consider the case where a person can be detected in an image captured by an imaging device but attribute estimation cannot be performed because the person appears too small in the image. Moreover, when estimating the attributes of multiple people in the shooting range, changing the angle of view of the imaging device so that attribute estimation becomes possible pushes some people outside the angle of view, and the user then had to re-control the angle of view of the imaging device to estimate the attributes of those people.

本発明は、このような問題に鑑みてなされたものであり、属性推定が行えるように画角を変更しつつ、より広い範囲を対象として複数の人の属性推定を行うことを目的としている。 The present invention has been made in view of these problems, and its object is to estimate the attributes of a plurality of people over a wider range while changing the angle of view so that attribute estimation can be performed.

上記課題を解決するために、本発明の画像処理装置は以下の構成を備える。すなわち、撮影手段により撮影された画像に含まれる複数の人を検出する検出手段と、前記検出手段により検出した前記複数の人のうち少なくとも１人を含む領域を２つ以上決定する決定手段と、前記決定手段により決定された領域のそれぞれを前記撮影手段によって順次撮影するための制御コマンドを出力する制御手段と、前記制御手段により出力された前記制御コマンドに基づいて撮影された画像から前記領域に存在する人の属性情報を推定する推定手段とを有し、前記制御手段は、前記決定手段により決定された領域のうち１つの領域が前記撮影手段の撮影範囲に含まれ、かつ、該１つの領域に存在する人の前記属性情報の推定が可能となる撮影条件に基づいて前記制御コマンドを出力することを特徴とする画像処理装置。 To solve the above problems, the image processing apparatus of the present invention has the following configuration: detection means for detecting a plurality of persons included in an image captured by photographing means; determination means for determining two or more areas, each containing at least one of the plurality of persons detected by the detection means; control means for outputting control commands for causing the photographing means to sequentially photograph each of the areas determined by the determination means; and estimation means for estimating attribute information of a person present in each area from an image captured on the basis of the control commands output by the control means, wherein the control means outputs the control command on the basis of photographing conditions under which one of the areas determined by the determination means is included in the photographing range of the photographing means and the attribute information of a person present in that area can be estimated.

本発明によれば、属性推定が行えるように画角を変更しつつ、より広い範囲を対象として複数の人の属性推定を行うことができる。 According to the present invention, it is possible to estimate the attributes of a plurality of persons over a wider range while changing the angle of view so that the attributes can be estimated.

システム構成を示す模式図である。 FIG. 1 is a schematic diagram showing the system configuration.
撮影装置の外観図である。 FIG. 2 is an external view of the imaging device.
撮影装置の機能ブロック図である。 FIG. 3 is a functional block diagram of the imaging device.
画像処理装置の機能ブロック図である。 FIG. 4 is a functional block diagram of the image processing apparatus.
画像処理のフローチャートである。 FIG. 5 is a flowchart of image processing.
画像処理を説明するための模式図である。 FIG. 6 is a schematic diagram for explaining image processing.
画像処理において使用されるテーブルである。 FIG. 7 is a table used in image processing.
画像処理から得られる表である。 FIG. 8 is a table obtained from image processing.
画像処理のフローチャートである。 FIG. 9 is a flowchart of image processing.
画像処理を説明するための模式図である。 FIG. 10 is a schematic diagram for explaining image processing.
画像処理装置の一部の機能を有する撮影装置とクライアント装置の機能ブロック図の一例である。 FIG. 11 is an example of a functional block diagram of an imaging device having some of the functions of the image processing apparatus and of a client device.
画像処理装置のハードウェア構成を示す概略図である。 FIG. 12 is a schematic diagram showing the hardware configuration of the image processing apparatus.

本実施形態に係る画像処理装置は、撮影装置によって撮影した連続する画像における複数の推定領域の各々に対して、人の属性情報（性別や年齢）を推定する属性推定を行う画像処理装置である。 The image processing apparatus according to this embodiment performs attribute estimation, that is, it estimates a person's attribute information (gender and age), for each of a plurality of estimation regions in successive images captured by an imaging device.

以下、添付図面を参照しながら本発明の実施形態について説明する。なお、以下の実施形態において示す構成は一例に過ぎず、図示された構成に限定されるものではない。 Embodiments of the present invention will be described below with reference to the accompanying drawings. Note that the configurations shown in the following embodiments are merely examples, and are not limited to the illustrated configurations.

（本実施形態） (This embodiment)

図１は、本実施形態に係るシステム構成を示す図である。画像処理装置１００は、後述する画像処理を実行する装置である。なお、画像処理装置１００は、例えば、後述する画像処理の機能を実現するためのプログラムがインストールされたパーソナルコンピュータなどによって実現される。 FIG. 1 is a diagram showing the system configuration according to this embodiment. The image processing apparatus 100 executes the image processing described later. It is realized by, for example, a personal computer on which a program implementing the image processing functions described later is installed.

表示装置１０１は、画像処理装置１００に接続され、後述する画像処理により出力されるデータやＵＩ（ｕｓｅｒｉｎｔｅｒｆａｃｅ）などをユーザーが閲覧するための表示装置である。 The display device 101 is connected to the image processing device 100, and is a display device for a user to view data output by image processing described later, a UI (user interface), and the like.

撮影装置１０２は、画像を撮影する装置であり、パン・チルト・ズーム（Ｐａｎ－Ｔｉｌｔ－Ｚｏｏｍ、以下「ＰＴＺ」と称す）制御可能に構成されている。また、撮影装置１０２は、例えば、ネットワークを介して画像の画像データなどを送信できるネットワークカメラなどである。 The photographing device 102 is a device for photographing an image, and is configured to be capable of Pan-Tilt-Zoom (hereinafter referred to as “PTZ”) control. Also, the imaging device 102 is, for example, a network camera capable of transmitting image data of an image via a network.

記録装置１０３は、撮影装置１０２で撮影された画像の画像データなどを記録することができる装置である。また、画像処理装置１００、撮影装置１０２、および記録装置１０３は、ネットワーク１０４を介して通信を行う。ネットワーク１０４は、例えばＥｔｈｅｒｎｅｔ（商標）等の通信規格を満足する複数のルータ、スイッチ、ケーブル等から構成される。本実施形態においては画像処理装置１００、撮影装置１０２、記録装置１０３間の通信を行うことができるものであればその通信規格、規模、構成を問わない。例えば、ネットワーク１０４はインターネットや有線ＬＡＮ（ＬｏｃａｌＡｒｅａＮｅｔｗｏｒｋ）、無線ＬＡＮ（ＷｉｒｅｌｅｓｓＬＡＮ）、ＷＡＮ（ＷｉｄｅＡｒｅａＮｅｔｗｏｒｋ）等により構成されてもよい。 A recording device 103 is a device capable of recording image data of an image captured by the imaging device 102 . Also, the image processing apparatus 100 , the image capturing apparatus 102 , and the recording apparatus 103 communicate via the network 104 . The network 104 is composed of a plurality of routers, switches, cables, etc. that satisfy communication standards such as Ethernet (trademark). In this embodiment, any communication standard, scale, or configuration is acceptable as long as the image processing apparatus 100, image capturing apparatus 102, and recording apparatus 103 can communicate with each other. For example, the network 104 may be configured by the Internet, a wired LAN (Local Area Network), a wireless LAN (Wireless LAN), a WAN (Wide Area Network), or the like.

また、図１の構成では、撮影装置１０２により撮影された画像の画像データ（ライブ映像）や記録装置１０３にて記録された画像データ（過去に撮影した画像）などが画像処理装置１００に送信される。 In the configuration of FIG. 1, image data of images captured by the imaging device 102 (live video), image data recorded in the recording device 103 (images captured in the past), and the like are transmitted to the image processing apparatus 100.

次に、図１２を参照して、本実施形態の後述する各機能を実現するための画像処理装置１００のハードウェア構成を説明する。ＲＡＭ（ＲａｎｄｏｍＡｃｃｅｓｓＭｅｍｏｒｙ）１２０１は、ＣＰＵ（ＣｅｎｔｒａｌＰｒｏｃｅｓｓｉｎｇＵｎｉｔ）１２００が実行するコンピュータプログラムを一時的に記憶する。また、ＲＡＭ１２０１は、通信インターフェース１２０３を介して外部から取得したデータ（コマンドや画像データ）などを一時的に記憶する。また、ＲＡＭ１２０１は、ＣＰＵ１２００が後述の画像処理を実行する際に用いるワークエリアを提供する。また、ＲＡＭ１２０１は、例えば、フレームメモリとして機能したり、バッファメモリとして機能したりする。 Next, with reference to FIG. 12, the hardware configuration of the image processing apparatus 100 for realizing each function of this embodiment, which will be described later, will be described. A RAM (Random Access Memory) 1201 temporarily stores computer programs executed by a CPU (Central Processing Unit) 1200 . The RAM 1201 also temporarily stores data (commands and image data) obtained from outside via the communication interface 1203 . The RAM 1201 also provides a work area used when the CPU 1200 executes image processing, which will be described later. Also, the RAM 1201 functions, for example, as a frame memory or as a buffer memory.

ＣＰＵ１２００は、ＲＡＭ１２０１に格納されるコンピュータプログラムを実行する。なおＣＰＵ以外にも、ＤＳＰ（ＤｉｇｉｔａｌＳｉｇｎａｌＰｒｏｃｅｓｓｏｒ）等のプロセッサやＡＳＩＣ（ＡｐｐｌｉｃａｔｉｏｎＳｐｅｃｉｆｉｃＩｎｔｅｇｒａｔｅｄＣｉｒｃｕｉｔ）を用いてもよい。 CPU 1200 executes computer programs stored in RAM 1201 . In addition to the CPU, a processor such as a DSP (Digital Signal Processor) or an ASIC (Application Specific Integrated Circuit) may be used.

ＨＤＤ（ＨａｒｄＤｉｓｋＤｒｉｖｅ）１２０２は、オペレーティングシステムのプログラムや画像データを記憶する。 A HDD (Hard Disk Drive) 1202 stores an operating system program and image data.

後述する図５や図９のフローチャートを実行するためのコンピュータプログラムやデータはＨＤＤ１２０２に格納されており、ＣＰＵ１２００による制御に従って、適宜、ＲＡＭ１２０１にロードされ、ＣＰＵ１２００によって実行される。ＨＤＤ以外にもフラッシュメモリ等の他の記憶媒体を用いてもよい。 Computer programs and data for executing the flowcharts of FIGS. 5 and 9, described later, are stored in the HDD 1202, loaded into the RAM 1201 as appropriate under the control of the CPU 1200, and executed by the CPU 1200. Other storage media, such as flash memory, may be used instead of the HDD.

次に、図２および図３を参照して、撮影装置１０２について説明する。図２は、本実施形態に係る撮影装置１０２の外観図である。また、図３は、本実施形態に係る撮影装置１０２の機能ブロック図である。 Next, the imaging device 102 will be described with reference to FIGS. 2 and 3. FIG. 2 is an external view of the imaging device 102 according to this embodiment, and FIG. 3 is a functional block diagram of the imaging device 102 according to this embodiment.

レンズ２０２の光軸の向く方向が撮影装置１０２の撮影方向であり、レンズ２０２を通過した光束は、撮影部２０５の撮像素子に結像する。なお、レンズ２０２は、フォーカスレンズ及びズームレンズ等を備える。また、レンズ駆動部２１０は、フォーカスレンズ及びズームレンズなどの駆動系を含み、レンズ２０２の焦点距離を変更する。レンズ駆動部２１０は、パンチルトズーム制御部２０８により制御される。 The direction in which the optical axis of the lens 202 faces is the imaging direction of the imaging device 102 , and the light flux passing through the lens 202 forms an image on the imaging element of the imaging unit 205 . Note that the lens 202 includes a focus lens, a zoom lens, and the like. Also, the lens drive unit 210 includes drive systems such as a focus lens and a zoom lens, and changes the focal length of the lens 202 . The lens drive unit 210 is controlled by the pan/tilt/zoom control unit 208 .

パン駆動部２００は、パン動作を行うメカ駆動系及び駆動源のモータを含み、撮影装置１０２の撮影方向をパン方向２０３に変更するように駆動する。また、パン駆動部２００は、パンチルトズーム制御部２０８により制御される。 The pan drive unit 200 includes a mechanical drive system that performs a pan operation and a motor as a drive source, and drives to change the photographing direction of the photographing apparatus 102 to a pan direction 203 . Also, the pan drive unit 200 is controlled by a pan tilt zoom control unit 208 .

チルト駆動部２０１は、チルト動作を行うメカ駆動及び駆動源のモータを含み、撮影装置１０２の撮影方向をチルト方向２０４に変更するように駆動する。チルト駆動部２０１は、パンチルトズーム制御部２０８により制御される。 The tilt drive unit 201 includes a mechanical drive that performs a tilt operation and a motor as a drive source, and drives the image pickup device 102 to change the shooting direction to a tilt direction 204 . The tilt drive unit 201 is controlled by a pan/tilt/zoom control unit 208 .

撮影部２０５は、ＣＣＤ（ｃｈａｒｇｅｃｏｕｐｌｅｄｄｅｖｉｃｅ）センサやＣＭＯＳ（ｃｏｍｐｌｅｍｅｎｔａｒｙｍｅｔａｌｏｘｉｄｅｓｅｍｉｃｏｎｄｕｃｔｏｒ）センサ等の撮像素子を有する。そして、撮影部２０５は、レンズ２０２を通って結像された被写体像を光電変換して電気信号を生成する。 The imaging unit 205 has an imaging element such as a CCD (charge coupled device) sensor or a CMOS (complementary metal oxide semiconductor) sensor. The imaging unit 205 photoelectrically converts the subject image formed through the lens 202 to generate an electric signal.

画像処理部２０６は、撮影部２０５において光電変換された電気信号をデジタル信号へ変換する処理や、圧縮符号化処理などを行い、画像データを生成する。 The image processing unit 206 performs processing for converting the electrical signal photoelectrically converted by the imaging unit 205 into a digital signal, compression encoding processing, and the like, and generates image data.

パンチルトズーム制御部２０８は、システム制御部２０７から伝達された指示に基づいて、パン駆動部２００、チルト駆動部２０１及びレンズ駆動部２１０の制御を行う。 The pan/tilt/zoom control unit 208 controls the pan drive unit 200 , the tilt drive unit 201 and the lens drive unit 210 based on the instructions transmitted from the system control unit 207 .

通信部２０９は、画像処理装置１００との通信を行うインターフェースである。例えば、通信部２０９は、生成された画像データを画像処理装置１００に送信する。また、通信部２０９は、画像処理装置１００から送信された撮影装置１０２のパン・チルト・ズーム動作などを制御する制御コマンドを受信し、システム制御部２０７へ伝達する。また、通信部２０９は、有線ＬＡＮインターフェースや無線ＬＡＮインターフェースを有する。 A communication unit 209 is an interface that communicates with the image processing apparatus 100 . For example, the communication unit 209 transmits generated image data to the image processing apparatus 100 . The communication unit 209 also receives control commands for controlling pan/tilt/zoom operations of the image capturing apparatus 102 transmitted from the image processing apparatus 100 and transmits the control commands to the system control unit 207 . Also, the communication unit 209 has a wired LAN interface and a wireless LAN interface.

システム制御部２０７は、撮影装置１０２の全体を制御し、例えば、次のような処理を行う。すなわち、システム制御部２０７は、通信部２０９から伝達された制御コマンドを解析し、解析した内容に基づく処理を行う。例えば、システム制御部２０７は、パンチルトズーム制御部２０８に対してパン・チルト・ズーム動作の指示を行う。 The system control unit 207 controls the entire imaging device 102 and performs, for example, the following processing. That is, the system control unit 207 analyzes the control command transmitted from the communication unit 209 and performs processing based on the analyzed content. For example, the system control unit 207 instructs the pan/tilt/zoom operation to the pan/tilt/zoom control unit 208 .

なお、パン角度は、パン駆動部２００の駆動端の一方を０°としたときの、撮影装置１０２のパン方向２０３における撮影方向（光軸）の角度である。また、チルト角度は、チルト駆動部２０１の駆動端の一方を０°としたときの、撮影装置１０２のチルト方向２０４における撮影方向（光軸）の角度である。 Note that the pan angle is the angle of the imaging direction (optical axis) in the pan direction 203 of the imaging apparatus 102 when one driving end of the pan driving unit 200 is 0°. The tilt angle is the angle of the photographing direction (optical axis) in the tilt direction 204 of the photographing apparatus 102 when one drive end of the tilt drive unit 201 is 0°.

次に、図４を参照して、本実施形態に係る画像処理装置１００について説明する。図４は、本実施形態に係る画像処理装置の機能ブロック図である。なお、図４に示す各機能ブロックは、ソフトウェアモジュールで実現してもよいし、ハードウェアモジュールで実現してもよい。また、ソフトウェアモジュールとハードウェアモジュールを協働させて実現してもよい。なお、以降の説明では画像処理装置１００が有する各機能は、画像処理装置１００のＣＰＵ１２００により実行されるものとする。 Next, the image processing apparatus 100 according to this embodiment will be described with reference to FIG. 4. FIG. 4 is a functional block diagram of the image processing apparatus according to this embodiment. Each functional block shown in FIG. 4 may be realized by a software module, by a hardware module, or by a software module and a hardware module working together. In the following description, each function of the image processing apparatus 100 is assumed to be executed by the CPU 1200 of the image processing apparatus 100.

本実施形態に係る画像処理装置１００は、通信部４００と、検出領域設定部４０１と、操作受付部４０２と、検出部４０３と、推定領域決定部４０４と、コマンド管理部４０５と、属性推定部４０６と、出力制御部４０７と、記憶部４０８と、を有する。 The image processing apparatus 100 according to this embodiment includes a communication unit 400, a detection area setting unit 401, an operation reception unit 402, a detection unit 403, an estimated region determination unit 404, a command management unit 405, an attribute estimation unit 406, an output control unit 407, and a storage unit 408.

通信部４００は、撮影装置１０２により撮影された画像の画像データの受信や、撮影装置１０２のパン・チルト・ズームを制御する制御コマンドなどを送信するための通信を行う。また、通信部４００は、例えば、記録装置１０３に記憶された画像データ、または他の外部装置のストレージデバイスに記憶された画像データなどを各装置からネットワーク１０４を介して送信してもらい受信する。 The communication unit 400 performs communication for receiving image data of an image captured by the imaging device 102 and for transmitting control commands for controlling panning, tilting, and zooming of the imaging device 102 . The communication unit 400 also receives image data stored in the recording device 103 or image data stored in a storage device of another external device, for example, transmitted from each device via the network 104 .

検出領域設定部４０１は、通信部４００を介して得られた画像において、検出部４０３が処理を行う対象とする領域である検出領域を設定する。なお、操作受付部４０２を介して、ユーザーにより設定された画像内の領域を検出領域として設定してもよいし、事前に設定された所定の領域を検出領域としてもよい。なお、ユーザーによる検出領域の指示や、事前に設定された領域がなければ、通信部４００を介して得られた画像データが示す画像全体を検出領域としてもよい。 The detection area setting unit 401 sets a detection area, which is an area to be processed by the detection unit 403 in the image obtained via the communication unit 400 . Note that an area within the image set by the user via the operation reception unit 402 may be set as the detection area, or a predetermined area set in advance may be set as the detection area. Note that if there is no detection area specified by the user or a previously set area, the entire image indicated by the image data obtained via the communication unit 400 may be used as the detection area.
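The choice of detection area described above can be sketched as a simple fallback. This is a minimal illustration, not the apparatus's actual implementation: the function name, the rectangle representation, and the priority of a user-set area over a preset area are all assumptions introduced here.

```python
from typing import Optional, Tuple

# (x, y, width, height) in pixels; this representation is an assumption.
Rect = Tuple[int, int, int, int]


def select_detection_area(image_size: Tuple[int, int],
                          user_area: Optional[Rect] = None,
                          preset_area: Optional[Rect] = None) -> Rect:
    """Pick the detection area for the detection unit.

    Assumed priority: user-set area, then preset area, then the whole image.
    """
    if user_area is not None:
        return user_area
    if preset_area is not None:
        return preset_area
    width, height = image_size
    return (0, 0, width, height)  # no instruction and no preset: whole image
```

With neither a user instruction nor a preset, the whole image becomes the detection area, which matches the fallback stated in the description.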

操作受付部４０２は、マウス、キーボードなどの入力装置（不図示）を介して、ユーザーにより行われた操作を受け付ける。本実施形態における操作受付部４０２は、例えば、出力制御部４０７により表示装置１０１に出力された画像に対して、ユーザーが入力装置を介して設定した検出領域の位置の情報を受け付ける。 An operation reception unit 402 receives an operation performed by a user via an input device (not shown) such as a mouse and keyboard. The operation reception unit 402 in this embodiment receives, for example, information on the position of the detection area set by the user via the input device for the image output to the display device 101 by the output control unit 407 .

検出部４０３は、通信部４００を介して得られた画像において検出領域に含まれる複数の人の検出を行う。なお、本実施形態に係る検出部４０３は、学習画像から人体の特徴量（Ｈａａｒ－Ｌｉｋｅ特徴量、ＨＯＧ特徴量など）が学習されたうえで作成された識別器を保持している。そして、検出部４０３は、学習の結果である識別器に画像が入力され、該画像から人体の検出を行う。なお、本実施形態における検出部４０３は、人の体を検出する人体検出を行うとして説明するが、これに限定されない。例えば、検出部４０３は、人の顔を検出する顔検出や人の頭部を検出する頭部検出であってもよい。なお、以降の説明において、人体は人の全身のことであり、人の顔や胴体を含むものとする。 The detection unit 403 detects a plurality of persons included in the detection area in the image obtained via the communication unit 400 . Note that the detection unit 403 according to the present embodiment holds a discriminator that is created after learning a human body feature amount (Haar-Like feature amount, HOG feature amount, etc.) from a learning image. Then, the detection unit 403 receives an image as input to the classifier that is the result of learning, and detects a human body from the image. Note that the detection unit 403 in this embodiment is described as performing human body detection for detecting a human body, but the present invention is not limited to this. For example, the detection unit 403 may perform face detection for detecting a person's face or head detection for detecting a person's head. In the following description, the human body means the whole body of a person, including the face and body of the person.

推定領域決定部４０４は、検出部４０３における検出の結果に基づいて、検出部４０３により検出した複数の人のうち少なくとも１人を含む推定領域を複数決定する。推定領域に含まれる人が、属性推定部４０６により属性情報を推定される対象となる。なお、推定領域の位置に関する情報は記憶部４０８にて記憶される。 Based on the result of detection by the detection unit 403 , the estimated region determination unit 404 determines a plurality of estimated regions including at least one of the plurality of persons detected by the detection unit 403 . A person included in the estimation area is subject to estimation of attribute information by the attribute estimation unit 406 . Note that information about the position of the estimated region is stored in the storage unit 408 .

コマンド管理部４０５は、推定領域決定部４０４において決定された複数の推定領域の各々について、順次撮影されるよう撮影装置１０２を制御するための制御コマンドを生成する。なおこのとき、１つの推定領域が撮影装置１０２の撮影範囲に含まれ、かつ、該推定領域に存在する人の属性情報を属性推定部４０６が推定できる撮影条件になるよう撮影装置１０２を制御する制御コマンドが生成される。またこのとき、コマンド管理部４０５により生成される制御コマンドは、例えば、撮影装置１０２のパン・チルト・ズームの少なくともいずれか１つを制御するための制御コマンドである。なお、本実施形態におけるコマンド管理部４０５にて生成された制御コマンドは、通信部４００および通信部２０９を介して、撮影装置１０２におけるシステム制御部２０７へ送られ、該制御コマンドに基づいて撮影装置１０２の制御が行われる。 The command management unit 405 generates control commands for controlling the imaging device 102 so that each of the plurality of estimated regions determined by the estimated region determination unit 404 is captured in turn. Each control command controls the imaging device 102 so that one estimated region is included in its imaging range and the imaging conditions allow the attribute estimation unit 406 to estimate the attribute information of a person present in that region. A control command generated by the command management unit 405 controls, for example, at least one of pan, tilt, and zoom of the imaging device 102. In this embodiment, the control commands generated by the command management unit 405 are sent via the communication unit 400 and the communication unit 209 to the system control unit 207 of the imaging device 102, which is then controlled on the basis of those commands.
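As a rough illustration of how one control command per estimated region might be produced, the sketch below maps each region to a pan/tilt/zoom triple. The class names, field names, and the command format are hypothetical; the description does not specify a command protocol.

```python
from dataclasses import dataclass
from typing import Iterable, List


@dataclass(frozen=True)
class EstimatedRegion:
    pan_deg: float        # pan angle that centres the region (hypothetical field)
    tilt_deg: float       # tilt angle that centres the region (hypothetical field)
    required_zoom: float  # zoom at which attributes can be estimated


@dataclass(frozen=True)
class PtzCommand:
    pan_deg: float
    tilt_deg: float
    zoom: float


def build_commands(regions: Iterable[EstimatedRegion]) -> List[PtzCommand]:
    """Return one PTZ command per region, in the order the regions are given.

    Each command points the camera at one region and uses the zoom that
    makes attribute estimation possible for the people in that region.
    """
    return [PtzCommand(r.pan_deg, r.tilt_deg, r.required_zoom) for r in regions]
```

Sending the commands one after another would then capture each region in turn.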

属性推定部４０６は、撮影装置１０２から得られる撮影画像の各々に対して推定領域に存在する人の属性情報を推定する。本実施形態に係る属性推定部４０６は、学習画像によって学習した識別器を利用して人の顔領域から属性情報（性別や年齢に関する情報）を推定する。なお、人の顔領域から属性情報を推定する際、画像に含まれる顔領域のサイズ（画素数）が小さいと属性情報を推定できない場合がある。つまり、人の顔領域から属性情報を推定する場合、最低限必要な顔領域の画素数が存在する。また、本実施形態における属性情報を人の性別に関する情報や年齢に関する情報として説明するが、これに限定されない。例えば、人種や服装、髪形などに関する情報であってもよい。 The attribute estimating unit 406 estimates the attribute information of a person present in the estimation area for each captured image obtained from the imaging device 102 . The attribute estimating unit 406 according to the present embodiment estimates attribute information (information regarding sex and age) from a person's face region using a classifier learned from learning images. When estimating attribute information from a person's face area, it may not be possible to estimate the attribute information if the size (number of pixels) of the face area included in the image is small. In other words, when estimating attribute information from a person's face area, there is a minimum required number of pixels in the face area. Also, although the attribute information in the present embodiment is explained as information about a person's sex and information about a person's age, it is not limited to this. For example, it may be information about race, clothing, hairstyle, and the like.

出力制御部４０７は、撮影装置１０２から得られた画像の画像データや、属性推定部４０６により得られた属性情報の推定結果を表示装置１０１に出力する。 The output control unit 407 outputs the image data of the image obtained from the imaging device 102 and the attribute information estimation result obtained by the attribute estimation unit 406 to the display device 101 .

記憶部４０８は、検出領域や推定領域の位置に関する情報などを記憶する。 A storage unit 408 stores information about the positions of the detection area and the estimation area.

以上のように、推定領域に存在する人の属性情報を属性推定部４０６が推定できる撮影条件を満たしつつ、複数の推定領域の各々について順次撮影されるよう撮像装置１０２を制御して、該複数の推定領域の各々に存在する人の属性情報を推定する。こうすることにより、より広い範囲を対象として複数の人の属性推定を行うことが可能となる。 As described above, the imaging device 102 is controlled so that each of the plurality of estimation regions is sequentially photographed while satisfying the photographing conditions under which the attribute estimating unit 406 can estimate the attribute information of the person present in the estimation region. The attribute information of a person existing in each of the estimation regions is estimated. By doing so, it is possible to estimate the attributes of a plurality of people over a wider range.

次に本実施形態における画像処理について図５に示すフローチャートを参照して説明する。図５は、本実施形態に係る画像処理の流れを示すフローチャートである。なお、図５に示すフローチャートの処理は、主に図４に示す各機能ブロックにより実行される。また、図５に示すフローチャートの処理は、ＨＤＤ１２０２に格納されたコンピュータプログラムに従って画像処理装置１００のＣＰＵ１２００により実行される。以下、画像処理装置１００のＣＰＵ１２００により実行される処理について説明する。 Next, image processing in this embodiment will be described with reference to the flowchart shown in FIG. FIG. 5 is a flowchart showing the flow of image processing according to this embodiment. The processing of the flowchart shown in FIG. 5 is mainly executed by each functional block shown in FIG. The processing of the flowchart shown in FIG. 5 is executed by the CPU 1200 of the image processing apparatus 100 according to the computer program stored in the HDD 1202 . Processing executed by the CPU 1200 of the image processing apparatus 100 will be described below.

Ｓ５０１にて、通信部４００は、撮影装置１０２により撮影された画像の画像データを受信する。 In S501 , the communication unit 400 receives image data of an image captured by the imaging device 102 .

次に、Ｓ５０２にて、検出領域設定部４０１は、通信部４００を介して得られた画像において、検出部４０３が画像に含まれる複数の人を検出する処理を行う対象とする領域である検出領域を設定する。本実施形態における操作受付部４０２は、出力制御部４０７により表示装置１０１に出力された画像に対して、ユーザーが設定した画像内の領域の位置に関する情報を受け付ける。そして、検出領域設定部４０１は、操作受付部４０２が受け付けた情報である画像内の領域の位置を検出領域の位置として設定する。なお本実施形態における検出領域の位置は、該検出領域の重心点におけるパン角度と、チルト角度と、該検出領域の画角となるズーム倍率により定められる。なお、検出領域の位置に関する情報は、記憶部４０８にて記憶される。 Next, in S502, the detection area setting unit 401 sets, in the image obtained via the communication unit 400, a detection area, which is the area in which the detection unit 403 detects the plurality of people included in the image. The operation reception unit 402 in this embodiment receives information on the position of the area that the user set on the image output to the display device 101 by the output control unit 407, and the detection area setting unit 401 sets that position as the position of the detection area. In this embodiment, the position of the detection area is defined by the pan angle and tilt angle at the centroid of the detection area and by the zoom magnification at which the detection area becomes the angle of view. Information on the position of the detection area is stored in the storage unit 408.

次に、Ｓ５０３にて、検出部４０３は、検出領域設定部４０１により設定された検出領域に含まれる複数の人体の検出を行う。このとき、検出部４０３は、検出領域内で検出した複数の人体各々の顔領域の画素数、および、位置情報も検出する。なお、本実施形態における検出部４０３は人体検出を行うが、検出した人体の画素数から顔領域の画素数を推定することができる。 Next, in S503 , the detection unit 403 detects a plurality of human bodies included in the detection area set by the detection area setting unit 401 . At this time, the detection unit 403 also detects the number of pixels in the face area of each of the plurality of human bodies detected within the detection area and the position information. Note that the detection unit 403 in this embodiment performs human body detection, and the number of pixels in the face area can be estimated from the number of pixels of the detected human body.
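The description says the face-area pixel count can be estimated from the pixel count of the detected human body but gives no formula. One simple possibility is a fixed proportion of the body bounding box, sketched below. The ratios are illustrative assumptions, not values from the source.

```python
from typing import Tuple


def estimate_face_size(body_w: int, body_h: int,
                       face_w_ratio: float = 0.25,
                       face_h_ratio: float = 0.125) -> Tuple[int, int]:
    """Estimate face (width, height) in pixels from a body bounding box.

    The default ratios (face about 1/4 of body width and 1/8 of body
    height) are placeholder assumptions chosen for illustration only.
    """
    return (round(body_w * face_w_ratio), round(body_h * face_h_ratio))
```

For example, a body box of 120 x 320 pixels gives an estimated face of 30 x 40 pixels under these assumed ratios.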

次に、Ｓ５０４にて、検出領域に複数の人体が存在しない場合（Ｓ５０４で’Ｎｏ’）、Ｓ５０１の処理を行う。 Next, in S504, if the detection area does not contain a plurality of human bodies ('No' in S504), processing returns to S501.

Ｓ５０４にて、検出領域に複数の人体が存在する場合（Ｓ５０４で’Ｙｅｓ’）、Ｓ５０５の処理を行う。Ｓ５０５にて、推定領域決定部４０４は、検出部４０３の結果に基づいて、検出部４０３により検出した複数の人体のうち少なくとも１人を含む推定領域を複数決定する。以下図６を参照して、推定領域決定部４０４の処理について更に詳細に説明する。 In S504, when a plurality of human bodies exist in the detection area ('Yes' in S504), the process of S505 is performed. In S505 , estimated region determining unit 404 determines a plurality of estimated regions including at least one of the plurality of human bodies detected by detecting unit 403 based on the result of detecting unit 403 . The processing of the estimated region determination unit 404 will be described in more detail below with reference to FIG.
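The control flow of S501 to S505 can be sketched as a loop that keeps fetching frames until two or more human bodies are detected. The callables passed in stand for the communication unit, the detection unit, and the estimated region determination unit; their names and the `max_frames` bound are assumptions added so that the sketch terminates.

```python
from typing import Callable, List, Sequence


def run_until_regions(receive_image: Callable[[], object],
                      detect_bodies: Callable[[object], Sequence[object]],
                      decide_regions: Callable[[Sequence[object]], List[object]],
                      max_frames: int = 100) -> List[object]:
    """Loop over S501-S504 until a frame contains a plurality of bodies."""
    for _ in range(max_frames):
        image = receive_image()        # S501: receive image data
        bodies = detect_bodies(image)  # S502-S503: detect within the detection area
        if len(bodies) < 2:            # S504 'No': go back to S501
            continue
        return decide_regions(bodies)  # S504 'Yes' -> S505
    return []                          # assumed behaviour when no frame qualifies
```

Here "a plurality" is read as two or more detected bodies.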

図６は、推定領域決定部４０４の処理を説明するための図である。図６（ａ）において、画像６００は、撮影装置１０２により撮影された画像である。検出領域６０１は、ユーザーにより設定された画像内における領域であり、該領域内には検出部４０３により検出された複数の人体が存在している。 FIG. 6 is a diagram for explaining the processing of the estimated region determination unit 404. In FIG. 6(a), an image 600 is an image captured by the imaging device 102. A detection area 601 is an area within the image set by the user, and a plurality of human bodies detected by the detection unit 403 are present in that area.

範囲６０２は、属性推定部４０６が該範囲に含まれる人の属性情報を推定するために最低限必要な顔領域の画素数を取得できるズーム倍率で撮影した場合の撮影範囲である。この場合、範囲６０２は、該範囲に含まれる６人各々の属性情報を推定するために最低限必要な顔領域の画素数を取得できるズーム倍率で撮影した撮影範囲であり、該ズーム倍率以下で撮影した撮影範囲になると該６人に対して属性情報を推定できなくなる。なお、範囲６０２は、検出部４０３が検出した人の顔領域の画素数に基づいて、推定領域決定部４０４により決定される。例えば、属性推定部４０６が人の属性情報を推定するうえで必要な顔領域の画素数が縦４０ピクセル以上かつ横３０ピクセル以上が条件であるとする。このとき、推定領域決定部４０４は、検出部４０３により検出された人の顔領域の画素数が該条件を満たすうえでの最小ズーム倍率で撮影した撮影範囲を範囲６０２として決定する。 The range 602 is the imaging range obtained when shooting at a zoom magnification that yields the minimum number of face-area pixels the attribute estimation unit 406 needs to estimate the attribute information of the people in that range. Here, the range 602 is the imaging range at the zoom magnification that yields the minimum face-area pixel count needed for each of the six people it contains; at any lower zoom magnification, attribute information can no longer be estimated for those six people. The range 602 is determined by the estimated area determination unit 404 on the basis of the face-area pixel counts of the people detected by the detection unit 403. For example, suppose the attribute estimation unit 406 requires a face area of at least 40 pixels vertically and at least 30 pixels horizontally. The estimated area determination unit 404 then determines, as the range 602, the imaging range at the minimum zoom magnification for which the face areas of the people detected by the detection unit 403 satisfy that condition.
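The minimum zoom magnification in the 40 x 30 pixel example can be worked out as follows. This is a sketch under one stated assumption that the source does not make explicit: the face size in pixels scales in proportion to the zoom magnification. The function and constant names are illustrative.

```python
from typing import Iterable, Tuple

MIN_FACE_W = 30  # required face width in pixels (from the example above)
MIN_FACE_H = 40  # required face height in pixels (from the example above)


def min_required_zoom(current_zoom: float,
                      faces: Iterable[Tuple[float, float]]) -> float:
    """Smallest zoom magnification at which every face meets the condition.

    faces holds (width, height) in pixels measured at current_zoom.
    Assumes face size in pixels grows in proportion to zoom magnification.
    """
    factors = [max(MIN_FACE_W / w, MIN_FACE_H / h) for w, h in faces]
    if not factors:
        raise ValueError("at least one detected face is required")
    # The smallest face dictates the zoom needed for the whole group.
    return current_zoom * max(factors)
```

For instance, if faces of 15 x 20 and 10 x 10 pixels are detected at 1x zoom, the second face needs a factor of 4 to reach 40 pixels in height, so the group requires 4x zoom under this assumption. The imaging range at that zoom corresponds to a range such as the range 602.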

そして、推定領域決定部４０４は、範囲６０２内において範囲６０２の面積以下となり、かつ、検出部４０３により検出した複数の人のうち少なくとも１人を含む領域６０２ａを推定領域として決定する。なおこのとき、推定領域決定部４０４は、範囲６０２に対応する推定領域である領域６０２ａに対して、推定領域を特定するための推定領域ＩＤである“１”を付与する。以上のように、推定領域決定部４０４は、検出部４０３が検出した人の顔領域の画素数に基づいて範囲６０２を決定し、該範囲６０２に基づいて推定領域６０２aを決定する。 Then, the estimated area determination unit 404 determines, as an estimated area, an area 602a that lies within the range 602, is no larger in area than the range 602, and includes at least one of the plurality of people detected by the detection unit 403. It assigns the estimated area ID "1", which identifies the estimated area, to the area 602a corresponding to the range 602. In this way, the estimated area determination unit 404 determines the range 602 from the face-area pixel counts of the people detected by the detection unit 403, and determines the estimated area 602a from the range 602.

次に、推定領域決定部４０４は、範囲６０３を決定する。範囲６０２と同様、範囲６０３は、属性推定部４０６が該範囲に含まれる人の属性情報を推定するために最低限必要な顔領域の画素数を取得できるズーム倍率で撮影した場合の撮影範囲である。そして、推定領域決定部４０４は、範囲６０３内において範囲６０３の面積以下となり、かつ、検出部４０３により検出した複数の人のうち少なくとも１人を含む領域である領域６０３ｂを決定する。なおこのとき、推定領域決定部４０４は、範囲６０３に対応する推定領域である領域６０３ｂに対して、推定領域ＩＤである“２”を付与する。 Next, the estimated area determination unit 404 determines the range 603. FIG. As with the range 602, the range 603 is a photographing range when photographed at a zoom magnification that allows the attribute estimation unit 406 to acquire the minimum number of pixels in the face area necessary for estimating the attribute information of the person included in the range. be. Then, the estimated area determination unit 404 determines an area 603 b that is an area within the area 603 that is equal to or less than the area of the area 603 and that includes at least one of the plurality of persons detected by the detection unit 403 . At this time, the estimated area determining unit 404 assigns an estimated area ID of “2” to the area 603 b that is the estimated area corresponding to the range 603 .

次に、推定領域決定部４０４は、範囲６０４を決定する。範囲６０２と同様、範囲６０４は、属性推定部４０６が該範囲に含まれる人の属性情報を推定するために最低限必要な顔領域の画素数を取得できるズーム倍率で撮影した場合の撮影範囲である。そして、推定領域決定部４０４は、範囲６０４内において範囲６０４の面積以下となり、かつ、検出部４０３により検出した複数の人のうち少なくとも１人を含む領域である領域６０４ｃを決定する。なおこのとき、推定領域決定部４０４は、範囲６０４に対応する推定領域である領域６０４ｃに対して、推定領域ＩＤである“３”を付与する。 Next, the estimated area determination unit 404 determines the range 604. FIG. As with the range 602, the range 604 is a shooting range when the image is taken at a zoom magnification that allows the attribute estimation unit 406 to acquire the minimum number of pixels in the face area necessary for estimating the attribute information of the person included in the range. be. Then, the estimated area determining unit 404 determines an area 604 c that is an area within the range 604 that is equal to or less than the area of the range 604 and that includes at least one of the plurality of persons detected by the detecting unit 403 . At this time, the estimated area determination unit 404 assigns an estimated area ID of “3” to the area 604 c that is the estimated area corresponding to the range 604 .

Next, the estimation area determination unit 404 determines the range 605. Like the range 602, the range 605 is the imaging range obtained when capturing at a zoom magnification that yields the minimum number of face-region pixels the attribute estimation unit 406 needs to estimate the attribute information of a person in that range. The estimation area determination unit 404 then determines an area 605d, which lies within the range 605, has an area no larger than that of the range 605, and contains at least one of the persons detected by the detection unit 403. At this time, the estimation area determination unit 404 assigns the estimation area ID "4" to the area 605d, the estimation area corresponding to the range 605.

The estimation area determination unit 404 also stores the following items in a table 700 in association with one another: the estimation area ID that identifies the estimation area, the position of the estimation area, the minimum zoom magnification required to estimate the attribute information of a person in the estimation area, and the number of persons in the estimation area. The table 700 shown in FIG. 7 is an example of a data structure in which the estimation area determination unit 404 stores this data.

The estimation area ID identifies an estimation area and also indicates the order in which the estimation areas are targeted when the imaging device is controlled to capture each of them in turn. The position of an estimation area is defined by the pan angle and the tilt angle at the centroid of the area, together with the zoom magnification at which the area fills the angle of view. The required zoom magnification shown in FIG. 7 is the minimum zoom magnification needed to estimate the attribute information of a person in the estimation area. For example, the minimum zoom magnification needed to estimate the attribute information of a person in the area 602a, whose estimation area ID is "1", is the zoom magnification (gg) at which the range 602 becomes the imaging range. If the image is captured at a zoom magnification lower than (gg), the attribute estimation unit 406 can no longer estimate the attribute information of a person in the area 602a. FIG. 6(e) shows the plurality of estimation areas determined by the estimation area determination unit 404, together with the estimation area IDs assigned to them. The output control unit 407 may display the diagram of FIG. 6(e) on the display device 101.
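The row layout of the table 700 described above can be sketched as a small record type. This is only an illustration: the field names, the `can_estimate` helper, and the numeric values are assumptions made for the example, and the publication does not specify concrete values for quantities such as (gg).

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class EstimationArea:
    """One row of a table like table 700 (field names are hypothetical)."""
    area_id: int           # estimation area ID; also the default capture order
    pan_deg: float         # pan angle at the centroid of the area
    tilt_deg: float        # tilt angle at the centroid of the area
    framing_zoom: float    # zoom at which the area fills the angle of view
    required_zoom: float   # minimum zoom needed for attribute estimation
    person_count: int      # number of persons inside the area


# Illustrative values only; they do not come from the publication.
TABLE_700 = [
    EstimationArea(1, -20.0, -10.0, 5.0, 4.0, 7),
    EstimationArea(2, 15.0, -10.0, 5.5, 4.0, 6),
]


def can_estimate(area: EstimationArea, current_zoom: float) -> bool:
    """Attribute estimation is possible only at or above the required zoom."""
    return current_zoom >= area.required_zoom
```

Keeping `framing_zoom` and `required_zoom` as separate fields mirrors the distinction the text draws between the position of an area and the zoom needed to estimate attributes inside it.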

As described above, the estimation area determination unit 404 in this embodiment determines, based on the number of pixels in the face regions of the persons detected by the detection unit 403, a plurality of estimation areas each containing at least one of the detected persons. Although the estimation area determination unit 404 in this embodiment determines the estimation areas by scanning the detection area 601 rightward from its lower left, the invention is not limited to this. For example, the estimation areas may be determined by scanning the detection area 601 rightward from its upper left.

In this embodiment, when determining the plurality of estimation areas each containing at least one of the persons detected by the detection unit 403, the estimation area determination unit 404 determines each area so that it contains the person's face region, but the invention is not limited to this. For example, each area may be determined so that it contains the person's upper body.

Also, in this embodiment the estimation area determination unit 404 determines one estimation area for each of the ranges 602, 603, 604, and 605 so as to keep the number of estimation areas small, but the invention is not limited to this. For example, the estimation area determination unit 404 may determine two estimation areas within the range 602, each containing at least one person. Reducing the number of estimation areas determined by the estimation area determination unit 404 shortens the time the attribute estimation unit 406 needs to estimate attribute information for the persons detected by the detection unit 403.
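One way to picture the determination of estimation areas is the greedy grouping below. It is a sketch under stated assumptions, not the method claimed in the publication: it assumes that face size in pixels scales linearly with zoom magnification, it treats each face as a single point, and the names `required_zoom` and `group_faces` are hypothetical.

```python
from typing import List, Tuple

# (center_x, center_y, face_size_px) measured in the wide shot; y grows downward
Face = Tuple[float, float, float]


def required_zoom(face_px: float, min_face_px: float) -> float:
    """Zoom needed to enlarge a face to min_face_px (assumes linear scaling)."""
    return max(1.0, min_face_px / face_px)


def group_faces(faces: List[Face], frame_w: float, frame_h: float,
                min_face_px: float) -> List[List[Face]]:
    """Greedily group faces so that each group fits in one zoomed-in range."""
    # Scan order: largest y first (bottom of the image), ties left to right.
    remaining = sorted(faces, key=lambda f: (-f[1], f[0]))
    groups: List[List[Face]] = []
    while remaining:
        group = [remaining.pop(0)]
        for face in list(remaining):
            candidate = group + [face]
            # The smallest face in the group dictates the zoom for the group.
            zoom = max(required_zoom(f[2], min_face_px) for f in candidate)
            xs = [f[0] for f in candidate]
            ys = [f[1] for f in candidate]
            # At that zoom the imaging range shrinks to frame size / zoom.
            fits = (max(xs) - min(xs) <= frame_w / zoom
                    and max(ys) - min(ys) <= frame_h / zoom)
            if fits:
                group.append(face)
                remaining.remove(face)
        groups.append(group)
    return groups


faces = [(100.0, 900.0, 20.0), (200.0, 880.0, 20.0), (1700.0, 200.0, 20.0)]
groups = group_faces(faces, frame_w=1920.0, frame_h=1080.0, min_face_px=40.0)
```

With these inputs the two faces near the lower left end up in one group and the distant face forms its own group. A greedy pass like this does not guarantee the smallest possible number of areas.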

The description now returns to the flowchart shown in FIG. 5. In S506, the command management unit 405 generates control commands that control the imaging device 102 so that each of the estimation areas determined by the estimation area determination unit 404 is captured in turn. Each control command controls the imaging device 102 so that one estimation area is included in the imaging range of the imaging device 102 and the imaging conditions allow the attribute estimation unit 406 to estimate the attribute information of the persons in that estimation area.

The command management unit 405 in this embodiment first controls the imaging device 102 so that the estimation area 602a, whose estimation area ID is "1", is captured. For this purpose, a control command is generated that controls the imaging device 102 so that the estimation area 602a is included in its imaging range and, as an imaging condition, its zoom magnification is equal to or greater than the required zoom magnification, which in this case is gg.

The command management unit 405 in this embodiment may also generate the control command so that the centroid of the image captured by the imaging device 102 substantially coincides with the centroid of the estimation area. In this way, even if the captured image is distorted at its periphery, distortion within the estimation area is reduced, which further improves the accuracy of attribute estimation.
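The two conditions on a control command (the estimation area stays inside the imaging range, and the zoom is at least the required zoom) can be sketched as follows. The function name, the dictionary keys, and the choice of the lowest admissible zoom are assumptions for illustration; the publication does not define a command format.

```python
def make_ptz_command(pan_deg: float, tilt_deg: float,
                     required_zoom: float, framing_zoom: float) -> dict:
    """Build a hypothetical pan/tilt/zoom command for one estimation area.

    pan_deg, tilt_deg: centroid of the estimation area, so the area sits at
        the image center where lens distortion is smallest.
    required_zoom: minimum zoom at which attributes can be estimated.
    framing_zoom: zoom at which the area exactly fills the angle of view;
        zooming in further would push part of the area out of the frame.
    """
    if framing_zoom < required_zoom:
        raise ValueError("area cannot be framed at the required zoom")
    # Any zoom in [required_zoom, framing_zoom] satisfies both conditions;
    # this sketch picks the lowest one, which keeps the widest view.
    return {"pan_deg": pan_deg, "tilt_deg": tilt_deg, "zoom": required_zoom}


command = make_ptz_command(pan_deg=-20.0, tilt_deg=-10.0,
                           required_zoom=4.0, framing_zoom=5.0)
```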

Next, in S507, the attribute estimation unit 406 estimates, from the captured image obtained from the imaging device 102, the attribute information of the persons in the estimation area. In this case, it estimates the attribute information of the persons in the estimation area 602a.

In S507, when the attribute estimation unit 406 estimates the attribute information of the persons in one estimation area, the process may proceed to S508 once a predetermined time (for example, 5 seconds) has elapsed.

When the attribute estimation unit 406 estimates the attribute information of the persons in an estimation area from a captured image obtained from the imaging device 102, persons belonging to another estimation area may also appear in that image. For example, referring to FIG. 6(c), suppose that, in order to estimate the attribute information of the persons in the estimation area 604c, an image is captured with an imaging range whose position and size are substantially the same as those of the range 604. That captured image contains three persons from the estimation area 602a, whose attribute information has already been estimated. To avoid estimating attribute information twice, the attribute estimation unit 406 performs processing such as the following. For example, the attribute estimation unit 406 crops the estimation area 604c out of the image captured over the range 604 and estimates attribute information only for the persons in the cropped image. Alternatively, the storage unit 408 may store the position information of each person whose attribute information has been estimated by the attribute estimation unit 406. Then, when the attribute information for the estimation area 604c is estimated in the captured image, persons whose attribute information has already been estimated may be excluded from processing based on the position information stored in the storage unit 408.
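Both ways of avoiding duplicate estimation described above (restricting processing to the estimation area, and skipping persons whose positions are already stored) can be combined in one filter. The sketch below assumes positions are expressed in a common coordinate system and that two positions within a tolerance `tol` denote the same person; both assumptions, and all names, are illustrative.

```python
from typing import List, Tuple

Point = Tuple[float, float]
Box = Tuple[float, float, float, float]  # (left, top, right, bottom)


def faces_to_estimate(faces: List[Point], area_box: Box,
                      already_estimated: List[Point],
                      tol: float = 5.0) -> List[Point]:
    """Keep faces inside the estimation area that were not estimated before."""
    left, top, right, bottom = area_box
    selected = []
    for x, y in faces:
        # Equivalent to cropping the captured image to the estimation area.
        inside = left <= x <= right and top <= y <= bottom
        # Equivalent to consulting stored positions of estimated persons.
        seen = any(abs(x - px) <= tol and abs(y - py) <= tol
                   for px, py in already_estimated)
        if inside and not seen:
            selected.append((x, y))
    return selected


todo = faces_to_estimate(
    faces=[(10.0, 10.0), (50.0, 50.0), (200.0, 200.0)],
    area_box=(0.0, 0.0, 100.0, 100.0),
    already_estimated=[(10.0, 11.0)],
)
```

Here only the face at (50, 50) remains: the first face matches a stored position and the third lies outside the area.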

Next, in S508, if there is an estimation area for which the attribute estimation unit 406 has not yet estimated attribute information ('No' in S508), the processing in S506 and S507 is repeated. In this embodiment, the processing in S506 and S507 is repeated in the order of estimation area IDs "2", "3", and "4".

If, in S508, no estimation area remains for which the attribute estimation unit 406 has not estimated attribute information ('Yes' in S508), the process ends.

As described above, in this embodiment the imaging device 102 is controlled so that each of the plurality of estimation areas is captured in turn, and the attribute information of the persons in each estimation area is estimated. This makes it possible to estimate the attributes of a plurality of people over a wider range while changing the angle of view so that attribute estimation can be performed.

Next, the output of the result of the image processing according to this embodiment is described with reference to FIG. 8. FIG. 8 is a table showing the result of the image processing according to this embodiment as output to the display device 101 by the output control unit 407.

For example, suppose that during the period from 10:00 to 11:59 the attribute estimation unit 406 estimates attribute information for the persons in the estimation area 602a in S507 of FIG. 5. If one person is estimated to be a 25-year-old male, 1 is added to the count for males aged 20 to 29. Similarly, if in S507 the attribute estimation unit 406 estimates that one person in the estimation area 602a is a 30-year-old female, 1 is added to the count for females aged 30 to 39. In this way, each time a person's attribute information is estimated in S507 of FIG. 5, the count in the graph displayed on the display device 101 may be incremented in real time. Alternatively, the attribute estimation results may be output to the display device 101 after the image processing of this embodiment shown in FIG. 5 has finished.
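The counting in this example can be sketched with a counter keyed by gender and age bracket. The ten-year bracket width is inferred from the two brackets named in the text (20 to 29 and 30 to 39); the function names and label strings are hypothetical.

```python
from collections import Counter


def age_bracket(age: int) -> str:
    """Map an age to a ten-year bracket label such as '20-29'."""
    low = (age // 10) * 10
    return f"{low}-{low + 9}"


def record_estimate(counts: Counter, gender: str, age: int) -> None:
    """Add 1 to the count for the estimated gender and age bracket."""
    counts[(gender, age_bracket(age))] += 1


counts = Counter()
record_estimate(counts, "male", 25)    # counted under male, 20-29
record_estimate(counts, "female", 30)  # counted under female, 30-39
```

Calling `record_estimate` once per estimated person corresponds to the real-time update; building the counter after all areas are processed corresponds to outputting the result at the end.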

In this embodiment, the total number of persons for each piece of attribute information estimated by the attribute estimation unit 406 is output as a graph for each predetermined time interval, but the invention is not limited to this. For example, the results estimated by the attribute estimation unit 406 may be aggregated for each estimation area ID.

In addition, for each predetermined time interval, the number of persons detected by the detection unit 403 may be output together with the total number of persons whose attribute information was estimated by the attribute estimation unit 406. This lets the user see how many of the persons detected by the detection unit 403 had their attribute information estimated. The number of persons whose attribute information was estimated and the number of persons detected by the detection unit 403 may also be displayed for each estimation area ID.

In S502 of this embodiment, after the detection area setting unit 401 sets a detection area in the image obtained via the communication unit 400, the imaging device 102 may be controlled to change its angle of view to target that detection area. For example, after the user sets the detection area 601 on the image 600 shown in FIG. 6(a), the command management unit 405 may raise the zoom magnification of the imaging device 102 to change the angle of view while keeping the detection area 601 within the imaging range of the imaging device 102. This further improves the accuracy of detection by the detection unit 403 in S503.

Also, in this embodiment the imaging device 102 is controlled so that the plurality of estimation areas are captured in the order of estimation area IDs "1", "2", "3", and "4", but the invention is not limited to this. For example, the following processing may be performed.

In S503, when detecting the persons in the detection area 601, the detection unit 403 performs face detection to detect each person's face and additionally outputs, for each face region, a detection score that indicates how easily the person's attribute information can be estimated. A higher detection score means that attribute information is more easily estimated for that face region. For example, the score is output so that it is higher the closer the face orientation obtained by face detection is to frontal.

In S504, if a plurality of face regions exist in the detection area, a plurality of human bodies are deemed to exist in the detection area, and the process proceeds to S505. In S505, based on the result from the detection unit 403, the estimation area determination unit 404 determines a plurality of estimation areas each containing at least one of the persons detected by the detection unit 403. The estimation area determination unit 404 further sums the detection scores output by the detection unit 403 for each estimation area.

The processing of S506 and S507 may then be performed on the estimation areas in descending order of their summed detection scores. For example, if the summed detection scores are highest in the order of estimation area IDs "2", "3", "1", and "4", the processing of S506 and S507 may be performed in that order. By controlling the imaging device 102 to give priority to estimation areas with high detection scores in this way, attribute estimation is performed first on the estimation areas in which many people are facing forward, so attribute information can be estimated efficiently.
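The score-based ordering can be sketched in a few lines. The score values below are invented so that the resulting order matches the example in the text; the function name and the input layout (a mapping from estimation area ID to the detection scores of the faces in that area) are assumptions.

```python
from typing import Dict, List


def capture_order(scores_by_area: Dict[int, List[float]]) -> List[int]:
    """Return estimation area IDs sorted by summed detection score, highest first."""
    totals = {area_id: sum(scores) for area_id, scores in scores_by_area.items()}
    return sorted(totals, key=lambda area_id: totals[area_id], reverse=True)


order = capture_order({
    1: [0.2, 0.3],   # total 0.5
    2: [0.9, 0.8],   # total 1.7
    3: [0.7, 0.6],   # total 1.3
    4: [0.1],        # total 0.1
})
```

For these scores the resulting order is 2, 3, 1, 4.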

Alternatively, when the processing of S506 and S507 is performed for each of the plurality of estimation areas, it may be performed in the following order.

In S503, when detecting the persons in the detection area 601, the detection unit 403 performs face detection to detect each person's face and additionally outputs a detection score for each face region.

The processing of S506 and S507 is then performed on the estimation area with the highest summed detection score. Here, assume that the summed detection score is highest for the estimation area 603b, which corresponds to the estimation area ID "2", and that the processing of S506 and S507 has been performed on that estimation area.

After that, the command management unit 405 generates a control command that controls the imaging device 102 to capture an image with the detection area 601 included in the imaging range of the imaging device 102. After the imaging device 102 has been controlled based on the generated control command, the detection unit 403 again performs face detection on the persons in the detection area 601 and outputs a new detection score for each face region.

The estimation area determination unit 404 then sums the detection scores output by the detection unit 403 again for each estimation area, considering only the estimation areas for which the attribute estimation unit 406 has not yet estimated attribute information. In this case, the estimation area determination unit 404 sums the detection scores for each of the estimation areas corresponding to the estimation area IDs "1", "3", and "4". The processing of S506 and S507 is then performed on the estimation area with the highest summed detection score.
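The repeated cycle in the preceding paragraphs (detect and score, process the best remaining area, return to the wide shot, score again) can be sketched as a loop. The callables `detect_scores` and `process_area` stand in for the wide-shot detection and for S506 and S507 respectively; they and the sample score snapshots are hypothetical.

```python
from typing import Callable, Dict, Iterable, List


def process_by_live_scores(area_ids: Iterable[int],
                           detect_scores: Callable[[], Dict[int, List[float]]],
                           process_area: Callable[[int], None]) -> List[int]:
    """Process areas one at a time, re-scoring the remaining areas each round."""
    pending = set(area_ids)
    processed_order = []
    while pending:
        # Fresh detection scores from a wide shot of the detection area.
        scores = detect_scores()
        # Only areas not yet processed compete for the next slot.
        best = max(pending, key=lambda a: sum(scores.get(a, [])))
        process_area(best)  # stands in for S506 and S507
        pending.remove(best)
        processed_order.append(best)
    return processed_order


# Simulated score snapshots, one per wide shot (illustrative values).
snapshots = iter([
    {1: [0.2], 2: [0.9], 3: [0.5], 4: [0.1]},
    {1: [0.8], 3: [0.3], 4: [0.1]},
    {3: [0.1], 4: [0.6]},
    {3: [0.2]},
])
visited: List[int] = []
live_order = process_by_live_scores([1, 2, 3, 4],
                                    lambda: next(snapshots),
                                    visited.append)
```

Because the scores are refreshed every round, the order can differ from a one-time sort: in this simulation area 1 overtakes area 3 after the first round.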

As described above, while detection scores for the persons in the detection area are output repeatedly, the detection scores may be summed for each estimation area every time they are output, and the processing of S506 and S507 may be performed on the estimation area with the highest summed detection score. By controlling the imaging device 102 to give priority to the estimation area with the highest detection score in this way, attribute estimation is performed first on the estimation area in which more people are facing forward, so attribute information can be estimated more efficiently.

Also, in this embodiment the process ends when, in S508 shown in FIG. 5, no estimation area remains for which the attribute estimation unit 406 has not estimated attribute information ('Yes' in S508), but the invention is not limited to this. After the processing in S506 and S507 has been performed for each of the plurality of estimation areas determined by the estimation area determination unit 404, the processing in S506 and S507 may be performed again on any estimation area that contains a person whose attribute information was not estimated by the attribute estimation unit 406.

For example, suppose that the processing in S506 and S507 is performed in the order of estimation area IDs "1", "2", "3", and "4", and that attribute information is estimated for only five of the six persons in the estimation area 603b, whose estimation area ID is "2". In that case, the processing in S506 and S507 is performed again for the estimation area ID "2". In this way, even if there is a person whose attribute information the attribute estimation unit 406 could not estimate, the attribute estimation unit 406 attempts the estimation for that person again.

Alternatively, after the processing in S506 and S507 has been performed for each of the plurality of estimation areas, the estimation area determination unit 404 may determine at least one new estimation area for the persons whose attribute information was not estimated by the attribute estimation unit 406. The processing of S506 and S507 may then be performed for each of the estimation areas newly determined by the estimation area determination unit 404. The process of re-determining estimation areas is described in more detail below with reference to FIGS. 9 and 10.

FIG. 9 is a flowchart showing the process of re-determining estimation areas. The processing of the flowchart shown in FIG. 9 is executed mainly by the functional blocks shown in FIG. 4. It is executed by the CPU 1200 of the image processing apparatus 100 in accordance with a computer program stored in the HDD 1202. The processing executed by the CPU 1200 of the image processing apparatus 100 is described below. Steps that have the same function as in FIG. 5 are given the same reference numerals, and description of steps that are functionally unchanged is omitted. In S909, if attribute information has been estimated for all of the persons detected by the detection unit 403 ('Yes' in S909), the process ends. If, in S909, attribute information has not been estimated for all of the persons detected by the detection unit 403 ('No' in S909), the processing of S910 is performed.

In S910, based on the result of detection by the detection unit 403, the estimation area determination unit 404 determines at least one estimation area containing at least one of the detected persons whose attribute information could not be estimated. The persons contained in an estimation area become the targets of attribute estimation by the attribute estimation unit 406. Information on the position of each estimation area is stored in the storage unit 408. The process by which the estimation area determination unit 404 re-determines estimation areas is described in more detail below with reference to FIG. 10.

FIG. 10 is a diagram for explaining the processing of S910. In FIG. 10(a), the persons 1000 and 1001 are persons whose attribute information was not estimated by the attribute estimation unit 406 in the area 602a, the estimation area corresponding to the estimation area ID "1". The person 1002 is a person whose attribute information was not estimated by the attribute estimation unit 406 in the area 603b, the estimation area corresponding to the estimation area ID "2". The person 1003 is a person whose attribute information was not estimated by the attribute estimation unit 406 in the area 605d, the estimation area corresponding to the estimation area ID "4".

Based on the result of detection by the detection unit 403, the estimation area determination unit 404 determines at least one estimation area containing at least one of the persons 1000 to 1003, whose attribute information was not estimated by the attribute estimation unit 406.

The range 1004 in FIG. 10(b) is the imaging range obtained when capturing at a zoom magnification that yields the minimum number of face-region pixels the attribute estimation unit 406 needs to estimate the attribute information of the persons 1000 and 1001. The range 1004 is determined by the estimation area determination unit 404 based on the number of pixels in the face regions of the persons detected by the detection unit 403. The estimation area determination unit 404 then determines, as an estimation area, an area that lies within the range 1004, has an area no larger than that of the range 1004, and contains at least one of the persons 1000 and 1001. In this case, the area 1004b is determined as the estimation area corresponding to the range 1004. At this time, the estimation area determination unit 404 assigns the estimation area ID "5" to the area 1004b, the estimation area corresponding to the range 1004.

Next, the estimation area determination unit 404 determines the range 1005. The range 1005 is the imaging range obtained when capturing at a zoom magnification that yields the minimum number of face-region pixels the attribute estimation unit 406 needs to estimate the attribute information of the persons 1002 and 1003. The estimation area determination unit 404 then determines, as an estimation area, an area that lies within the range 1005, has an area no larger than that of the range 1005, and contains at least one of the persons 1002 and 1003. In this case, the estimation area determination unit 404 determines the area 1005b as the estimation area corresponding to the range 1005. The estimation area determination unit 404 also assigns the estimation area ID "6" to the area 1005b, the estimation area corresponding to the range 1005.

As described above, based on the result of detection by the detection unit 403, the estimation area determination unit 404 determines at least one estimation area containing at least one of the persons whose attribute information was not estimated by the attribute estimation unit 406. Although the estimation area determination unit 404 determines the estimation areas by scanning the detection area 601 rightward from its lower left, the invention is not limited to this. For example, the estimation areas may be determined by scanning the detection area 601 rightward from its upper left.

The processing of S506 and S507 is then performed for each of the estimation areas re-determined by the estimation area determination unit 404. In this way, even if there is a person whose attribute information the attribute estimation unit 406 could not estimate, the attribute estimation unit 406 attempts the estimation for that person again. Note that the estimation area determination unit 404 may re-determine only one estimation area in S910.

The image processing according to this embodiment has so far been described as being performed by the image processing apparatus 100, but the invention is not limited to this. The imaging device 102 may have one or more of the functions of the image processing apparatus 100 according to this embodiment. That is, although in this embodiment the image processing is executed by the CPU 1200 of the image processing apparatus 100, it may instead be executed by the CPUs of the client device described later and of the imaging device 102.

For example, the imaging device 102 may have the detection area setting unit 401, the detection unit 403, the estimation area determination unit 404, the command management unit 405, the attribute estimation unit 406, and the storage unit 408 shown in FIG. 4. In this case, the operation reception unit 402 and the output control unit 407 are included in the client device 1100, which is implemented by the hardware configuration shown in FIG. 12. This case is described in more detail with reference to FIG. 11. Components having the same functions as those in FIGS. 2 to 4 are denoted by the same reference numerals, and descriptions of components that are functionally unchanged are omitted.

FIG. 11 is an example of a functional block diagram of the client device 1100 and the imaging device 102 that has some of the functions of the image processing apparatus 100. The imaging device 102 and the client device 1100 communicate via the network 104.
Image data generated by the image processing unit 206 is sent to the output control unit 407 in the client device 1100 via the communication unit 209 and the communication unit 1101. The output control unit 407 outputs, for example, the image data sent from the imaging device 102 to the display device 101. Here, the display device 101 is assumed to be connected to the client device 1100.

The detection area setting unit 401 sets, in the generated image data, a detection area, which is the area to be processed by the detection unit 403. An area within the image set by the user via the operation reception unit 402 of the client device 1100 may be set as the detection area, or a predetermined area set in advance may be used as the detection area. If there is neither a detection area specified by the user nor an area set in advance, the entire image represented by the image data may be used as the detection area.

The operation reception unit 402 in the client device 1100 receives operations performed by the user via an input device (not shown) such as a mouse or a keyboard. Information on the user operation received by the operation reception unit 402 is sent to the detection area setting unit 401 in the imaging device 102 via the communication unit 1101 and the communication unit 209.

The detection unit 403 detects human bodies included in the detection area in the generated image data. Based on the detection result of the detection unit 403, the estimation area determination unit 404 determines a plurality of estimation areas, each including at least one of the plurality of persons detected by the detection unit 403.

The command management unit 405 generates control commands for controlling the imaging device 102 so that each of the plurality of estimation areas determined by the estimation area determination unit 404 is photographed in sequence. The control commands generated by the command management unit 405 are sent to the system control unit 207, and the imaging device 102 is controlled based on those control commands.
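By way of a hedged illustration, generation of one pan, tilt, and zoom command per estimation area might look like the sketch below. It assumes that a pixel offset maps linearly to a pan or tilt angle, that the face size in pixels scales linearly with the zoom magnification, and that image y coordinates grow downward; `EstimationArea`, `ControlCommand`, `make_commands`, `required_face_px`, and `max_zoom` are hypothetical names and values, not part of the embodiment. The sketch does not verify that the whole estimation area stays within the resulting photographing range.

```python
from dataclasses import dataclass


@dataclass
class EstimationArea:
    cx: float           # centre of the area in the wide-angle image (pixels)
    cy: float
    min_face_px: float  # smallest detected face size in the area (pixels)


@dataclass
class ControlCommand:
    pan_deg: float   # positive = turn right
    tilt_deg: float  # positive = turn up
    zoom: float      # zoom magnification


def make_commands(areas, image_w, image_h, hfov_deg, vfov_deg,
                  current_zoom=1.0, required_face_px=40.0, max_zoom=30.0):
    """Build one command per area, in the order the areas are to be photographed."""
    commands = []
    for a in areas:
        # Point the optical axis at the area centre (linear pixel-to-angle assumption).
        pan = (a.cx - image_w / 2) / image_w * hfov_deg
        tilt = (image_h / 2 - a.cy) / image_h * vfov_deg  # image y grows downward
        # Smallest zoom at which the smallest face reaches the required size,
        # never zooming out below the current magnification.
        zoom = current_zoom * max(1.0, required_face_px / a.min_face_px)
        commands.append(ControlCommand(pan, tilt, min(zoom, max_zoom)))
    return commands
```

The commands would then be passed to the system control unit 207 one at a time, with a photographed image obtained after each command is applied.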

The attribute estimation unit 406 estimates, for each photographed image obtained from the imaging device 102, the attribute information of the persons present in the estimation area. The storage unit 408 stores, for example, information on the positions of the detection area and the estimation areas.

The output result of the image processing (for example, the table shown in FIG. 8) is sent from the imaging device 102 to the client device 1100 via the communication unit 209 and the communication unit 1101. The output result sent from the imaging device 102 may then be passed, for example, to the output control unit 407 and output to the display device 101 by the output control unit 407.

As described above, the imaging device 102 may have one or more of the functions of the image processing apparatus 100.

Note that the present invention can also be implemented by processing in which one or more processors read and execute a program that implements one or more functions of the above-described embodiment. The program may be supplied to a system or apparatus having a processor via a network or a storage medium. The present invention can also be implemented by a circuit (for example, an ASIC) that implements one or more functions of the above-described embodiment. Each unit of the image processing apparatus 100 or the imaging device 102 may be implemented by the hardware shown in FIG. 12, or may be implemented by software.

400 communication unit
401 detection area setting unit
402 operation reception unit
403 detection unit
404 estimation area determination unit
405 command management unit
406 attribute estimation unit

Claims

1. An image processing apparatus comprising:
detecting means for detecting a plurality of persons included in an image photographed by photographing means;
determining means for determining two or more areas including at least one of the plurality of persons detected by the detecting means;
control means for outputting a control command for causing the photographing means to sequentially photograph each of the areas determined by the determining means; and
estimating means for estimating attribute information of a person present in the area from an image photographed based on the control command output by the control means,
wherein the control means outputs the control command based on a photographing condition under which one of the areas determined by the determining means is included in the photographing range of the photographing means and the attribute information of a person present in the one area can be estimated.

2. The image processing apparatus according to claim 1, wherein said photographing condition is such that a human face area included in the area determined by said determining means has a predetermined number of pixels or more.

3. The image processing apparatus according to claim 1 or 2, wherein the control command includes at least one of panning, tilting, and zooming of the photographing means, and
the photographing condition is that the zoom magnification of the photographing means is equal to or greater than the minimum zoom magnification necessary for the estimating means to estimate the attribute information of a person present in the area determined by the determining means.

4. The image processing apparatus according to any one of claims 1 to 3, wherein the control means outputs the control command for controlling the photographing means such that the center of gravity of the photographing range of the photographing means and the center of gravity of the area are substantially the same.

5. The image processing apparatus according to claim 1, wherein said attribute information includes at least one of information regarding a person's age and information regarding a person's sex.

6. The image processing apparatus according to any one of claims 1 to 5, further comprising setting means for setting a detection area, which is an area within the image in which the detecting means detects the plurality of persons.

7. The image processing apparatus according to claim 6, wherein the control means outputs the control command to increase the zoom magnification of the photographing means while the detection area remains included in the photographing range of the photographing means, and
the detecting means detects the plurality of persons included in the detection area in an image photographed by the photographing means controlled by the control means.

8. The image processing apparatus according to claim 6, wherein the detection area is an area within the image set by a user.

9. The image processing apparatus according to any one of claims 1 to 8, wherein the determining means determines the area including at least one of the plurality of persons detected by the detecting means, based on the number of pixels in the face area of the person detected by the detecting means.

10. The image processing apparatus according to any one of claims 1 to 9, wherein the determining means determines at least one area including at least one of a plurality of persons whose attribute information could not be estimated by the estimating means.

11. An image processing method comprising:
a detecting step of detecting a plurality of persons included in an image photographed by photographing means;
a determining step of determining two or more areas including at least one of the plurality of persons detected in the detecting step;
a control step of outputting a control command for causing the photographing means to sequentially photograph each of the areas determined in the determining step; and
an estimating step of estimating attribute information of a person present in the area from an image photographed based on the control command output in the control step,
wherein, in the control step, the control command is output based on a photographing condition under which one of the areas determined in the determining step is included in the photographing range of the photographing means and the attribute information of a person present in the one area can be estimated.

12. A program for causing a computer to function as:
detecting means for detecting a plurality of persons included in an image photographed by photographing means;
determining means for determining two or more areas including at least one of the plurality of persons detected by the detecting means;
control means for outputting a control command for causing the photographing means to sequentially photograph each of the areas determined by the determining means; and
estimating means for estimating attribute information of a person present in the area from an image photographed based on the control command output by the control means,
wherein the control means outputs the control command based on a photographing condition under which one of the areas determined by the determining means is included in the photographing range of the photographing means and the attribute information of a person present in the one area can be estimated.