JP2010244090A

JP2010244090A - Person information extraction device, person information extraction method, and person information extraction program

Info

Publication number: JP2010244090A
Application number: JP2009088563A
Authority: JP
Inventors: Satoshi Futami; 聡二見; Takehiro Mabuchi; 健宏馬渕; Hisatomo Ushijima; 央智牛島; Takuo Moriguchi; 拓雄森口
Original assignee: Sohgo Security Services Co Ltd
Current assignee: Sohgo Security Services Co Ltd
Priority date: 2009-03-31
Filing date: 2009-03-31
Publication date: 2010-10-28

Abstract

PROBLEM TO BE SOLVED: To efficiently obtain various information related to a person included in a photographed image.

SOLUTION: A person information extraction device for extracting information related to a person included in a video photographed by an image pickup means includes: a face region detection means for detecting a face region with respect to one or more persons from a prescribed image in each of time-series images included in the video; a body region detection means for detecting a body region by using the face region obtained by the face region detection means; and a person information integration means for integrating the detection result obtained by the face region detection means and the detection result obtained by the body region detection means. The body region detection means includes a height estimation means for estimating the height of the person based on the installation position of the image pickup means, the angle of view, and a horizontal distance, based on the face region, between the image pickup means and the person.

COPYRIGHT: (C)2011, JPO&INPIT

Description

The present invention relates to a person information extraction device, a person information extraction method, and a person information extraction program, and more particularly to a person information extraction device, a person information extraction method, and a person information extraction program for efficiently acquiring various kinds of information about a person included in a captured image.

Conventionally, there are techniques that perform image processing or the like on video captured by an imaging means such as a camera and automatically detect a person included in the images constituting the video. Among these, techniques for detecting a person's face in an image have developed rapidly. For example, to improve face discrimination accuracy, there is an image processing apparatus that extracts a skin color region from an input image, detects the positions of facial feature points such as the top of the head, the eyes, and the mouth based on that skin color region, and determines from the detection result whether the skin color region is a face (see, for example, Patent Document 1).

Such face detection technology is also applied to video obtained by surveillance cameras installed in guarded facilities such as banks, department stores, and convenience stores. It is used to identify persons quickly when a crime occurs and to detect whether a person is suspicious so that crimes can be prevented before they occur.

Japanese Unexamined Patent Application Publication No. 2004-5384 (JP 2004-5384 A)

For persons detected by the conventional techniques described above, it is preferable to accumulate the various kinds of information that can be read from the camera so that image search processing can be performed efficiently and with high accuracy. However, the information that can be extracted from a person's face information alone is limited. More detailed person information is needed, for example, when determining whether a person is a specific person such as a suspicious person, or when detecting a specific person such as a suspicious person in video.

The present invention has been made in view of the above problems, and its object is to provide a person information extraction device, a person information extraction method, and a person information extraction program for efficiently acquiring various kinds of information about a person included in a captured image.

To solve the above problems, the present invention employs means having the following features.

In the present invention, a person information extraction device that extracts information about a person included in video captured by an imaging means comprises: face region detection means for detecting a face region for one or more persons from a predetermined image among the time-series images included in the video; human body region detection means for detecting a human body region using the face region obtained by the face region detection means; and person information integration means for integrating the detection result obtained by the face region detection means with the detection result obtained by the human body region detection means. The human body region detection means includes height estimation means for estimating the height of the person based on the installation position of the imaging means, the angle of view, and the horizontal distance between the imaging means and the person determined with reference to the face region. This makes it possible to efficiently acquire various kinds of information about a person included in a captured image.

In the present invention, the human body region detection means also includes color information extraction means for extracting color information about the person included in the human body region, using the size of the face region as a reference.

In the present invention, the color information extraction means extracts color information or image data for at least one of a head region, a skin region, an upper body region, and a lower body region, using the size of the face region as a reference.

In the present invention, the person information integration means also includes frame information generation means for generating the integrated person information of each person for each image frame in the video.

Furthermore, in the present invention, a person information extraction method for extracting information about a person included in video captured by an imaging means comprises: a face region detection procedure for detecting a face region for one or more persons from a predetermined image among the time-series images included in the video; a human body region detection procedure for detecting a human body region using the face region obtained by the face region detection procedure; and a person information integration procedure for integrating the detection result obtained by the face region detection procedure with the detection result obtained by the human body region detection procedure. The human body region detection procedure includes a height estimation procedure for estimating the height of the person based on the installation position of the imaging means, the angle of view, and the horizontal distance between the imaging means and the person determined with reference to the face region. This makes it possible to efficiently acquire various kinds of information about a person included in a captured image.

In the present invention, the human body region detection procedure also includes a color information extraction procedure for extracting color information about the person included in the human body region, using the size of the face region as a reference.

In the present invention, the color information extraction procedure extracts color information or image data for at least one of a head region, a skin region, an upper body region, and a lower body region, using the size of the face region as a reference.

In the present invention, the person information integration procedure also includes frame information generation means for generating the integrated person information of each person for each image frame in the video.

Furthermore, in the present invention, a person information extraction program for extracting information about a person included in video captured by an imaging means causes a computer to function as: face region detection means for detecting a face region for one or more persons from a predetermined image among the time-series images included in the video; human body region detection means for detecting a human body region using the face region obtained by the face region detection means; and person information integration means for integrating the detection result obtained by the face region detection means with the detection result obtained by the human body region detection means. The human body region detection means includes height estimation means for estimating the height of the person based on the installation position of the imaging means, the angle of view, and the horizontal distance between the imaging means and the person determined with reference to the face region. This makes it possible to efficiently acquire various kinds of information about a person included in a captured image. In addition, by installing the program, the person information extraction processing of the present invention can easily be realized on a general-purpose personal computer or the like.

According to the present invention, various kinds of information about a person included in a captured image can be acquired efficiently.

A diagram showing an example of the schematic configuration of the person information extraction device in the present embodiment.
A diagram for explaining an example of height estimation in the present embodiment.
A diagram for explaining height estimation when there are multiple persons in a captured image.
A diagram showing a specific example of the color information acquired in the present embodiment.
A diagram showing an example of how the acquired color information is set when there are multiple persons in a captured image.
A diagram showing an example of the frame information in the present embodiment.
A diagram showing an example of a hardware configuration capable of realizing the person information extraction processing in the present embodiment.
A flowchart showing an example of the person information extraction processing procedure in the present embodiment.

<About the present invention>
The present invention acquires, for each of the time-series images included in video captured by an imaging means such as a single camera, information about the persons included in that image. First, the face region of a person included in the image is detected. Next, based on the detected face region, a human body region comprising various kinds of person information, such as height and hair and clothing color information, is detected. The present invention also integrates the images obtained by the face region detection and human body region detection described above.
This makes it possible to extract various kinds of person information with high accuracy and at high speed.

Preferred embodiments of the person information extraction device, person information extraction method, and person information extraction program of the present invention are described below with reference to the drawings.

<Schematic configuration example of person information extraction device>
FIG. 1 is a diagram showing an example of the schematic configuration of the person information extraction device in the present embodiment. The person information extraction device 10 shown in FIG. 1 comprises input means 11, output means 12, storage means 13, face region detection means 14, human body region detection means 15, person information integration means 16, frame information generation means 17, transmission/reception means 18, and control means 19. An imaging means 20, such as a camera that photographs the persons from whom person information is to be extracted, is connected to the transmission/reception means 18, so that each of the time-series images included in the video captured by the imaging means 20 can be acquired. The imaging means 20 may also be built integrally with the person information extraction device 10.

The input means 11 accepts various instructions from a user or the like, such as a face region detection instruction, a human body region detection instruction, a person information integration instruction, a frame information generation instruction, and a transmission/reception instruction. The input means 11 consists of, for example, a keyboard, a pointing device such as a mouse, and a voice input device such as a microphone.

The output means 12 displays, or outputs as audio, the content of instructions entered through the input means 11, control data generated from those instructions, and the various information obtained from the progress or results of processing executed by components such as the face region detection means 14, the human body region detection means 15, the person information integration means 16, the frame information generation means 17, and the transmission/reception means 18. The output means 12 has a screen display function such as a display and an audio output function such as a speaker.

The storage means 13 can store the various information needed to realize the present embodiment, and is read from and written to as necessary. Specifically, the storage means 13 stores information such as: feature data used for face authentication and for estimating gender, age group, and the like; face region detection results from the face region detection means 14; human body region detection results from the human body region detection means 15; person information integration results from the person information integration means 16; frame information results from the frame information generation means 17; information controlled by the transmission/reception means 18 and the control means 19; error information recorded when errors occur; log information; and programs for realizing the present invention.

When the face region detection means 14 determines that an image contains a person, it detects that person's face region. The face region detection means 14 acquires video captured by the imaging means 20, such as a camera, and detects the faces of one or more persons in predetermined images (every frame image, images spaced several frames apart, or the like) among the time-series images included in the acquired video.

Specifically, the face region detection means 14 detects a person's face by, for example, acquiring facial features from position information of the eyes, nose, mouth, and so on of a face in the captured image, and performing matching against preset feature patterns that characterize a face. The face region detection means 14 is not limited to this face detection processing; it can also use, for example, face detection by edge detection or shape pattern detection, or face detection by hue extraction or skin color extraction. After detecting a face, the face region detection means 14 detects a rectangular face region or the like defined by the vertical and horizontal extent of the face in the image.

The face region detection means 14 also detects the center coordinates (position information) of the face region and the size of the region in the image, and stores the information needed to overlay the face region on the original image in a predetermined shape so that the face region can be displayed clearly on screen. In the present invention, the shape of the face region may be a rectangle, a circle, an ellipse, another polygon, a silhouette shape obtained by enlarging the outline of the person's face by a predetermined factor, or the like.

The face region detection means 14 also includes tracking means 21, face authentication means 22, gender/age estimation means 23, and face hiding determination means 24. The tracking means 21 tracks persons using the detected face regions, assigning identification information (a tracking ID) to each so that multiple face regions in an image frame can be identified and stored individually. For each assigned tracking ID, the tracking means 21 outputs one of four states according to the tracking status: "unused", "frame-in", "frame-out", or "tracking".

The face authentication means 22 acquires, from the detected face region, facial features consisting of the arrangement of the eyes, nose, mouth, and so on. It performs face authentication using the acquired features and the facial features of already registered persons (registered persons), celebrities, and other persons stored in advance as a database (DB) or the like in the storage means 13 or elsewhere, and outputs the identification information (ID) of the authenticated person. The face authentication means 22 may also convert the ID of a face image stored in the storage means 13 into the name of the person (registered person or celebrity) and output that name.

The gender/age estimation means 23 acquires facial features from the detected face region, estimates gender and age group using the acquired facial features and the storage means 13, and outputs the result.

The face hiding determination means 24 detects whether the detected face region is being hidden by a mask, sunglasses, or the like. Specifically, the face hiding determination means 24 acquires color information for the eye region and the mouth region inferred from the face region. When the color of the eye region is one that suggests sunglasses are being worn (for example, a blackish color), it determines that the face is hidden by sunglasses. When the color of the mouth region is one that suggests a mask is being worn (for example, a whitish color), it determines that the face is hidden by a mask. It then outputs the determination result.
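The color-based check described above can be sketched as follows. This is a minimal illustration, not the patent's implementation: the function names, the use of mean brightness as the "blackish" / "whitish" test, and the threshold values `dark_max` and `light_min` are all assumptions made for the example.

```python
from statistics import mean


def mean_brightness(pixels):
    """Average brightness (0-255) of an iterable of (R, G, B) pixels."""
    return mean((r + g + b) / 3 for r, g, b in pixels)


def judge_face_hiding(eye_pixels, mouth_pixels, dark_max=60, light_min=200):
    """Flag sunglasses when the eye region is dark and a mask when the
    mouth region is light. Both thresholds are illustrative assumptions."""
    return {
        "sunglasses": mean_brightness(eye_pixels) <= dark_max,
        "mask": mean_brightness(mouth_pixels) >= light_min,
    }
```

For instance, a nearly black eye region together with a nearly white mouth region yields both flags set, while ordinary skin tones in both regions yield neither.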

When a face region has been detected by the face region detection means 14, the human body region detection means 15 detects one or more human body regions in predetermined images (every frame image, images spaced several frames apart, or the like) based on that face region information.

Specifically, the human body region detection means 15 compares, for example, consecutive image frames and looks for places where the color information (luminance, chromaticity, and so on) changes within a predetermined time. It detects as a human body region an area enclosed by such places whose size is at least a predetermined size, or whose range of movement over time is within a predetermined range. In the present invention, the human body detection technique is not limited to this.
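The frame comparison described above can be sketched as simple frame differencing. The sketch below is a hedged illustration under several assumptions: frames are plain lists of rows of grayscale values, the function name `changed_region` and the parameters `diff_threshold` and `min_area` are hypothetical, and only the minimum-size criterion is implemented (the movement-range criterion is omitted).

```python
def changed_region(prev, curr, diff_threshold=30, min_area=4):
    """Return the bounding box (x0, y0, x1, y1) of pixels whose value changed
    by more than diff_threshold between two frames, or None when nothing
    changed or the box is smaller than min_area pixels."""
    changed = [
        (x, y)
        for y, (row_p, row_c) in enumerate(zip(prev, curr))
        for x, (p, c) in enumerate(zip(row_p, row_c))
        if abs(p - c) > diff_threshold
    ]
    if not changed:
        return None
    xs = [x for x, _ in changed]
    ys = [y for _, y in changed]
    x0, y0, x1, y1 = min(xs), min(ys), max(xs), max(ys)
    area = (x1 - x0 + 1) * (y1 - y0 + 1)
    return (x0, y0, x1, y1) if area >= min_area else None
```

A real system would typically work on camera frames with a vision library and separate multiple moving regions; this version treats all changed pixels as one candidate region to keep the idea visible.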

The human body region detection means 15 also detects the center coordinates of the human body region and the size of the region in the image, and stores the information needed to overlay the face region on the original image in a predetermined shape so that the human body region can be displayed clearly on screen. As with the face region described above, the shape of the human body region may be a rectangle, a circle, an ellipse, another polygon, a silhouette shape obtained by enlarging the outline of the person by a predetermined factor, or the like.

The human body region detection means 15 also includes height estimation means 25, color information extraction means 26, and person position estimation means 27.

The height estimation means 25 calculates the height of the person for each detected human body region and face region. The specific height estimation technique of the present embodiment is described later.

The color information extraction means 26 uses the positional relationship between the detected human body region and face region to estimate the person's head region, skin region, upper body region, and lower body region, and extracts color information from each region. Specifically, it extracts color information for the hair, upper garment, lower garment, and so on, and determines a representative color (such as the average color) of the extracted color information as the color information for the hair, upper garment, lower garment, and so on.

In calculating the color information of each region, the total number of pixels in the region and the frequency of each color are counted, and, for example, the RGB values of the 10 most frequent colors are output. The regions are extracted as follows: the head region from the area above the face region; the cheek (skin) region from the areas to the left and right of the face region; the upper body region from the area above the center of the human body region and below the face region; and the lower body region from the area below the center of the human body region, or above the ground. In the present invention, the way the extracted regions are set is not limited to this. The color information extraction means 26 can also cut out and output, as is, the image data of at least one of the face region, human body region, head region, skin region, upper body region, and lower body region. Specific examples of the color information acquired in the present embodiment are described later.
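The counting step and the upper/lower body split described above can be sketched as follows. This is a minimal sketch under stated assumptions: pixels are (R, G, B) tuples, boxes are `(x0, y0, x1, y1)` with y increasing downward, and the function names `top_colors` and `body_subregions` are hypothetical. The head and cheek regions are left out because the text does not say how far they extend from the face region.

```python
from collections import Counter


def top_colors(pixels, n=10):
    """Return (total pixel count, [(rgb, frequency), ...]) for the n most
    frequent colors among the given (R, G, B) pixels."""
    pixels = list(pixels)
    counts = Counter(pixels)
    return len(pixels), counts.most_common(n)


def body_subregions(face, body):
    """Split a body box into upper and lower body boxes: the upper body lies
    below the face box and above the body's vertical center, the lower body
    lies below that center."""
    _, _, _, face_bottom = face
    bx0, by0, bx1, by1 = body
    mid_y = (by0 + by1) // 2
    return {
        "upper_body": (bx0, face_bottom, bx1, mid_y),
        "lower_body": (bx0, mid_y, bx1, by1),
    }
```

In practice, similar colors would usually be quantized into bins before counting, since exact RGB values rarely repeat in camera images; that step is omitted here.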

The person position estimation means 27 calculates the person's position coordinates in real space for each detected human body region and face region. In doing so, the person position estimation means 27 applies a preset conversion formula to the two-dimensional coordinates obtained from the image captured by the camera to obtain position coordinates (X, Y, Z) in three-dimensional real space.

The person information integration means 16 associates a face region and a human body region as belonging to the same person and integrates that person's features. Specifically, the person information integration means 16 acquires the centroid coordinates of a face region in the image and, if there is a human body region that contains those coordinates, associates that face region and human body region as belonging to the same person.

Furthermore, when one human body region contains two or more face regions, the person information integration means 16 associates the same human body region with all of those face regions. As a result, even when multiple persons temporarily overlap on screen partway through the video during tracking processing or the like, tracking can continue without interruption. Persons are managed using identification information such as IDs. In the present invention, the processing that associates a human body region and a face region as belonging to the same person is not limited to the above; for example, the posture, orientation, and so on of the person may be extracted and used for the association.
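The centroid-containment rule described above can be sketched as follows. This is a hedged illustration: it assumes rectangular regions given as `(x0, y0, x1, y1)`, takes the box center as the face centroid, and uses hypothetical function names and dictionary-based IDs.

```python
def centroid(box):
    """Center point of an (x0, y0, x1, y1) box."""
    x0, y0, x1, y1 = box
    return ((x0 + x1) / 2, (y0 + y1) / 2)


def contains(box, point):
    """True when the point lies inside the box (edges included)."""
    x0, y0, x1, y1 = box
    px, py = point
    return x0 <= px <= x1 and y0 <= py <= y1


def associate(faces, bodies):
    """Map each face ID to the ID of the first body box that contains the
    face centroid, or None when no body box contains it. Several faces may
    map to the same body, as when persons overlap on screen."""
    result = {}
    for face_id, face_box in faces.items():
        c = centroid(face_box)
        result[face_id] = next(
            (body_id for body_id, body_box in bodies.items() if contains(body_box, c)),
            None,
        )
    return result
```

With two faces inside one body box, both face IDs map to that body ID, which matches the handling of overlapping persons described above.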

The frame information generation means 17 generates frame information that stores, for each frame of the images included in the video, the person information of each person integrated by the person information integration means 16 described above. This allows persons to be managed frame by frame. Because each frame carries the time at which it was captured, a wide variety of searches can be performed with high accuracy using diverse search keywords, such as what kinds of persons, and how many, were present at what time. The specific structure of the frame information in the present embodiment is described later.

The transmission/reception means 18 receives surveillance video from a single imaging means 20 via a communication network such as a LAN (Local Area Network) or the Internet. The transmission/reception means 18 need not receive the surveillance video directly from the imaging means 20. For example, video acquired in advance by the imaging means 20 may be saved temporarily elsewhere, and the person information extraction of the present embodiment may be performed using the saved information.

The transmission/reception means 18 can also be used as a communication interface for transmitting data to other terminals that constitute the person information extraction device 10 and for receiving various data from other terminals.

The control means 19 controls all of the functional components of the person information extraction device 10. Specifically, based on the user's input received through the input means 11, the control means 19 performs various kinds of control, such as having face region detection performed, having human body region detection performed, having person information integrated, having frame information generated, and controlling transmission and reception.

<Example of height estimation in this embodiment>
An example of height estimation in this embodiment is now described in detail. FIG. 2 is a diagram for explaining an example of height estimation in this embodiment. As shown in FIG. 2, a camera serving as the imaging means 20 is installed at a predetermined location, and the height of a person 30 within the camera's angle of view θ is estimated. In the example of FIG. 2, the person's entire face and body lie within the angle of view θ, so both the face region shape 31 and the human body region shape 32 are shown.

Specifically, in this embodiment, the height H is estimated by, for example, expression (1).

In expression (1), CamZ is the height above the ground at which the camera is installed, D is the current horizontal distance between the camera and the person 30, and α is the angle from the horizontal line at height CamZ to the center of the face region. D is calculated by expression (2).

In expression (2), S is the size of the detected face region 31 (length of one side, in pixels), θ_H is the current horizontal angle of view of the camera, θ_V is the current vertical angle of view of the camera, D_b is the horizontal distance between the camera and the person when the reference values were measured, S_b is the size of the face region detected at that time (length of one side, in pixels), θ_Hb is the horizontal angle of view of the camera at that time, and θ_Vb is the vertical angle of view of the camera at that time.

In other words, in this embodiment the distance D to the camera is estimated from the size of the face in the image using perspective, and the height is then estimated from D as described above. Reference values must be measured in advance in order to estimate the height. These reference values are the camera's angle of view and the horizontal distance between the camera and the person, measured in a state where the distance between the camera and the person is known.
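Expressions (1) and (2) are not reproduced in this text, so the following is only a minimal sketch of the geometry the description implies, not the patent's formulas. It assumes a simple pinhole camera, in which the apparent face size is inversely proportional to distance and proportional to 1/tan(angle of view / 2), and it uses only the horizontal angle of view. The function names and the sign convention for α (positive above the camera's horizontal line) are assumptions made for illustration.

```python
import math


def estimate_distance(S, theta_h, D_b, S_b, theta_hb):
    """Horizontal camera-to-person distance from the face size (assumed pinhole model).

    S, S_b: face-region side lengths in pixels (current / reference).
    theta_h, theta_hb: horizontal angles of view in radians (current / reference).
    D_b: known horizontal distance at the time the reference was measured.
    """
    # A narrower angle of view magnifies the face, so compensate for zoom changes.
    zoom = math.tan(theta_hb / 2) / math.tan(theta_h / 2)
    return D_b * (S_b / S) * zoom


def estimate_face_height(cam_z, D, alpha):
    """Height of the face centre above the ground.

    alpha is the angle (radians) from the horizontal line at camera height
    to the face centre; negative when the face is below the camera.
    """
    return cam_z + D * math.tan(alpha)
```

Note that this yields the height of the face center; how the patent converts that to the full height H is not stated in the text, so no head-top offset is applied here.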

FIG. 3 is a diagram for explaining height estimation when a captured image contains multiple persons. In FIG. 3, two persons 30-1 and 30-2 are captured as an example of multiple persons, and face regions 31-1 and 31-2 can be extracted from the image. Because their whole bodies do not fall within the angle of view θ of the camera serving as the imaging means 20, no human body region shape is extracted. Even so, their heights can still be estimated.

Specifically, for the captured persons 30-1 and 30-2, the heights can be estimated using the horizontal distances D1 and D2 between the camera and each person, obtained from the size of each face region in the image, and the angles α1 and α2 from the horizontal line at the camera height (CamZ) to the centers of the face regions 31-1 and 31-2.

Estimation methods other than the one described above can also be used. For example, one or more persons in the image are detected, and each detected person is checked to determine whether he or she is a new person. For a new person, the person's position in real space is calculated from the tip of the feet in the human body region and the vanishing point corresponding to the image, and the actual height is estimated by combining that position with the person's apparent size in the image. The vanishing point may lie within the image or, depending on the camera angle and other factors, outside it. In the latter case, a vanishing point is set in a virtual space and used instead.

<Specific example of color information acquired in this embodiment>
Next, a specific example of the color information acquired in this embodiment is described. FIG. 4 is a diagram showing a specific example of that color information. For example, when the person image shown in FIG. 4 is obtained from a captured image, a head region 41 is set on the person's head 30h in order to acquire the hair color. As shown in FIG. 4, the head region 41 lies above the face region 31; given the vertical length d of one side of the face region, at least one head region is set within a further range of d/2 (one in the example of FIG. 4). Similarly, the upper body region 42 lies below the face region 31, and at least one upper body region is set on the person's upper body 30b within a further range of d (one in the example of FIG. 4).

That is, the regions shown in FIG. 4 can be determined from the size of the face region detected by the face region detection means 14. How each region is set is not limited to the above in the present invention; for example, regions can instead be set by height above the ground with reference to the human body region described above. In that case the regions are defined in real space and must be converted to position coordinates on the image plane.
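As a rough illustration of how such regions might be derived from a face box, the sketch below places one head strip of height d/2 directly above the face and one upper body strip of height d directly below it. The `Box` type, the function names, and the choice to reuse the face width and to place the strips immediately adjacent to the face are assumptions; the text only fixes the vertical ranges.

```python
from typing import NamedTuple


class Box(NamedTuple):
    x: int  # left edge in pixels
    y: int  # top edge in pixels (y grows downward)
    w: int  # width in pixels
    h: int  # height in pixels (the face side length d)


def head_region(face: Box) -> Box:
    """Strip of height up to d/2 directly above the face, clipped at the image top."""
    top = max(face.y - face.h // 2, 0)
    return Box(face.x, top, face.w, face.y - top)


def upper_body_region(face: Box) -> Box:
    """Strip of height d directly below the face (not clipped at the image bottom)."""
    return Box(face.x, face.y + face.h, face.w, face.h)
```

A caller would still need to clip the upper body strip against the image height before sampling pixels.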

FIG. 5 is a diagram showing an example of how the color information regions are set when a captured image contains multiple persons. On the screen 50 of FIG. 5, two persons 30-1 and 30-2 are captured as an example of multiple persons, and face regions 31-1 and 31-2 are extracted from the image. In this case, head regions 41-1 and 41-2 are set with reference to the positions of the face regions 31-1 and 31-2, upper body regions 42 are likewise set, and the color information of each region is acquired. In the example of FIG. 5, upper body regions 42-1a, 42-1b, and 42-1c are set for the person 30-1, and upper body regions 42-2a and 42-2b are set for the person 30-2. The regions set in this embodiment, and their number, are not limited to those of FIG. 5; for example, skin color regions and lower body regions can also be set. Furthermore, in addition to the color information of each region, the image of the region itself may be cropped and stored.

As described above, the color information for each region can be acquired as, for example, the RGB value of the average color. Instead of the RGB value, it may be converted to and output as an HSV value, HSV being a color space that takes visual characteristics into account.
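The average-color computation and the optional HSV conversion could look like the following sketch, which uses Python's standard `colorsys` module. The function names and the representation of a region as a list of 8-bit `(R, G, B)` tuples are assumptions for illustration.

```python
import colorsys


def mean_rgb(pixels):
    """Average color of a region given as a non-empty list of (R, G, B) tuples."""
    n = len(pixels)
    return tuple(sum(p[i] for p in pixels) / n for i in range(3))


def rgb255_to_hsv(rgb):
    """Convert an 8-bit RGB triple to HSV with each component in [0, 1]."""
    r, g, b = (c / 255.0 for c in rgb)
    return colorsys.rgb_to_hsv(r, g, b)
```

For example, `rgb255_to_hsv(mean_rgb(region_pixels))` gives the HSV value of a region's average color.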

<Specific example of the frame information structure in this embodiment>
Next, an example of the structure of the frame information produced by the frame information generation means 17 is described. FIG. 6 is a diagram showing an example of frame information in this embodiment. The frame information generation means 17 processes the video frame by frame and extracts the information shown in FIG. 6 for each detected person. The frame information shown in FIG. 6 includes, for example, "file name" and "detection date and time" as common items; "person position coordinates (X, Y, Z)", "height information", "various color information", "registrant information", "similar celebrity information", "age information", "gender information", "face orientation (PAN, TILT)", "mask", "sunglasses", and "acquired face image information" as information extracted from the face region; and "person position coordinates (X, Y, Z)", "height information", "various color information", and the like as information extracted from the human body region.

"File name" stores the file name of the video file currently being processed. "Detection date and time" stores the date and time at which the person was detected. Because the video file name corresponds to the actual recording start time, the actual time is obtained by adding the time within the video to the recording start time, and that actual time is stored.
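The addition of the recording start time and the in-video time can be sketched as follows. The file name pattern `YYYYMMDD_HHMMSS.ext` is purely hypothetical, since the text says only that the file name corresponds to the recording start time, and `detection_time` is an illustrative name.

```python
from datetime import datetime, timedelta


def detection_time(file_name, offset_seconds, fmt="%Y%m%d_%H%M%S"):
    """Actual detection time = recording start (parsed from the file name) + offset in the video."""
    stem = file_name.rsplit(".", 1)[0]      # drop the extension
    start = datetime.strptime(stem, fmt)    # hypothetical naming convention
    return start + timedelta(seconds=offset_seconds)
```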

The "person position coordinates (X, Y, Z)" extracted from the face region store the XYZ coordinates of the person's feet in real space, with the origin at the XY coordinates of the camera position and at ground level. That is, they hold the person's position in real space as calculated from the face region. "Height information" stores the height calculated from the face region. "Various color information" stores, for the head and the upper body as determined from the face region, the total number of pixels, the 10 most frequent colors, and their frequencies.

As the head color information, the RGB values and frequencies of the 1st to 10th most frequent colors in the head, as determined from the face region, are stored. As the total number of head pixels, the total number of pixels in the head region determined from the face region is stored. Likewise, as the upper body color information, the RGB values and frequencies of the 1st to 10th most frequent colors in the upper body are stored, and as the total number of upper body pixels, the total number of pixels in the upper body region determined from the face region is stored.
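A minimal sketch of the per-region color summary (total pixel count plus the 10 most frequent colors with their frequencies) is shown below. The function name and dictionary keys are assumptions. The text does not say whether colors are quantized before counting; this sketch counts exact RGB triples, which is an assumption, and a real image would likely need some binning to give meaningful frequencies.

```python
from collections import Counter


def color_summary(pixels, top_n=10):
    """Total pixel count and the top_n most frequent (R, G, B) values with their counts."""
    counts = Counter(pixels)  # pixels must be hashable tuples
    return {
        "total_pixels": len(pixels),
        "top_colors": counts.most_common(top_n),  # [((R, G, B), frequency), ...]
    }
```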

"Registrant information" stores the name of the closest person and the authentication score obtained by matching against the registered person DB. Specifically, when the person is authenticated as a registrant (the score exceeds the authentication threshold) by matching against the registered person DB, the names of the top 10 matching registrants are stored. If the person is not authenticated, "unregistered person" is stored. The authentication result scores are the authentication scores of the top 10 candidates, expressed as integers from 0 to 1000; if the matching result is "unregistered person", -1 is stored.
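One possible reading of this rule is sketched below: the person counts as authenticated when the best score exceeds the threshold, in which case the top 10 candidates and their scores are kept; otherwise a single "unregistered" entry with score -1 is stored. Whether every stored candidate must itself exceed the threshold is not stated, so this behavior, the function name, and the dictionary input are assumptions.

```python
def registrant_info(scores, threshold, top_n=10):
    """scores maps registrant name -> integer score in 0..1000.

    Returns a list of (name, score) pairs, best first, or
    [("unregistered", -1)] when the best score does not exceed the threshold.
    """
    ranked = sorted(scores.items(), key=lambda item: item[1], reverse=True)
    if not ranked or ranked[0][1] <= threshold:
        return [("unregistered", -1)]
    return ranked[:top_n]
```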

"Similar celebrity information" stores the name of the closest person and the authentication score obtained by matching against a DB in which celebrities are registered. Specifically, the name of the celebrity with the highest authentication score in the celebrity DB is stored, provided that score is above the threshold. If no match is authenticated, "not similar" is stored. The authentication result score is handled in the same way as described above.

"Age information" stores the age group of the detected person and its reliability. "Gender information" stores the gender (male or female) and its reliability. "Face orientation" stores the face orientation angles, expressed as pan and tilt, with the camera as the origin. Specifically, the angle of the face as seen from the camera is stored, with the upward and leftward directions as seen from the camera taken as positive.
"Mask" stores whether the area around the mouth is hidden by a mask or the like; specifically, one of "yes", "no", or "unknown" is stored. "Sunglasses" stores whether the area around the eyes is hidden by sunglasses or the like; specifically, one of "yes", "no", or "unknown" is stored.

"Acquired face image information" stores the file path to the detected face image and a reliability value. Specifically, it stores the file path to the location where the face image detected in this frame is saved, the length of one side of the detected face region, the name of the authentication result with the highest authentication score at or above the threshold, its reliability, and the like.

The "person position coordinates (X, Y, Z)" extracted from the human body region likewise store the XYZ coordinates of the person's feet in real space, with the origin at the XY coordinates of the camera position and at ground level; here the stored position in real space is the one calculated from the human body region.

"Height information" stores the height calculated from the human body region and the face region. "Various color information" stores the total number of pixels, the 10 most frequent colors, and their frequencies for the head, upper body, and lower body as determined from the human body region. Specifically, at least one of the following is stored: the RGB values and frequencies of the 1st to 10th most frequent colors in the head; the total number of pixels in the head region; the RGB values and frequencies of the 1st to 10th most frequent colors in the upper body; the total number of pixels in the upper body region; the RGB values and frequencies of the 1st to 10th most frequent colors in the lower body; and the total number of pixels in the lower body region.

In this embodiment, the information described above is output frame by frame as the result of the various processes. If neither a person region nor a face region is detected, nothing is output. Depending on the detection state of the person region and the face region, some items may not be available; in that case a value indicating "unknown" is output.

In this embodiment, the information described above is generated for each frame, and a tag is attached to each data item output as frame information, making clear which item each value corresponds to. When multiple persons are detected in one frame, a separate tracking ID (identification information) is assigned to each person, the same date and time are given to each tracking ID, and multiple lines of frame information are output.
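The output rules described here (one tagged line per tracking ID, a shared date and time, "unknown" for unavailable items, and no output when nobody is detected) can be sketched as follows. The `tag=value;...` line format, the field names in `FIELDS`, and the function name are hypothetical; the actual record layout is the one shown in FIG. 6.

```python
# Hypothetical subset of per-person items; FIG. 6 defines the real set.
FIELDS = ("height_cm", "age_group", "gender")


def frame_records(file_name, detected_at, persons):
    """persons maps tracking ID -> dict of available items.

    Returns one tagged line per tracking ID; an empty list if no one was detected.
    """
    lines = []
    for tracking_id in sorted(persons):
        info = persons[tracking_id]
        tagged = [("file", file_name), ("detected_at", detected_at), ("tracking_id", tracking_id)]
        # Items that could not be extracted are output as "unknown".
        tagged += [(name, info.get(name, "unknown")) for name in FIELDS]
        lines.append(";".join(f"{tag}={value}" for tag, value in tagged))
    return lines
```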

<Hardware configuration example>
The person information extraction device 10 described above can be implemented as a dedicated device having the functions described above. Alternatively, an execution program (person information extraction program) that causes a computer to execute each function can be created and installed on, for example, a general-purpose personal computer or server, thereby realizing the person information extraction process of the present invention.

An example of the hardware configuration of a computer capable of realizing the person information extraction process of this embodiment is described with reference to the drawings. FIG. 7 is a diagram showing an example of a hardware configuration capable of realizing the person information extraction process of this embodiment.

The computer main body in FIG. 7 includes an input device 61, an output device 62, a drive device 63, an auxiliary storage device 64, a memory device 65, a CPU (Central Processing Unit) 66 that performs various controls, and a network connection device 67, which are connected to one another by a system bus B.

The input device 61 has a keyboard and a pointing device such as a mouse operated by the user, and accepts various operation signals from the user, such as instructions to execute a program.

The output device 62 has a monitor that displays the various windows, data, and the like needed to operate the computer main body for the processing of the present invention, and can display the progress and results of program execution under the control program run by the CPU 66.

The input device 61 and the output device 62 may be an integrated input/output means such as a touch panel. In that case, input can be performed by touching a predetermined position with the user's finger, a pen-type input device, or the like.

The execution program installed on the computer main body in the present invention is provided on, for example, a portable recording medium 68 such as a USB (Universal Serial Bus) memory or a CD-ROM. The recording medium 68 on which the program is recorded can be set in the drive device 63, and the execution program it contains is installed from the recording medium 68 to the auxiliary storage device 64 via the drive device 63.

The auxiliary storage device 64 is a storage means such as a hard disk. It stores the execution program of the present invention, the control programs provided on the computer, and the like, and can input and output them as needed.

The memory device 65 stores the execution program and the like read from the auxiliary storage device 64 by the CPU 66. The memory device 65 consists of ROM (Read Only Memory), RAM (Random Access Memory), and the like.

Based on a control program such as an OS (Operating System) and on the execution program read into and stored in the memory device 65, the CPU 66 controls the processing of the computer as a whole, including various computations and the input and output of data to and from each hardware component, and thereby realizes each process. Information needed during program execution can be obtained from the auxiliary storage device 64, and execution results and the like can also be stored there.

By connecting to a communication network or the like, the network connection device 67 can obtain the execution program from another terminal connected to the network, and can provide other terminals with the results obtained by executing the program or with the execution program of the present invention itself. The hardware configuration described above makes it possible to execute the person information extraction process of the present invention. Moreover, by installing the program, the person information extraction process of the present invention can easily be realized on a general-purpose personal computer or the like. Next, the specific content of the person information extraction process is described.

<Person information extraction process>
Next, the person information extraction procedure carried out by the execution program (person information extraction program) of the present invention is described with reference to a flowchart. FIG. 8 is a flowchart showing an example of the person information extraction procedure in this embodiment.

In FIG. 8, video captured by an imaging means such as a camera is first input (S01). Next, it is determined whether one or more persons have been detected in the video (S02). If a person has been detected (YES in S02), the face region detection process (S03) and the human body region detection process (S04) described above are performed. The determination in S02 can be made, for example, by comparing time-series image frames and checking whether a region in which the color information has changed is at least a predetermined size. It can also be combined with S03 and made according to whether a face region was detected.

After S03 and S04 are completed, the person information integration process is performed (S05), frame information is generated from the integrated information (S06), and the generated information is stored (S07). After S07, or if no person was detected in S02 (NO in S02), it is determined whether the video has ended (S08). If the video has not ended (NO in S08), processing returns to S02, playback of the video continues, and the subsequent steps are performed.
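The control flow of S01 to S08 can be summarized in a short sketch. The detection, integration, and frame information steps are passed in as callables because their internals are described elsewhere; all function and parameter names are hypothetical. The sketch uses the variant in which the S02 check is combined with S03, that is, a frame is treated as containing a person when a face region is detected.

```python
def extract_person_info(frames, detect_faces, detect_bodies, integrate, make_frame_info):
    """Run the S01-S08 loop over an iterable of frames and return the stored frame information."""
    store = []
    for frame in frames:                       # S01 input; the loop ends when the video ends (S08)
        faces = detect_faces(frame)            # S03, also used as the S02 check
        if not faces:                          # S02: NO, go on to the next frame
            continue
        bodies = detect_bodies(frame, faces)   # S04: body regions found using the face regions
        persons = integrate(faces, bodies)     # S05: person information integration
        store.append(make_frame_info(frame, persons))  # S06 generate, S07 store
    return store
```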

As described above, the present invention makes it possible to efficiently acquire a wide variety of information about persons included in a captured image.

Specifically, for example, the features of persons appearing in surveillance camera video can be extracted, so that when a crime occurs the perpetrator's height, the color of the clothes worn, and so on can be estimated and promptly provided to the user or the police. When searching for a person in stored video, searches keyed on height or color information become possible. Combining the estimated information with other information (age, gender, and so on) makes it possible to detect a specific person. Furthermore, collecting statistics such as hair color and clothing color allows application to marketing and similar fields.

Preferred embodiments of the present invention have been described in detail above, but the present invention is not limited to these specific embodiments, and various modifications and changes are possible within the scope of the gist of the present invention as set forth in the claims.

DESCRIPTION OF SYMBOLS
10 Person information extraction device
11 Input means
12 Output means
13 Storage means
14 Face region detection means
15 Human body region detection means
16 Person information integration means
17 Frame information generation means
18 Transmission/reception means
19 Control means
20 Imaging means
21 Tracking means
22 Face authentication means
23 Gender/age estimation means
24 Face concealment determination means
25 Height estimation means
26 Color information extraction means
27 Person position estimation means
30 Person
31 Face region
32 Human body region
41 Head region
42 Upper body region
50 Screen
61 Input device
62 Output device
63 Drive device
64 Auxiliary storage device
65 Memory device
66 CPU (Central Processing Unit)
67 Network connection device
68 Recording medium

Claims

In a person information extraction device that extracts information about a person included in a video imaged by an imaging means,
A face area detecting means for detecting a face area for one or a plurality of persons from a predetermined image among time-series images included in the video;
Human body region detecting means for detecting a human body region using the face region obtained by the face region detecting unit;
Human information integration means for integrating the detection result obtained by the face area detection means and the detection result obtained by the human body area detection means;
The human body region detection means includes a height estimation means for estimating the height of the person based on an installation position of the imaging means, an angle of view, and a horizontal distance between the imaging means and the person based on the face area. A person information extracting device characterized by comprising:

The human body region detecting means includes
The person information extracting apparatus according to claim 1, further comprising color information extracting means for extracting color information relating to the person included in the human body area on the basis of the size of the face area.

The color information extracting means includes
3. The person information extracting apparatus according to claim 2, wherein at least one color information or image data is extracted from the head region, the skin region, the upper body region, and the lower body region on the basis of the size of the face region. .

4. The person according to claim 1, further comprising a frame information generation unit configured to generate person information for each integrated person for each image frame in the video. Information extraction device.

In a person information extraction method for extracting information about a person included in a video photographed by an imaging means,
A face area detection procedure for detecting a face area for one or a plurality of persons from a predetermined image among time-series images included in the video;
A human body region detection procedure for detecting a human body region using the face region obtained by the face region detection procedure;
A human information integration procedure for integrating the detection result obtained by the face region detection procedure and the detection result obtained by the human body region detection procedure;
The human body region detection procedure includes a height estimation procedure for estimating a height of the person based on an installation position, an angle of view of the imaging unit, and a horizontal distance between the imaging unit and the person with reference to the face region. A person information extraction method characterized by comprising:

The human body region detection procedure includes:
6. The person information extraction method according to claim 5, further comprising a color information extraction procedure for extracting color information related to the person included in the human body area on the basis of the size of the face area.

The color information extraction procedure includes:
7. The person information extraction method according to claim 6, wherein at least one color information or image data is extracted from the head region, the skin region, the upper body region, and the lower body region with reference to the size of the face region. .

8. The person information extraction method according to claim 5, further comprising a frame information generation procedure for generating, for each image frame in the video, person information for each person integrated in the person information integration procedure.

9. A person information extraction program for extracting information about a person included in a video photographed by an imaging means, the program causing a computer to function as:
face region detection means for detecting a face region for one or a plurality of persons from a predetermined image among time-series images included in the video;
human body region detection means for detecting a human body region using the face region obtained by the face region detection means; and
person information integration means for integrating the detection result obtained by the face region detection means and the detection result obtained by the human body region detection means,
wherein the human body region detection means includes height estimation means for estimating the height of the person based on the installation position of the imaging means, the angle of view of the imaging means, and the horizontal distance between the imaging means and the person obtained with reference to the face region.

10. The person information extraction program according to claim 9, wherein the human body region detection means includes color information extraction means for extracting color information about the person included in the human body region with reference to the size of the face region.

11. The person information extraction program according to claim 10, wherein the color information extraction means extracts color information or image data from at least one of a head region, a skin region, an upper body region, and a lower body region, with reference to the size of the face region.

12. The person information extraction program according to claim 9, further causing the computer to function as frame information generation means for generating, for each image frame in the video, person information for each integrated person.
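
As a non-limiting illustration of the height estimation recited in the method and program claims above, the following Python sketch shows one way the installation height of the imaging means, its angle of view, and the horizontal distance to the person could be combined. It is not taken from the specification. It assumes an ideal pinhole camera, a known downward tilt angle, and that the image row of the top of the head has already been derived from the detected face region. The function names, parameter names, and the tilt parameter are all hypothetical.

```python
import math


def row_to_angle(row, image_height, vertical_fov_deg):
    """Angle in radians of an image row below the optical axis (pinhole model)."""
    # Focal length in pixels, derived from the vertical angle of view (assumption).
    focal_px = (image_height / 2.0) / math.tan(math.radians(vertical_fov_deg) / 2.0)
    return math.atan((row - image_height / 2.0) / focal_px)


def estimate_height(camera_height_m, tilt_down_deg, vertical_fov_deg,
                    image_height, head_top_row, horizontal_distance_m):
    """Estimate a person's height in meters from camera geometry.

    camera_height_m: installation height of the camera above the floor.
    tilt_down_deg: downward tilt of the optical axis (hypothetical parameter).
    head_top_row: image row of the top of the head, taken here as given
        (for example, from the upper edge of the detected face region).
    horizontal_distance_m: horizontal distance between camera and person.
    """
    # Total depression angle of the ray passing through the top of the head.
    depression = math.radians(tilt_down_deg) + row_to_angle(
        head_top_row, image_height, vertical_fov_deg)
    # The ray drops by distance * tan(depression) before reaching the person.
    return camera_height_m - horizontal_distance_m * math.tan(depression)


if __name__ == "__main__":
    # Illustrative values only: camera at 2.5 m, tilted 20 degrees downward.
    height = estimate_height(2.5, 20.0, 60.0, 480, 200, 3.0)
    print(f"estimated height: {height:.2f} m")
```

In this sketch the horizontal distance is passed in directly. How it is obtained from the face region (for example, from the apparent size of the face) is left open here and would depend on the embodiment.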