JP2001167110A

JP2001167110A - Picture retrieving method and its device

Info

Publication number: JP2001167110A
Application number: JP34952699A
Authority: JP
Inventors: Ryuta Ito; 隆太伊藤; Shin Yamada; 伸山田; Masayoshi Soma; 正宜相馬; Kenji Nagao; 健司長尾
Original assignee: Matsushita Electric Industrial Co Ltd
Current assignee: Panasonic Holdings Corp
Priority date: 1999-12-08
Filing date: 1999-12-08
Publication date: 2001-06-22

Abstract

PROBLEM TO BE SOLVED: To display the faces of persons appearing in a video so that they can be told apart, by performing detection specialized for faces on the video and then identifying the detected faces. SOLUTION: A device provided with means for detecting faces in a video and means for identifying the detected faces detects frames containing a face, extracts face images from those frames, groups the faces of the same person across all extracted face images, and extracts a representative face image for each person, thereby identifying the faces of the persons appearing in the video. As a result, the faces of the persons appearing in the video can be displayed separately for each person.

Description

DETAILED DESCRIPTION OF THE INVENTION

[0001]

BACKGROUND OF THE INVENTION

1. Field of the Invention

The present invention relates to a method and an apparatus, for use in a video editing apparatus or an image retrieval apparatus, for searching a moving image for images in which a specific object appears.

[0002]

2. Description of the Related Art

A conventional image retrieval apparatus is known from Japanese Patent Application Laid-Open No. Hei 6-223179. That application describes an image retrieval apparatus that detects, in a color moving image, the frames in which a specific object (the search target) appears. FIG. 1 is a block diagram showing its configuration. In FIG. 1, reference numeral 1 denotes a computer serving as a central processing unit that identifies the specific target object; 2, a display such as a CRT that shows the output screen of the computer 1; 3, a moving image reproducing device such as an optical disc device; 4, an A/D converter that converts analog signals into digital signals; 5, a control line that carries control signals between the moving image reproducing device 3 and the computer 1; 6, an external storage device such as a hard disk; 7, an input device such as a mouse; 8a to 8e, interfaces that connect the computer 1 to its peripheral devices; 9, a CPU that performs the arithmetic processing in the computer; and 10, a memory accessed directly by the CPU 9.

[0003] FIG. 12 is a flowchart showing the operation of this conventional image retrieval apparatus. The operation is described below following the flowchart of FIG. 12. First, the searcher selects an image to be searched for and inputs it to the apparatus through the input device 7 or the like, so that one image containing the target object is designated (step 2001). The apparatus divides the input image into regions of similar color (step 2002). For each of the resulting partial regions it generates a color histogram (step 2004), selects the N most frequent colors in descending order of frequency, and creates the list CG(r) shown in Table 1 (step 2005), where CG(r) denotes the list for the r-th partial region. Further, for each partial region, it creates the list RCGP shown in Table 2, which records the correspondence between the list CG(r) of that region and the lists CG(r) of the partial regions adjacent to it in the image (step 2007).
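The list-building steps of the conventional apparatus can be illustrated with a minimal Python sketch. It assumes that each partial region is already available as a list of quantized color labels and that region adjacency is given; the names `top_n_colors`, `regions`, and `adjacent` are hypothetical and do not come from the cited application.

```python
from collections import Counter

def top_n_colors(region_pixels, n):
    """Return the n most frequent colors of one partial region, most frequent first."""
    histogram = Counter(region_pixels)  # color -> frequency (the color histogram)
    # Sort by descending frequency, then by color name so ties are deterministic.
    ranked = sorted(histogram.items(), key=lambda kv: (-kv[1], kv[0]))
    return [color for color, _ in ranked[:n]]

# Hypothetical toy data: region index r -> quantized color label of every pixel.
regions = {
    0: ["red", "red", "white", "red", "blue", "white"],
    1: ["navy", "navy", "grey"],
}
adjacent = [(0, 1)]  # assumed: regions 0 and 1 touch in the image

# CG[r] plays the role of the list CG(r) for the r-th partial region.
CG = {r: top_n_colors(pixels, n=2) for r, pixels in regions.items()}

# RCGP pairs the color list of a region with that of each adjacent region.
RCGP = [(CG[a], CG[b]) for a, b in adjacent]
```

With this toy data, region 0 keeps its two most frequent colors and `RCGP` holds a single pair of color groups.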

[0004] Next, the search operation is described. FIG. 13 is a flowchart showing the search operation. First, the frame to be judged is divided into a plurality of cells C(x, y) (step 201), a color histogram is generated for each cell (step 203), and every color whose histogram frequency exceeds a set threshold is registered in the list CC(x, y) (steps 205 and 206). Next, for each cell C(x, y) (step 208), if any color belonging to one of the two color groups CG that form a color pair in RCGP is contained in the list CC(x, y) (step 209), and any color belonging to the other color group is contained in the list CC of the cell C(x, y) itself or of any of its eight neighboring cells, the cell C(x, y) is extracted as a valid cell (step 210), and the total number of valid cells in the frame is obtained (step 211). If the number of valid cells in the frame is equal to or greater than a threshold (step 212), that color group pair CG is counted as a valid color group pair (step 213). If the total number of valid color group pairs counted is equal to or greater than a set threshold, it is judged that the target object (search target) is present in the frame (step 215); otherwise, it is judged that the target object is not present (step 216).
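The valid-cell test of this conventional search can be sketched as follows. This is only an illustration under stated assumptions: the per-cell lists CC are modeled as a grid of Python sets, each color group pair as a tuple of two sets, and the function names and thresholds are hypothetical.

```python
def neighbourhood(x, y, width, height):
    """Yield the cell (x, y) itself and its up to eight neighbours inside the grid."""
    for dy in (-1, 0, 1):
        for dx in (-1, 0, 1):
            nx, ny = x + dx, y + dy
            if 0 <= nx < width and 0 <= ny < height:
                yield nx, ny

def count_valid_cells(CC, pair):
    """Count the cells of one frame that satisfy a single color group pair."""
    height, width = len(CC), len(CC[0])
    total = 0
    for y in range(height):
        for x in range(width):
            # Either group of the pair may be the one found in the cell itself.
            for first, second in (pair, pair[::-1]):
                in_cell = bool(first & CC[y][x])
                nearby = any(second & CC[ny][nx]
                             for nx, ny in neighbourhood(x, y, width, height))
                if in_cell and nearby:
                    total += 1
                    break
    return total

def object_present(CC, RCGP, cell_threshold, pair_threshold):
    """Judge the frame: enough valid pairs, each backed by enough valid cells."""
    valid_pairs = sum(1 for pair in RCGP
                      if count_valid_cells(CC, pair) >= cell_threshold)
    return valid_pairs >= pair_threshold

# Hypothetical 3x2 grid of cells; each set lists the dominant colors of a cell.
CC = [
    [{"red"}, {"white"}, set()],
    [set(), set(), {"blue"}],
]
RCGP = [({"red"}, {"white"}), ({"blue"}, {"green"})]
```

Because the decision rests only on color groups and their adjacency, two people in the same uniform produce the same valid cells, which is the weakness discussed in the next paragraph.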

[0005] Such an image retrieval apparatus calls for a technique that lightens the burden on the video editor or image searcher and finds the images being searched for efficiently. However, the conventional technique described above searches on the basis of the color grouping of the image to be found and the positional relationship between the color groups. Consequently, when people wear the same clothing, such as a uniform, a person who is actually different from the person being searched for is judged to be the search target, which increases the number of false detections.

[0006] The present invention has been made in view of the above problems of the conventional technique, and solves them by performing detection specialized for faces on the video and then identifying the detected faces.

[0007]

SUMMARY OF THE INVENTION

To solve this problem, the gist of the present invention is to provide means for detecting a face in a video and means for identifying the detected face, so that false detections are reduced and persons wearing the same clothing can still be identified as different persons.

[0008] As an invention having this aspect, the invention according to claim 1 is an image retrieval method that detects the frames in which a face appears in a video, extracts face images from those frames, groups the faces of the same character across all extracted face images, and extracts a representative face image for each character, thereby identifying the faces of the characters in the video. This has the effect that the faces of the persons appearing in the video can be displayed so that they are distinguished from one another.

[0009] The invention according to claim 2 is the image retrieval method according to claim 1, wherein the detection of frames in which a face appears is performed by detecting scene boundaries in the video and detecting a frame containing a face as the representative image of each scene. This has the effect that an image in which a face appears can be selected as the representative image of each scene.

[0010] The invention according to claim 3 is the image retrieval method according to claim 1 or 2, wherein the detection of frames in which a face appears uses, as a predetermined condition, at least one of the number of faces shown (number of people), face size, face orientation, gender, facial expression, (age estimation, race determination,) presence or absence of glasses, and presence or absence of a beard. By extracting face images that satisfy the predetermined condition, images containing a face that matches the condition specified by the searcher can be extracted.

[0011] The invention according to claim 4 is an image retrieval method that calculates the similarity between the faces of the characters in a video and a face image designated by the searcher, and detects the frames showing a face whose similarity is equal to or greater than a predetermined value. This has the effect that the searcher can learn where in the video the designated face is recorded.

[0012] The invention according to claim 5 is the image retrieval method according to claim 4, wherein scene boundaries are detected in the video and a frame containing a face is selected as the representative image of each scene. This has the effect that an image in which a face appears can be selected as the representative image of each scene.

[0013] The invention according to claim 6 is an image retrieval method that detects the frames in which a person appears in a video and extracts the frames showing a posture or clothing of the person that satisfies a predetermined condition. This has the effect that images containing a specific attribute of a person can be extracted.

[0014] The invention according to claim 7 is the image retrieval method according to claim 6, wherein the detection of frames in which a person appears is performed by detecting scene boundaries in the video and detecting a frame containing a face as the representative image of each scene. This has the effect that an image in which a face appears can be selected as the representative image of each scene.

[0015] The invention according to claim 8 is the image retrieval method according to claim 4, wherein the face image designated by the searcher is at least one face image chosen from a face database registered in advance by means of a character list. This has the effect that a person can be specified easily.

[0016] The invention according to claim 9 is the image retrieval method according to claim 8, wherein the character list is either created in advance by the searcher or generated from a program guide. Creating the list in advance makes it simple for the searcher to designate a face image, and generating it from a program guide makes the character list simple and easy to produce.

[0017] The invention according to claim 10 is the image retrieval method according to claim 4, wherein the face image designated by the searcher is a face image of a character in the video that has been designated and registered in advance. Registering the face image in advance has the effect that the searcher can designate a face image easily.

[0018] The invention according to claim 11 is an image retrieval apparatus comprising: a face detection unit that detects the frames in which a face appears in a video and extracts at least one face image from those frames; and a character identification unit that groups the faces of the same character across all extracted face images and extracts a representative face image for each character. This has the effect that the faces of the persons appearing in the video can be displayed so that they are distinguished from one another.

[0019] The invention according to claim 12 is an image retrieval apparatus comprising: a scene change detection unit that receives a video signal and detects scene changes by monitoring the frame images in time series; a face detection unit that detects, among the representative images of each scene supplied by the scene change detection unit, the frames in which a face appears and extracts at least one face image from those frames; and a character identification unit that groups the faces of the same character across all extracted face images and extracts a representative face image for each character. This has the effect that, for each scene, a frame showing a face can be displayed on a display or stored as the representative image.

[0020] The invention according to claim 13 is an image retrieval apparatus comprising: a character designation unit with which the searcher designates at least one face image from a face database registered in advance by means of a character list; a video input unit that receives a video signal and outputs it frame by frame; a face detection unit that detects face regions in the frame images output from the video input unit; a character identification unit that determines, from the face image output by the character designation unit and the face image detected in a face region by the face detection unit, whether the detected face corresponds to the designated character; and an image output unit that displays and records the identification result determined by the character identification unit. This has the effect that the characters appearing in a given video program can be displayed on a display or stored separately for each character.

[0021] The invention according to claim 14 is an image retrieval apparatus comprising: a face image reading unit that reads the image to be searched for; a video input unit that receives a video signal and outputs it frame by frame; a face detection unit that detects face regions in the frame images output from the video input unit; a matching unit that determines whether the face image detected in a face region output by the face detection unit matches the search target image; and an image output unit that displays and stores the matching result. This has the effect that frames containing a face image that matches or resembles the face image designated by the searcher can be displayed on a display or stored.

[0022]

EMBODIMENTS OF THE INVENTION

(Embodiment 1) Embodiments of the present invention are described below with reference to FIGS. 1 to 11. FIGS. 1 to 3 illustrate an image retrieval apparatus according to a first embodiment of the present invention. This embodiment concerns face detection. Specifically, the apparatus determines whether a face is present in a read image and, if a face region is present, determines the positions and number of faces in the image, the face size, and the face orientation (frontal, upward, downward, leftward, or rightward), and displays the result on a monitor or the like.

[0023] FIG. 1 shows the overall configuration of the face detection apparatus. In FIG. 1, reference numeral 1 denotes a computer that detects the target image; 2, a CRT or the like that displays the detection result of the computer 1; 3, a moving image reproducing device such as an optical disc device; 4, an A/D converter that converts analog signals into digital signals; 5, a control line that carries control signals between the moving image reproducing device 3 and the computer 1; 6, an external storage device such as a hard disk; 7, an input device such as a mouse or keyboard; 8a to 8e, interfaces that connect the computer 1 to its peripheral devices; 9, the CPU of the computer; and 10, a memory accessed directly by the CPU 9.

[0024] The video signal output from the moving image reproducing device 3 is converted successively into digital images by the A/D converter 4 and sent to the computer 1. In the computer 1, each digital image enters the memory 10 through the interface 8c and is processed by the CPU 9 according to a program stored in the memory 10.

[0025] FIG. 2 is a flowchart showing the flow of the detection method. First, when a frame image r is read into the computer 1 (step 101), face regions are detected according to a function f written in advance in the program (step 102). The output P of the function f is a matrix in which the coordinates of all faces detected in the frame are recorded. For example, the i-th column vector of the matrix P holds the i-th face detected in frame r. If the face region is cut out as a rectangle, the (i, 1) component of P holds the x coordinate of the upper-left corner of the face region, the (i, 2) component the y coordinate of the upper-left corner, the (i, 3) component the x coordinate of the lower-right corner, and the (i, 4) component the y coordinate of the lower-right corner. If no face region is found at all, the function f returns, for example, -1. If the output P is not -1, the detection result is output to the display 2 (step 104).
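The contract of the function f described above can be sketched in Python. This is only one possible representation: the matrix P is modeled as a list of `(x1, y1, x2, y2)` tuples, `detector` is a hypothetical callable standing in for whatever detection method is used, and `describe` is an illustrative helper for the display step.

```python
NO_FACE = -1  # sentinel value the description suggests when no face is found

def f(frame, detector):
    """Return one (x1, y1, x2, y2) rectangle per detected face, or NO_FACE.

    `detector` is a hypothetical callable that yields rectangles for a frame.
    """
    boxes = [tuple(box) for box in detector(frame)]
    return boxes if boxes else NO_FACE

def describe(P):
    """Build the text a display step might show for the detection output P."""
    if P == NO_FACE:
        return "no face"
    # The number of entries in P is the number of detected people.
    return f"{len(P)} face(s): " + ", ".join(str(box) for box in P)
```

A caller would then show the result only when `f` did not return `NO_FACE`, mirroring the branch that leads to step 104.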

[0026] The display may show only the detected face regions, or it may show the entire frame with a specific mark placed on each face region. Because the column size of the matrix P represents the number of people detected, the number of detected people can also be shown on the display.

[0027] As a concrete implementation of the function f, skin color detection can be used for color images, for example. The (R, G, B) value of each pixel of the input image is plotted in a color space (for example, Yuv space), and only the pixels that fall within a predefined skin color space are selected. Then, among the image regions formed by those pixels alone, only the regions whose area is equal to or greater than a preset threshold and whose shape is identical or similar to a predetermined shape (for example, an ellipse) are selected, which realizes face detection. As a method that handles both monochrome and color images, template matching can also be used. One or more face images are stored in advance as standard patterns. Partial regions of the frame image are cut out one after another while the position of a cut-out window is moved across the frame, and the similarity between each cut-out image and the templates is calculated. For example, if the correlation value between the cut-out image and a template image is equal to or greater than a set threshold, the cut-out region is judged to be a face image.
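The skin color approach can be sketched in Python as follows. This is a minimal illustration under assumptions, not the method of the application: the RGB to YUV conversion uses the common BT.601 coefficients, the skin color bounds `u_range` and `v_range` are arbitrary placeholder values, regions are found by 4-connected flood fill, and the shape test (for example, similarity to an ellipse) is omitted.

```python
def rgb_to_yuv(r, g, b):
    """Convert an RGB triple to (Y, U, V) using BT.601 luma coefficients."""
    y = 0.299 * r + 0.587 * g + 0.114 * b
    return y, 0.492 * (b - y), 0.877 * (r - y)

def is_skin(rgb, u_range=(-40.0, 0.0), v_range=(10.0, 60.0)):
    """True if the pixel falls inside the (assumed) skin color box in the U-V plane."""
    _, u, v = rgb_to_yuv(*rgb)
    return u_range[0] <= u <= u_range[1] and v_range[0] <= v <= v_range[1]

def skin_regions(image, min_area):
    """Return bounding boxes (x1, y1, x2, y2) of skin regions with area >= min_area."""
    h, w = len(image), len(image[0])
    mask = [[is_skin(image[y][x]) for x in range(w)] for y in range(h)]
    seen = [[False] * w for _ in range(h)]
    boxes = []
    for y0 in range(h):
        for x0 in range(w):
            if not mask[y0][x0] or seen[y0][x0]:
                continue
            # Flood-fill one 4-connected region of skin pixels.
            stack, pixels = [(x0, y0)], []
            seen[y0][x0] = True
            while stack:
                x, y = stack.pop()
                pixels.append((x, y))
                for nx, ny in ((x + 1, y), (x - 1, y), (x, y + 1), (x, y - 1)):
                    if 0 <= nx < w and 0 <= ny < h and mask[ny][nx] and not seen[ny][nx]:
                        seen[ny][nx] = True
                        stack.append((nx, ny))
            if len(pixels) >= min_area:  # area threshold from the description
                xs = [p[0] for p in pixels]
                ys = [p[1] for p in pixels]
                boxes.append((min(xs), min(ys), max(xs), max(ys)))
    return boxes

# Hypothetical toy image: a 2x2 skin-colored block and one isolated skin pixel.
SKIN, BG = (224, 172, 150), (0, 0, 255)
image = [
    [BG, BG, BG, BG, BG],
    [BG, SKIN, SKIN, BG, BG],
    [BG, SKIN, SKIN, BG, SKIN],
    [BG, BG, BG, BG, BG],
]
```

With `min_area=3` only the 2x2 block survives; the isolated pixel is rejected by the area threshold, which is the role that threshold plays in the description.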

[0028] By preparing templates of different sizes, face regions of different face sizes can also be detected. Likewise, by preparing templates of different face orientations, face regions of different face orientations can be detected. For example, templates of four kinds of face images (facing up, down, left, and right) are prepared, the similarity between each cut-out image and the four templates is obtained, and if the maximum of these four similarities is equal to or greater than a predetermined threshold, the face orientation can be determined from the template that gave the maximum.
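The orientation decision described above can be sketched in Python. The sketch assumes that a cut-out window and the templates are equally sized grayscale images stored as lists of rows, and it uses the normalized correlation coefficient as the similarity; the function names, the 2x2 toy templates, and the threshold are hypothetical.

```python
import math

def correlation(a, b):
    """Normalized correlation coefficient of two equally sized gray images."""
    xs = [v for row in a for v in row]
    ys = [v for row in b for v in row]
    mx, my = sum(xs) / len(xs), sum(ys) / len(ys)
    num = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    den = math.sqrt(sum((x - mx) ** 2 for x in xs) * sum((y - my) ** 2 for y in ys))
    return num / den if den else 0.0  # a flat image correlates with nothing

def classify_orientation(window, templates, threshold):
    """Return (label, score) of the best template, or None if below threshold."""
    scores = {label: correlation(window, t) for label, t in templates.items()}
    label = max(scores, key=scores.get)
    return (label, scores[label]) if scores[label] >= threshold else None

# Hypothetical 2x2 toy templates standing in for the four orientation templates.
templates = {
    "up": [[9, 9], [1, 1]],
    "down": [[1, 1], [9, 9]],
    "left": [[9, 1], [9, 1]],
    "right": [[1, 9], [1, 9]],
}
result = classify_orientation([[8, 2], [9, 1]], templates, threshold=0.8)
```

Here `result` names the "left" template because it gives the highest similarity above the threshold, whereas a uniform window yields `None`. In a full detector this test would be applied to every window position and template size.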

[0029] As another way to implement the function f, a discriminant function that decides whether a cut-out region is a face or a non-face can be prepared in advance. For example, a face/non-face discriminant function can be created by training a neural network. Face orientation can also be detected by assigning different labels in advance to images of different face orientations and training the neural network to distinguish the resulting categories.

[0030] FIG. 3 is a block diagram of the face detection processing. In FIG. 3, reference numeral 11 denotes an image input unit, 12 a face detection unit, and 13 an image output unit. The image input unit receives the video signal T output from the A/D converter 4 and outputs one frame of the image as a frame signal r. The face detection unit 12 receives the frame signal r output from the image input unit and outputs a face detection signal P, which carries the position coordinates and orientation of the faces present in the frame, together with the frame signal r. The image output unit 13 uses the face detection signal P and the frame signal r output from the face detection unit 12 to output an image of the face regions only as a detection result signal S. The detection result signal S is shown on the display 2 through the interface 8b in the computer. In FIG. 2, for example, the face images cut out from each frame are displayed. As FIG. 2 shows, faces of various sizes and orientations can be handled. The face orientation may, for example, be indicated on the display by an arrow or the like. The detection result signal S may also be an image of the entire frame with a specific mark placed on each face region. Beards, glasses, and the like can also be detected by techniques similar to face detection.

[0031] (Embodiment 2) FIG. 4 is a flowchart explaining the operation of an image retrieval apparatus according to a second embodiment of the present invention. The image retrieval apparatus of this embodiment has the same configuration as the apparatus of the first embodiment. The second embodiment combines a video scene change detection function with the face detection function described in Embodiment 1 so that, when the representative image of a scene is selected, an image showing a person's face is used as the representative image. The operation is described below with reference to the flowchart of FIG. 4.

[0032] In FIG. 4, a flag Flg, which indicates whether a representative image of the current scene has already been stored, is first initialized (step 107). Next, the r-th frame r of the video is read and it is determined whether the scene has changed (step 109). If the scene is judged to have changed, the previously stored representative image RT of the preceding scene is shown on the display (step 110) and the flag Flg is reset to 0 (step 111). Immediately after video input starts, however, no representative image RT of a preceding scene exists, so nothing is shown on the display.

[0033] Next, the flag Flg is evaluated (step 112). If Flg is 0, the frame r currently being read is recorded as the representative image RT (step 113) and the flag Flg is set to 1 (step 114). This processing allows for the case in which no face appears anywhere in the scene: the frame immediately after the scene change is stored in advance as the representative image. Alternatively, the frame a predetermined number of frames after the scene change may be stored in advance as the representative image.

[0034] If the flag is set to 1, an earlier frame has already been recorded as the representative image, so face detection is performed directly. Face detection by the method described in Embodiment 1 is likewise performed after the first representative image has been captured in steps 113 and 114 (step 115).

[0035] If the detection yields an output P of the function f equal to -1 (step 116), that is, if no face is detected, the next frame is read. If P is not -1, that is, if a face is detected in the frame r currently being read, the representative image is updated (step 117).

[0036] With the above procedure, when a face region exists in a scene, a frame showing the face can be used as the representative image. Because the method described in Embodiment 1 can also determine the orientation, size, and number of faces, it is also possible, for example, to choose a frame with a frontal face, a frame with a larger face, or a frame with more faces as the representative image.
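The representative image selection of FIG. 4 can be summarized with a short Python sketch. It is an illustration under assumptions: `scene_changed` and `detect_faces` are hypothetical callables standing in for the scene change detection and the function f, representative images are collected in a list instead of being sent to a display, and emitting the representative image of the final scene at the end of the video is an added assumption that the flowchart description does not spell out.

```python
def representative_frames(frames, scene_changed, detect_faces):
    """Return one representative frame per scene, preferring frames with a face."""
    representatives = []
    rt = None        # representative image RT of the current scene
    flg = 0          # Flg: 1 once RT holds a frame of the current scene
    previous = None
    for frame in frames:
        if previous is not None and scene_changed(previous, frame):
            representatives.append(rt)  # output RT of the preceding scene
            flg = 0
        if flg == 0:
            rt = frame                  # fallback: first frame after the change
            flg = 1
        if detect_faces(frame) != -1:
            rt = frame                  # a frame showing a face replaces RT
        previous = frame
    if rt is not None:
        representatives.append(rt)      # assumed: flush the last scene
    return representatives

# Hypothetical toy video: (scene id, face present?, frame name).
frames = [("A", False, "a0"), ("A", True, "a1"), ("A", False, "a2"),
          ("B", False, "b0"), ("B", False, "b1"), ("C", True, "c0")]
chosen = representative_frames(
    frames,
    scene_changed=lambda prev, cur: prev[0] != cur[0],
    detect_faces=lambda fr: [(0, 0, 1, 1)] if fr[1] else -1,
)
```

In this toy run scene A is represented by its face frame, scene B (which has no face) by its first frame, and scene C by its face frame. To prefer frontal, larger, or more numerous faces, the unconditional update could be replaced by a comparison against the currently stored frame.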

[0037] FIG. 5 is a block diagram of the above procedure. In FIG. 5, reference numeral 11 denotes the image input unit, which is the same as that described in Embodiment 1; 14 denotes a scene change detection unit; 12 denotes the face detection unit, which is the same as that described in Embodiment 1; and 13 denotes the image output unit, which is also the same as that described in Embodiment 1.

[0038] The image input unit 11 receives the video signal T output from the A/D converter 4 and outputs one frame of image data as the frame signal r. The scene change detection unit 14 determines scene changes from discontinuities in the video and outputs a scene switching signal C together with the frame signal r. The face detection unit 12 receives the scene switching signal C and the frame signal r from the scene change detection unit 14 and outputs a face detection signal P, which carries the coordinates of each face position in the frame and the face orientation, together with the frame signal r. The image output unit 13 uses the face detection signal P and the frame signal r from the face detection unit 12 to output frames showing a face as the detection result signal S. The detection result signal S is displayed on the display 2 via the interface 8b in the computer. The display 2 shows, for example, a representative image for each scene; when a scene contains a face image, an image containing the face can be used as the representative image.

[0039] The scene change detection unit can be realized using known techniques, for example those described in Shin Yamada et al., "Video Index Creation and Editing Technology" (ビデオインデックス作成編集技術), Matsushita Technical Journal, Vol. 44, No. 5.
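As an illustration of judging scene changes from discontinuity in the video, the sketch below compares grey-level histograms of consecutive frames. It is a common textbook approach and is not claimed to be the method of the cited article; the bin count, the threshold, and the representation of a frame as a flat list of grey levels are all assumptions.

```python
def histogram(frame, bins=4, max_value=256):
    """Count grey levels (0..max_value-1) of a flat pixel list into bins."""
    counts = [0] * bins
    for value in frame:
        counts[value * bins // max_value] += 1
    return counts


def scene_changes(frames, threshold=0.5):
    """Return the scene switching flags C, one bool per frame.

    A frame starts a new scene when the normalised histogram difference
    to the previous frame exceeds `threshold` (0 = identical, 1 = disjoint).
    All frames are assumed to have the same number of pixels.
    """
    flags = []
    previous = None
    for frame in frames:
        current = histogram(frame)
        if previous is None:
            flags.append(True)  # the first frame always opens a scene
        else:
            diff = sum(abs(a - b) for a, b in zip(current, previous))
            flags.append(diff / (2 * len(frame)) > threshold)
        previous = current
    return flags
```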

[0040] (Embodiment 3) FIG. 6 is a flowchart explaining the operation of the image search device according to the third embodiment of the present invention. The image search device of this embodiment has the same configuration as the device of the first embodiment. This embodiment describes detecting a face in each frame that is read and identifying gender from the detected face.

[0041] The operation is described with reference to the flowchart of FIG. 6. First, when a frame image r is read into the computer 1 (step 119), a face region is detected according to the function f, which is written in advance as a program (step 120). The function f is the same as that described in Embodiment 1. If the output P of the function f is not -1, that is, if a face region is detected, the face in the detected region is identified as male or female (step 122). The function g is the gender identification function. It takes the coordinates P of the face region on the frame and the frame r as arguments, identifies the gender, and stores the result in the output Q. The output Q is, for example, a matrix in which the i-th column vector records the i-th detected face. The (i, 1) component of Q stores the x coordinate of the upper left corner of the i-th detected face region, the (i, 2) component its y coordinate, the (i, 3) component the x coordinate of the lower right corner of the face, the (i, 4) component the y coordinate of the lower right corner, and the (i, 5) component +1 for a male face or -1 for a female face.
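The layout of the output Q can be sketched as below. This is only an illustration: each face is stored here as one five-element entry (x1, y1, x2, y2, label), which is an assumption about orientation, and `build_q` and `classify_face` are hypothetical names standing in for the function g and its internal classifier.

```python
MALE, FEMALE = 1, -1


def build_q(p, frame, classify_face):
    """Sketch of function g: attach a gender label to each detected face.

    p             -- -1 when no face was detected, otherwise a list of
                     (x1, y1, x2, y2) boxes: upper-left and lower-right
                     corners of each face region
    frame         -- the frame r (passed through to the classifier)
    classify_face -- hypothetical classifier returning MALE or FEMALE
    """
    if p == -1:
        return []
    q = []
    for (x1, y1, x2, y2) in p:
        label = classify_face(frame, (x1, y1, x2, y2))
        # Components 1-4: box corners; component 5: +1 male, -1 female.
        q.append([x1, y1, x2, y2, label])
    return q
```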

[0042] The function g can be realized by collecting in advance a large number of face images of known gender and applying a statistical method such as discriminant analysis to the collected images. Alternatively, the discriminant function can be realized by training a neural network. Finally, the detection result is output to the display 2 (step 123).
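As one possible reading of "a statistical method such as discriminant analysis", the sketch below trains a two-class Fisher linear discriminant on two-dimensional feature vectors. The two-dimensional features, the function names, and the midpoint threshold are assumptions made for brevity; real face features would have far more dimensions and would need a general matrix inverse.

```python
def mean(vectors):
    """Component-wise mean of a list of equal-length vectors."""
    n = len(vectors)
    return [sum(v[k] for v in vectors) / n for k in range(len(vectors[0]))]


def scatter(vectors, m):
    """2x2 scatter matrix of 2-D vectors around their mean m."""
    s = [[0.0, 0.0], [0.0, 0.0]]
    for v in vectors:
        d = [v[0] - m[0], v[1] - m[1]]
        for a in range(2):
            for b in range(2):
                s[a][b] += d[a] * d[b]
    return s


def train_fisher(male, female):
    """Return (w, bias) of the Fisher discriminant for 2-D features.

    Assumes the pooled within-class scatter matrix is invertible.
    """
    m_pos, m_neg = mean(male), mean(female)
    s_pos, s_neg = scatter(male, m_pos), scatter(female, m_neg)
    sw = [[s_pos[a][b] + s_neg[a][b] for b in range(2)] for a in range(2)]
    det = sw[0][0] * sw[1][1] - sw[0][1] * sw[1][0]
    inv = [[sw[1][1] / det, -sw[0][1] / det],
           [-sw[1][0] / det, sw[0][0] / det]]
    diff = [m_pos[0] - m_neg[0], m_pos[1] - m_neg[1]]
    # w = Sw^-1 (m_pos - m_neg)
    w = [inv[0][0] * diff[0] + inv[0][1] * diff[1],
         inv[1][0] * diff[0] + inv[1][1] * diff[1]]
    # Place the decision boundary midway between the class means.
    mid = [(m_pos[k] + m_neg[k]) / 2 for k in range(2)]
    bias = -(w[0] * mid[0] + w[1] * mid[1])
    return w, bias


def classify_gender(features, w, bias):
    """Return +1 (male) or -1 (female), matching the (i, 5) component of Q."""
    score = w[0] * features[0] + w[1] * features[1] + bias
    return 1 if score > 0 else -1
```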

[0043] Thus, in this embodiment, after a face is detected, whether the detected face is male or female can also be displayed. In addition, if the video editor or searcher has specified in advance, via the input device 7, that only a particular gender is to be searched for, only frames containing face images of that gender can be detected.

[0044] FIG. 7 shows the above processing as a block diagram. In FIG. 7, 11 is the image input unit, 12 is the face detection unit, 15 is the gender identification unit, and 13 is the image output unit. The image input unit 11 receives the video signal T output from the A/D converter 4 and outputs one frame of image data as the frame signal r. The face detection unit 12 receives the frame signal r from the image input unit and outputs a face detection signal P, which carries information such as the coordinates, orientation, and size of each face in the frame, together with the frame signal r. The gender identification unit 15 receives the face detection signal P and the frame signal r from the face detection unit 12, identifies gender from the image of the face region recorded in P, and outputs the identification result combined with the face detection signal P as the gender identification signal Q. It also outputs the frame signal r. The image output unit 13 uses the gender identification signal Q and the frame signal r from the gender identification unit 15 to cut out only the face regions, adds a male or female identification mark to each face image, and outputs the resulting image as the detection result signal S. The detection result signal S is displayed on the display 2 via the interface 8b in the computer. The display in FIG. 7 shows, for example, the face images cut out of each frame, each with its gender mark.

[0045] The manner of display is not limited to the above; the entire frame may be shown on the display with a specific mark attached to each face region. In addition, by inserting the scene change detection unit 14 described in Embodiment 2 immediately after the image input unit 11 in FIG. 7, the representative image of each scene can be made, for example, a frame showing a face of a specific gender.

[0046] Furthermore, as shown in the block diagram of FIG. 8, adding, for example, an age identification unit 16, an expression identification unit 17, and a race identification unit 18 enables finer-grained search and editing. The age identification unit 16 estimates age from the detected face using the face detection signal P and the frame signal r output from the face detection unit 12, and outputs an age identification signal y. The age identification outputs an age group, for example "twenties". The age identification unit 16 can be realized in the same way as the gender identification unit.

[0047] It can be realized by collecting in advance a large number of face images of known age group and applying a statistical method such as discriminant analysis to the collected images. Alternatively, the discriminant function can be realized by training a neural network. The expression identification unit 17 receives the face detection signal P and the frame signal r output from the face detection unit 12, estimates the expression from the detected face, and outputs an expression identification signal H. The expression identification outputs a label assigned to each expression, for example "laughing", "crying", or "angry". The expression identification unit 17 can be realized in the same way as the gender identification unit.

[0048] It can be realized by collecting in advance a large number of face images of known expression and applying a statistical method such as discriminant analysis to the collected images. Alternatively, the discriminant function can be realized by training a neural network.

[0049] The race identification unit 18 receives the face detection signal P and the frame signal r output from the face detection unit 12, estimates race from the detected face, and outputs a race identification signal L. The race identification outputs a label assigned to each race, for example "yellow race", "white", or "black". The race identification unit 18 can be realized in the same way as the gender identification unit.

[0050] It can be realized by collecting in advance a large number of face images of known race and applying a statistical method such as discriminant analysis to the collected images. Alternatively, the discriminant function can be realized by training a neural network.

[0051] The image output unit 13 receives the gender identification signal Q and the frame signal r output from the gender identification unit 15, the age identification signal y output from the age identification unit, the expression identification signal H output from the expression identification unit 17, and the race identification signal L output from the race identification unit. It attaches gender, age, expression, and race labels to each face image and displays them on the display 2 via the interface (I/F) 8b in the computer.

[0052] Similar methods can also identify attributes such as hairstyle or the presence of a hat. In addition, cutting out the image surrounding the extracted face image yields further information about the detected person. For example, the color of the person's clothes, the color of a necktie, and the type of clothing can be identified from the image located below the face image. Furthermore, the person's posture can be detected by a technique such as background subtraction, and the person's motion can be estimated by tracking posture changes over a number of consecutive frames.

[0053] In addition, by inserting the scene change detection unit 14 described in the second embodiment immediately after the image input unit 11 in FIG. 8, the representative image of each scene can be made, for example, a frame showing a face of a specific gender, a specific age group, a specific expression, and a specific race.
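Choosing a representative image by combined attribute conditions might look like the sketch below. The attribute dictionaries, their keys, and the function names are hypothetical; the sketch takes the first matching frame in each scene and falls back to the first frame of the scene, which is one possible policy rather than the one fixed by the specification.

```python
def matches(face, wanted):
    """True when every attribute named in `wanted` has the requested value."""
    return all(face.get(key) == value for key, value in wanted.items())


def representative_per_scene(scenes, wanted):
    """Pick one frame number per scene.

    scenes -- list of scenes; each scene is a non-empty list of
              (frame_no, faces) pairs, where faces is a list of
              attribute dicts such as {"gender": "male", "age": "20s"}
    wanted -- dict of required attribute values
    """
    result = []
    for scene in scenes:
        chosen = scene[0][0]  # fallback: first frame after the scene change
        for frame_no, faces in scene:
            if any(matches(face, wanted) for face in faces):
                chosen = frame_no
                break
        result.append(chosen)
    return result
```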

[0054] The embodiment described here assumes a video editing and search device, but its use is not limited to that; it can also be applied, for example, to shopper analysis in retail stores. For example, by installing a camera at the checkout of a retail store such as a supermarket, detecting faces in the video of shoppers lined up at the checkout, and performing gender identification, age identification, and so on, information relating purchased products to gender and age can be obtained. Such information can be used, for instance, to change what is stocked, so the technique can also contribute to improving sales.

[0055] (Embodiment 4) FIG. 9 is a flowchart explaining the operation of the image search device according to the fourth embodiment of the present invention. In this embodiment too, the image search device has the same configuration as the device of the first embodiment. This embodiment describes the identification of the characters in a program. Character identification here means the function of distinguishing the people who appear in a program and displaying them separately.

[0056] The following description first assumes that the list of people appearing in a video program is known and that their face images can be retrieved from an image database or the like. The operation is described following the flowchart of FIG. 9. First, the program's character list is read (step 126). In this list each character is given an ID number, and the ID number and the storage address in the face image database are recorded. Next, the face images registered in the face database for each ID number are read. The registered face images are assumed to include, for example, images that differ in orientation, size, and expression.

[0057] Features are then extracted from the multiple face images of each character (step 128). One way to realize the feature extraction is, for example, the KL (Karhunen-Loève) expansion; that is, the KL expansion is performed on the face images of each character.

[0058] The steps so far are the preparation for character identification.

[0059] Next, the target program video is read (step 129), and face images are detected in each frame by the method described in Embodiment 1 (step 130). Face identification is performed on each detected face image to associate it with a character (step 131). The subspace method, for example, can be used for face identification. Finally, the character identification result is shown on the display (step 132).

[0060] FIG. 10 is a block diagram of the above processing. In FIG. 10, 19 is the character list, 20 is the face database in which the characters' face images are recorded, 21 is the face image capture unit, 22 is the feature extraction unit that extracts features from face images, 23 is the character identification unit that identifies the characters' faces, 11 is the image input unit that receives the video signal and outputs the frame signal r for each frame, 12 is the face detection unit, 13 is the image output unit that outputs the identification result, 8b is the interface in the computer, and 2 is the display that shows the identification result. The operation of each block is described below.

[0061] The face image capture unit 21 first uses the input character list to fetch the characters' face images stored in the face database and outputs those face images as the face database signal ft. The feature extraction unit 22 extracts features from the face database signal ft output by the face image capture unit. For example, it performs the KL expansion for each character and outputs the resulting eigenvectors as the feature signal K.

[0062] Meanwhile, the image input unit 11 takes in the video signal T input via the A/D converter 4 and outputs it one frame at a time as the frame signal r. The face detection unit 12 operates as described in Embodiment 1: it detects the face regions in the frame from the frame signal r output by the image input unit 11, using the method described in Embodiment 1, and outputs the face detection signal P and the frame signal r. The character identification unit 23 receives the feature signal K, the face detection signal P, and the frame signal r. It first cuts out the face region using the face detection signal P and the frame signal r and converts it into a vector. It then applies, for example, the subspace method to this vector to determine which character the cut-out face image most resembles, and outputs the ID number of the most similar character together with the position of that face image in the frame as the identification result signal Res. It also outputs the frame signal r at the same time.
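The classification step of the subspace method can be sketched as follows. The sketch assumes that the eigenvectors carried by the feature signal K are already available as an orthonormal basis per character ID; computing them (the KL expansion) is not shown, and the names `projection_energy` and `identify` are hypothetical.

```python
def projection_energy(x, basis):
    """Fraction of the squared length of x that lies in the subspace.

    basis -- list of orthonormal vectors spanning one character's subspace
    Returns a similarity between 0 and 1.
    """
    norm_sq = sum(v * v for v in x)
    if norm_sq == 0:
        return 0.0
    energy = 0.0
    for u in basis:
        dot = sum(a * b for a, b in zip(x, u))
        energy += dot * dot
    return energy / norm_sq


def identify(x, subspaces):
    """Return (character_id, similarity) of the best-matching subspace.

    subspaces -- dict mapping character ID to its orthonormal basis
    """
    best_id, best_sim = None, -1.0
    for char_id, basis in subspaces.items():
        sim = projection_energy(x, basis)
        if sim > best_sim:
            best_id, best_sim = char_id, sim
    return best_id, best_sim
```

The returned ID corresponds to the ID number placed in the signal Res; attaching the face position from P would complete that signal.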

[0063] The image output unit 13 uses the identification result signal Res and the frame signal r output by the character identification unit 23 to output the identification result signal S so that face images are shown on the display 2 grouped by character. The manner of display is not limited to this; the entire frame may be shown on the display with a different mark attached for each character.

[0064] Furthermore, by inserting the scene change detection unit 14 described in Embodiment 2 immediately after the image input unit 11, the character detection results can be output scene by scene.

[0065] This example has described the case in which the list of people appearing in the video program is known in advance. When the characters are not known, the same result can be achieved by first detecting faces throughout the target video and then applying unsupervised clustering. When a character is not registered in the face database and the aim is to find where a character's face from an early scene appears later in the video, this can be achieved by having the searcher or editor cut the character out of an early scene of the video and input it to the feature extraction unit 22.
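The specification does not name a clustering algorithm. As a minimal sketch of unsupervised grouping, the code below uses greedy leader clustering on face feature vectors; the distance threshold, the use of Euclidean distance, and the choice of the first member as each group's representative face are assumptions.

```python
def distance(a, b):
    """Euclidean distance between two feature vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b)) ** 0.5


def group_faces(features, threshold):
    """Group face feature vectors without a character list.

    Each face joins the first group whose leader (first member) lies
    within `threshold`; otherwise it starts a new group.
    Returns a list of groups, each a list of indices into `features`.
    """
    groups = []
    for idx, feat in enumerate(features):
        for group in groups:
            if distance(feat, features[group[0]]) <= threshold:
                group.append(idx)
                break
        else:
            groups.append([idx])  # no close leader: new character
    return groups
```

Taking `group[0]` of each group then gives one representative face per presumed character.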

[0066] (Embodiment 5) FIG. 11 is a block diagram showing the configuration of the image search device according to the fifth embodiment of the present invention. This embodiment describes detecting where in the target video the face of a specific person appears.

[0067] In FIG. 11, 25 is the face image to be searched for, 21 is the face image capture unit that takes in the search target face image, 22 is the feature extraction unit that extracts features from the face image, 11 is the image input unit, 12 is the face detection unit, 26 is the video signal feature extraction unit that extracts features from the face images detected by the face detection unit 12, 24 is the matching unit, 13 is the image output unit that outputs the identification result, 8b is the interface in the computer, and 2 is the display that shows the identification result.

[0068] The operation of each block is described below. The face image capture unit 21 takes in the search target face image 25 and performs preprocessing, for example histogram equalization, and outputs the result as the preprocessed search image signal TTa. The feature extraction unit 22 extracts predetermined features from the preprocessed search image signal TTa output by the face image capture unit 21.

[0069] Meanwhile, the image input unit 11 takes in the video signal T input via the A/D converter 4 and outputs it one frame at a time as the frame signal r. The face detection unit 12 detects the face regions in the frame from the frame signal r output by the image input unit 11 and outputs the face detection signal P and the frame signal r. From these signals, the video signal feature extraction unit 26 cuts out the face image, preprocesses it, and outputs the feature signal k and the frame signal r. The matching unit 24 compares the feature signals output by the feature extraction unit 22 and the video signal feature extraction unit 26; if the similarity is at or above a predetermined value, it judges the face to be the search target and outputs that frame to the image output unit 13. The image output unit 13 outputs the received frame on the display 2 together with its frame number.
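The thresholded comparison performed by the matching unit 24 can be illustrated as below. The similarity measure is not specified in the text, so cosine similarity is used here as an assumption, and the function names and the default threshold of 0.9 are hypothetical.

```python
def cosine_similarity(a, b):
    """Cosine of the angle between two feature vectors (0.0 if either is zero)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = sum(x * x for x in a) ** 0.5
    norm_b = sum(y * y for y in b) ** 0.5
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def matching_frames(query, detections, min_similarity=0.9):
    """Return frame numbers whose detected faces match the query face.

    query      -- feature vector of the search target face
    detections -- iterable of (frame_no, feature) pairs, one per face
                  detected in the video
    """
    hits = []
    for frame_no, feature in detections:
        similar = cosine_similarity(query, feature) >= min_similarity
        if similar and frame_no not in hits:
            hits.append(frame_no)  # report each frame only once
    return hits
```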

[0070] According to the above embodiment, searches for, for example, well-known actors or politicians can be carried out quickly, reducing the burden on video editors and searchers.

[0071]

[Effects of the Invention] As described above, according to the present invention, face images are detected in a large stored video database or the like and the detected faces are identified, so that images containing the face to be searched for can be extracted accurately and without false detection even when people wear identical clothing such as uniforms. A further effect is that finer-grained search requests from the searcher, such as gender identification, can also be served.

[Brief description of the drawings]

FIG. 1 is a block diagram showing the overall configuration of the image search device according to the first embodiment of the present invention.

FIG. 2 is a flowchart for explaining the operation of Embodiment 1.

FIG. 3 is a block diagram for explaining Embodiment 1.

FIG. 4 is a flowchart for explaining the operation of the image search device according to the first embodiment of the present invention.

FIG. 5 is a block diagram for explaining Embodiment 2.

FIG. 6 is a flowchart for explaining the operation of the image search device according to the third embodiment of the present invention.

FIG. 7 is a block diagram for explaining Embodiment 3 of the present invention.

FIG. 8 is a block diagram for explaining Embodiment 3.

FIG. 9 is a flowchart for explaining the operation of the image search device according to the fourth embodiment of the present invention.

FIG. 10 is a block diagram for explaining Embodiment 4.

FIG. 11 is a block diagram for explaining the operation of the image search device according to the fifth embodiment of the present invention.

FIG. 12 is a flowchart for explaining the conventional technique.

FIG. 13 is a flowchart for explaining the conventional technique.

[Explanation of symbols]

1 Computer
2 Display
3 Moving image playback device
4 A/D converter
5 Control line
6 External storage device
7 Input device
8a to 8e Interfaces
9 CPU
10 Memory
11 Image input unit
12 Face detection unit
13 Image output unit
14 Scene change detection unit
15 Gender identification unit
16 Age identification unit
17 Expression identification unit
18 Race identification unit
19 Character list
20 Face database
21 Face image capture unit
22 Feature extraction unit
23 Character identification unit
24 Matching unit
25 Search target face image
26 Video signal feature extraction unit

Continued from the front page

(72) Inventor: Masayoshi Soma, c/o Matsushita Giken Co., Ltd., 3-10-1 Higashi-Mita, Tama-ku, Kawasaki-shi, Kanagawa
(72) Inventor: Kenji Nagao, c/o Matsushita Giken Co., Ltd., 3-10-1 Higashi-Mita, Tama-ku, Kawasaki-shi, Kanagawa
F-terms (reference): 5B075 ND12 NK07 PQ02 PR06 QM08; 5C052 AA01 AC08 DD04 DD10 EE02 EE03

Claims

[Claims]

1. An image search method characterized by detecting a frame in which a face appears in a video, extracting a face image from the frame, grouping the faces of the same character from all the extracted face images, and extracting a representative face image for each character.

2. The image search method according to claim 1, wherein detecting a frame in which a face appears in the video comprises detecting scene breaks in the video and detecting a frame including a face as the representative image of each scene.

3. The image search method according to claim 1, wherein, in detecting a frame in which a face appears in the video, at least one of the number of faces shown, face size, face orientation, gender, facial expression, estimated age, race, presence or absence of glasses, and presence or absence of a beard is detected as a predetermined condition.

4. An image search method comprising calculating a similarity between the face of a character in a video and a face image designated by a searcher, and detecting a frame showing a face whose similarity is at or above a predetermined value.

5. The image search method according to claim 4, wherein, for the face of the character in the video, scene breaks are detected in the video and a frame including a face is selected as the representative image of each scene.

6. An image search method comprising: detecting a frame in which a person appears in a video; and extracting a frame in which the person's posture or clothing satisfies predetermined conditions.

7. The image search method according to claim 6, wherein detecting a frame in which a person appears in the video comprises detecting scene breaks in the video and detecting a frame including a face as the representative image of each scene.

8. A face image designated by a searcher, wherein at least one face image is designated from a face database registered in advance by a character list.
Image search method described.

9. The image search method according to claim 8, wherein the character list is created in advance by a searcher or generated from a program table.

10. The image search method according to claim 4, wherein a face image of a character in the video is specified and registered in advance as the face image specified by the searcher.

11. An image search device comprising: a face detection unit that detects a frame in which a face appears in a video and extracts at least one face image from the frame; and a character identification unit that groups faces of the same character from all the extracted face images and extracts a representative face image for each character.

12. An image search device comprising: a scene change detection unit that receives a video signal as input and detects scene changes by monitoring frame images in time series; a face detection unit that detects a frame in which a face appears from the representative image of each scene supplied by the scene change detection unit and extracts at least one face image from the frame; and a character identification unit that groups faces of the same character from all the extracted face images and extracts a representative face image for each character.

13. An image search device comprising: a character designation unit with which a searcher designates, by a character list, at least one face image from a face database registered in advance; a video input unit that receives a video signal as input and outputs it frame by frame; a face detection unit that detects a face area from a frame image output from the video input unit; a character identification unit that determines, based on the face image output from the character designation unit and the face image of the face area detected by the face detection unit, whether or not the detected face is that of the designated character; and an image output unit that displays and records the identification result determined by the character identification unit.

14. An image search device comprising: a face image reading unit that reads an image to be searched for; a video input unit that receives a video signal as input and outputs it frame by frame; a face detection unit that detects a face area from a frame image output from the video input unit; a matching unit that determines whether or not the face image of the face area output from the face detection unit matches the search target image; and an image output unit that displays and stores the matching result.
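
As a non-limiting illustration of the grouping step of claim 1 and the similarity search of claim 4, the following Python sketch may be helpful. It is only one possible reading: the claims do not specify how faces are represented or compared, so the numeric feature vectors, the cosine similarity measure, the greedy grouping rule, the 0.9 threshold, and every function name here (cosine_similarity, group_faces, representative, search_frames) are assumptions made for illustration.

```python
import math


def cosine_similarity(a, b):
    """Cosine similarity between two equal-length feature vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def mean_vector(vectors):
    """Component-wise mean of a non-empty list of vectors."""
    n = len(vectors)
    return [sum(column) / n for column in zip(*vectors)]


def group_faces(faces, threshold=0.9):
    """Greedily group faces assumed to belong to the same character.

    faces: list of (frame_index, feature_vector) pairs.
    A face joins the first group whose mean vector is at least
    `threshold` similar to it; otherwise it starts a new group.
    """
    groups = []
    for face in faces:
        for group in groups:
            centre = mean_vector([vec for _, vec in group])
            if cosine_similarity(face[1], centre) >= threshold:
                group.append(face)
                break
        else:
            groups.append([face])
    return groups


def representative(group):
    """Pick the face in a group that is closest to the group mean."""
    centre = mean_vector([vec for _, vec in group])
    return max(group, key=lambda face: cosine_similarity(face[1], centre))


def search_frames(faces, query_vector, threshold=0.9):
    """Return sorted frame indices whose face is at least `threshold`
    similar to the face designated by the searcher (claim 4)."""
    return sorted({frame for frame, vec in faces
                   if cosine_similarity(vec, query_vector) >= threshold})


if __name__ == "__main__":
    # Hypothetical two-dimensional features for faces found in five frames.
    faces = [(0, [1.0, 0.0]), (3, [0.98, 0.05]), (7, [0.0, 1.0]),
             (9, [0.05, 0.99]), (12, [1.0, 0.02])]
    for group in group_faces(faces):
        frames = [frame for frame, _ in group]
        print("character frames:", frames,
              "representative frame:", representative(group)[0])
    print("frames matching query:", search_frames(faces, [1.0, 0.0]))
```

In this sketch the representative face is simply the group member nearest the group mean; a practical device could equally select by face size or frontal orientation, which the claims leave open.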