JP2009088905A

JP2009088905A - Information processor

Info

Publication number: JP2009088905A
Application number: JP2007255030A
Authority: JP
Inventors: Hidetoshi Yokoi; 秀年横井; Kenichi Tanabe; 謙一田部
Original assignee: Toshiba Corp
Current assignee: Toshiba Corp
Priority date: 2007-09-28
Filing date: 2007-09-28
Publication date: 2009-04-23
Anticipated expiration: 2027-09-28
Also published as: JP5038836B2

Abstract

PROBLEM TO BE SOLVED: To provide an information processor, and a face image display method, capable of focusing on a plurality of locations when the contents of video content data are converted into a plurality of still pictures for display.

SOLUTION: The information processor extracts a plurality of face images from video content data and displays a list of the face images in a first display area. It also extracts images from the video content data and displays them in chronological order in a second display area. When one face image 500 is selected from the face images displayed in the first display area, the three images 600 to 602 in the second display area that correspond to the three face images 500 to 502 (the selected face image 500 and the face images immediately preceding and following it) are focused and displayed, and the other images in the second display area are displayed compressed.

COPYRIGHT: (C)2009,JPO&INPIT

Description

The present invention relates to an information processing apparatus that displays thumbnail images of scenes appearing in video content data.

In general, information processing apparatuses such as video recorders and personal computers can record and reproduce various kinds of video content data, such as television broadcast program data. Each item of video content data stored in such an apparatus is given a title, but it is difficult for the user to understand from the title alone what each item contains. To grasp the content, the user therefore has to play back the video content data, and playing back video content data with a long total duration takes considerable time even with a fast-forward playback function.

Patent Document 1 discloses a technique having a function of displaying the contents of video content data as a plurality of still images.

Patent Document 1: JP 2006-113714 A

However, with the technique of Patent Document 1, when the contents of video content data are displayed as a plurality of thumbnail images, only one location can be focused.

The present invention has been made in view of the above circumstances, and an object of the present invention is to provide an information processing apparatus capable of focusing on a plurality of locations when the contents of video content data are displayed as a plurality of thumbnail images.

To solve the above problem, according to one aspect of the present invention, there is provided an information processing apparatus comprising: face image extraction means for extracting a plurality of face images from video content data; face image list display means for displaying a list of the plurality of face images in a first display area; thumbnail image extraction means for extracting a thumbnail image of at least one frame from each of the sections obtained by dividing the video content data at equal time intervals; and thumbnail image list display means for displaying the thumbnail images extracted by the thumbnail image extraction means in a second display area, arranged in chronological order, such that a plurality of the thumbnail images are displayed at a normal size and the other thumbnail images are displayed with a horizontal size smaller than the normal size. The plurality of thumbnail images that the thumbnail image list display means displays at the normal size are thumbnail images associated with a face image selected from the list of face images displayed in the first display area.
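The abstract states the focusing rule concretely: the thumbnails corresponding to the selected face image and to the face images just before and after it are shown at normal size. A minimal Python sketch of that mapping follows. It assumes (this is not stated in the source) that the face list is ordered by appearance time and that a face maps to the equal-time section containing its time stamp; the function name and parameters are hypothetical.

```python
def focused_thumbnail_indices(face_timestamps, selected, interval_s, num_thumbnails):
    """Return the sorted indices of thumbnails to display at normal size.

    face_timestamps: appearance time (seconds) of each listed face image,
        assumed to be sorted by appearance time.
    selected: index of the face image the user selected in the list.
    interval_s: length of each equal-time section (hypothetical parameter).
    num_thumbnails: number of sections, i.e. thumbnails in the strip.
    """
    focused = set()
    # The selected face plus its predecessor and successor in the list.
    for i in (selected - 1, selected, selected + 1):
        if 0 <= i < len(face_timestamps):
            section = int(face_timestamps[i] // interval_s)
            # Clamp so a time stamp at the very end maps to the last thumbnail.
            focused.add(min(section, num_thumbnails - 1))
    return sorted(focused)


# Faces appearing at 10 s, 95 s, 130 s and 400 s; sections are 60 s long.
print(focused_thumbnail_indices([10, 95, 130, 400], 1, 60, 10))  # [0, 1, 2]
```

Collecting the indices in a set means that two neighbouring faces falling in the same section produce a single normal-size thumbnail rather than a duplicate.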

There is also provided an information processing apparatus comprising: detection means for detecting a cheer level in time series from the audio data of video content data; thumbnail image extraction means for extracting a thumbnail image of at least one frame from each of the sections obtained by dividing the video content data at equal time intervals; and thumbnail image list display means for displaying the thumbnail images extracted by the thumbnail image extraction means in a display area, arranged in time series, such that a plurality of the thumbnail images are displayed at a normal size and the other thumbnail images are displayed with a horizontal size smaller than the normal size. The plurality of thumbnail images that the thumbnail image list display means displays at the normal size are thumbnail images extracted according to the cheer level detected by the detection means.
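One simple reading of the cheer-level aspect is to show at normal size the thumbnails of the sections with the highest cheer levels. The sketch below assumes one cheer level per equal-time section and a hypothetical count `k`; the claim itself says only that the normal-size thumbnails are extracted according to the detected cheer level.

```python
def cheer_focus_indices(cheer_levels, k):
    """Return, in time order, the indices of the k sections with the highest cheer level.

    cheer_levels[i] is the cheer level assumed to be detected for section i.
    """
    # sorted() is stable even with reverse=True, so among equal levels
    # the earlier section is ranked first.
    ranked = sorted(range(len(cheer_levels)), key=lambda i: cheer_levels[i], reverse=True)
    # Re-sort the chosen indices so they match the chronological thumbnail strip.
    return sorted(ranked[:k])


print(cheer_focus_indices([0.1, 0.9, 0.3, 0.8, 0.2], 2))  # [1, 3]
```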

According to the present invention, a plurality of locations can be focused when the contents of video content data are displayed as a plurality of thumbnail images.

Hereinafter, embodiments of the present invention will be described with reference to the drawings.
First, the configuration of an information processing apparatus according to an embodiment of the present invention will be described with reference to FIGS. 1 and 2. The information processing apparatus of this embodiment is implemented, for example, as a notebook-type portable personal computer 10.

The personal computer 10 can record and reproduce video content data (audiovisual content data) such as broadcast program data and video data input from external devices. That is, the personal computer 10 has a television (TV) function for viewing and recording broadcast program data carried by television broadcast signals. This TV function is implemented, for example, by a TV application program preinstalled on the personal computer 10. The TV function also includes a function for recording video data input from an external AV device and a function for reproducing recorded video data and recorded broadcast program data.

Furthermore, the personal computer 10 has a face image list display function for displaying, among other things, a list of face images of persons who appear in video content data, such as video data and broadcast program data, stored in the personal computer 10. This face image list display function is implemented, for example, as one function within the TV function. It is one of the video indexing functions for presenting an outline of video content data to the user, and can show the user which persons appear in which time periods of the whole video content data. In addition, when performing video indexing of video content data, this function can calculate a feature amount from each person's face image, determine whether face images belong to the same person, and display the face images of the same person distinguished (highlighted) from other face images.

The personal computer 10 also has a function of extracting a thumbnail image of at least one frame from each of the sections obtained by dividing video content data at equal time intervals, and displaying these thumbnail images in an accordion (bellows-like) layout. In the accordion layout, at least one thumbnail image is displayed at the normal size and the other thumbnail images are displayed with a width no larger than the normal size; the farther a thumbnail image is in time from a thumbnail image displayed at the normal size, the narrower it is displayed. In the personal computer 10 of this embodiment, a plurality of thumbnail images can be displayed at the normal size.
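The accordion rule can be expressed as a width function over thumbnail positions. In the sketch below, which illustrates the idea rather than reproducing the patented layout, each thumbnail's width depends on its distance (in positions) to the nearest normal-size thumbnail: distance 0 keeps the normal width, and distance d gets `normal_width // (d + 1)`, floored at a hypothetical `min_width`. Accepting several focused indices is what lets multiple locations be in focus at once.

```python
def accordion_widths(num_thumbnails, focused, normal_width, min_width):
    """Return the display width of each thumbnail in an accordion strip.

    focused: indices of thumbnails shown at normal size (must be non-empty).
    The 1 / (d + 1) shrink rule and the min_width floor are assumptions.
    """
    if not focused:
        raise ValueError("at least one thumbnail must be focused")
    widths = []
    for i in range(num_thumbnails):
        # Distance (in positions) to the nearest focused thumbnail.
        d = min(abs(i - f) for f in focused)
        widths.append(normal_width if d == 0 else max(min_width, normal_width // (d + 1)))
    return widths


print(accordion_widths(7, [1, 5], 120, 10))  # [60, 120, 60, 40, 60, 120, 60]
```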

FIG. 1 is a perspective view of the computer 10 with its display unit open. The computer 10 consists of a computer main body 11 and a display unit 12. The display unit 12 incorporates a display device formed by a TFT-LCD (Thin Film Transistor Liquid Crystal Display) 17.

The display unit 12 is attached to the computer main body 11 so that it can pivot between an open position, in which the upper surface of the computer main body 11 is exposed, and a closed position, in which it covers that upper surface. The computer main body 11 has a thin box-shaped housing, on whose upper surface are arranged a keyboard 13, a power button 14 for powering the computer 10 on and off, an input operation panel 15, a touch pad 16, speakers 18A and 18B, and so on.

The input operation panel 15 is an input device that inputs an event corresponding to a pressed button, and has a plurality of buttons for launching respective functions. These buttons include a group of operation buttons for controlling the TV function (viewing, recording, and playback of recorded broadcast program data/video data). A remote control unit interface 20, for communicating with a remote control unit that remotely controls the TV function of the computer 10, is provided on the front of the computer main body 11. The remote control unit interface 20 includes an infrared signal receiver and the like.

An antenna terminal 19 for TV broadcasting is provided on, for example, the right side of the computer main body 11. An external display connection terminal conforming to, for example, the HDMI (High-Definition Multimedia Interface) standard is provided on, for example, the back of the computer main body 11. This external display connection terminal is used to output video data (moving image data) contained in video content data such as broadcast program data to an external display.

Next, the system configuration of the computer 10 will be described with reference to FIG. 2.

As shown in FIG. 2, the computer 10 includes a CPU 101, a north bridge 102, a main memory 103, a south bridge 104, a graphics processing unit (GPU) 105, a video memory (VRAM) 105A, a sound controller 106, a BIOS-ROM 109, a LAN controller 110, a hard disk drive (HDD) 111, a DVD drive 112, a video processor 113, a memory 113A, a card controller 113, a wireless LAN controller 114, an IEEE 1394 controller 115, an embedded controller/keyboard controller IC (EC/KBC) 116, a TV tuner 117, an EEPROM 118, and the like.

The CPU 101 is a processor that controls the operation of the computer 10. It executes the operating system (OS) 201 and various application programs, such as the TV application program 202, which are loaded from the hard disk drive (HDD) 111 into the main memory 103. The TV application program 202 is software for executing the TV function. It performs live playback for viewing broadcast program data received by the TV tuner 117, recording of received broadcast program data to the HDD 111, playback of broadcast program data/video data recorded on the HDD 111, and so on. The CPU 101 also executes the BIOS (Basic Input Output System) stored in the BIOS-ROM 109. The BIOS is a program for hardware control.

The north bridge 102 is a bridge device that connects the local bus of the CPU 101 to the south bridge 104. The north bridge 102 also contains a memory controller that controls access to the main memory 103, and has a function of communicating with the GPU 105 via, for example, a PCI EXPRESS serial bus.

The GPU 105 is a display controller that controls the LCD 17 used as the display monitor of the computer 10. Display signals generated by the GPU 105 are sent to the LCD 17. The GPU 105 can also send a digital video signal to an external display device 1 via an HDMI control circuit 3 and an HDMI terminal 2.

The HDMI terminal 2 is the external display connection terminal mentioned above. Through a single cable, the HDMI terminal 2 can send an uncompressed digital video signal and a digital audio signal to the external display device 1, such as a television. The HDMI control circuit 3 is an interface for sending a digital video signal via the HDMI terminal 2 to the external display device 1, which is referred to as an HDMI monitor.

The south bridge 104 controls the devices on an LPC (Low Pin Count) bus and the devices on a PCI (Peripheral Component Interconnect) bus. It also contains an IDE (Integrated Drive Electronics) controller for controlling the hard disk drive (HDD) 111 and the DVD drive 112, and has a function of communicating with the sound controller 106.

In addition, the video processor 113 is connected to the south bridge 104 via, for example, a PCI EXPRESS serial bus.

The video processor 113 is a processor that executes the various processes related to the video indexing described above, and functions as an indexing processing unit for executing the video indexing process. In the video indexing process, the video processor 113 extracts a plurality of face images, together with feature amount information for each face image, from the moving image data contained in the video content data, and outputs time stamp information and the like indicating when each extracted face image appears in the video content data. Face images are extracted by, for example, a face detection process that detects face areas in each frame of the moving image data and a cutout process that cuts the detected face areas out of the frame. A face area can be detected by, for example, analyzing the image features of each frame and searching for areas whose features resemble face image feature samples prepared in advance. The face image feature samples are feature data obtained by statistically processing the face image features of a large number of people. Feature amount information is extracted from a face image as, for example, numerical data representing the size of each facial part, such as the eyes, nose, and mouth, and the arrangement of those parts.
The feature amount information of different face images is then compared to obtain a similarity, which is used to determine whether they show the same person.
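The same-person decision described above can be sketched as follows. The source says only that feature amount information is compared to obtain a similarity; using the Euclidean distance between feature vectors as the (inverse) similarity, a fixed threshold, and greedy grouping against each group's first face are assumptions made for illustration.

```python
import math


def group_faces(features, threshold):
    """Group face images whose feature vectors lie within `threshold` of each other.

    features: one numeric feature vector per face image (e.g. part sizes and spacings).
    Returns a list of groups; each group is a list of face indices, and the
    first index in a group serves as that group's representative.
    """
    groups = []
    for i, vec in enumerate(features):
        for group in groups:
            # Smaller distance means higher similarity (assumed measure).
            if math.dist(features[group[0]], vec) <= threshold:
                group.append(i)
                break
        else:
            # No sufficiently similar person found: start a new group.
            groups.append([i])
    return groups


faces = [(2.54, 1.22, 1.54), (2.50, 1.20, 1.56), (3.40, 1.80, 2.10)]
print(group_faces(faces, 0.2))  # [[0, 1], [2]]
```

Face images that end up in the same group are the ones the face image list display function could highlight together.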

The video processor 113 also executes, for example, a process for detecting commercial (CM) sections contained in video content data, and an audio indexing process. Normally, the duration of each CM section is one of several predetermined durations.
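Because CM sections have one of a few fixed durations, a detector can use duration as a filter on candidate segments. The sketch below assumes that candidate boundaries (for example, silence points) are already available, and uses hypothetical standard lengths of 15, 30, and 60 seconds with a 0.5-second tolerance; none of these values come from the source.

```python
STANDARD_CM_LENGTHS_S = (15.0, 30.0, 60.0)  # hypothetical values, not from the source


def is_cm_candidate(start_s, end_s, tolerance_s=0.5, lengths=STANDARD_CM_LENGTHS_S):
    """Return True if the segment's duration matches one of the standard CM lengths."""
    duration = end_s - start_s
    return any(abs(duration - length) <= tolerance_s for length in lengths)


print(is_cm_candidate(100.0, 115.2))  # True  (about 15 s)
print(is_cm_candidate(0.0, 42.0))     # False
```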

The audio indexing process is an indexing process that analyzes the audio data contained in video content data to detect music sections, in which music is playing, and talk sections, in which a person is talking. The audio indexing process also executes a cheer level detection process, which detects a cheer level for each piece of partial data (data of a fixed duration) in the video content data, and an excitement level detection process, which detects an excitement level for each piece of partial data.

The cheer level indicates the magnitude of cheering. Cheering is the sound of many people's voices combined, and such a sound has a characteristic frequency spectrum distribution. In the cheer level detection process, the frequency spectrum of the audio data contained in the video content data is analyzed, and the cheer level of each piece of partial data is detected according to the result. The excitement level is the volume level of a section in which a volume at or above a certain level continues for at least a certain duration. For example, the volume level of sounds such as fairly enthusiastic applause or loud laughter is an excitement level. In the excitement level detection process, the volume distribution of the audio data contained in the video content data is analyzed, and the excitement level of each piece of partial data is detected according to the result.
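The definition of the excitement level (a volume at or above some level, sustained for at least some duration) maps directly onto a run-length scan. In this sketch, `volumes` holds one volume value per piece of partial data; the threshold, the minimum run length, and reporting the mean volume of a run as its excitement level are assumptions.

```python
def excitement_runs(volumes, threshold, min_run):
    """Find runs where volume stays >= threshold for at least min_run consecutive units.

    Returns (start, end_exclusive, level) tuples, where level is the mean
    volume over the run (an assumed definition of the excitement level).
    """
    volumes = list(volumes)
    runs = []
    start = None
    # A trailing sentinel below any threshold closes a run that reaches the end.
    for i, v in enumerate(volumes + [float("-inf")]):
        if v >= threshold:
            if start is None:
                start = i
        elif start is not None:
            if i - start >= min_run:
                run = volumes[start:i]
                runs.append((start, i, sum(run) / len(run)))
            start = None
    return runs


print(excitement_runs([1, 5, 6, 7, 2, 8, 9, 1], threshold=5, min_run=3))  # [(1, 4, 6.0)]
```

The run at indices 5 and 6 is loud enough but too short, so it is not reported.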

The memory 113A is used as working memory for the video processor 113. The indexing processes (CM detection, video indexing, and audio indexing) require a large amount of computation. In this embodiment, the video processor 113, a dedicated processor separate from the CPU 101, is used as a back-end processor and executes the indexing processes. The indexing processes can therefore run without increasing the load on the CPU 101.

The sound controller 106 is a sound source device that outputs audio data to be reproduced to the speakers 18A and 18B or to the HDMI control circuit 3.

The wireless LAN controller 114 is a wireless communication device that performs wireless communication conforming to, for example, the IEEE 802.11 standard. The IEEE 1394 controller 115 communicates with external devices via an IEEE 1394 serial bus.

The embedded controller/keyboard controller IC (EC/KBC) 116 is a one-chip microcomputer integrating an embedded controller for power management and a keyboard controller for controlling the keyboard (KB) 13 and the touch pad 16. The EC/KBC 116 has a function of powering the computer 10 on and off in response to the user's operation of the power button 14, and a function of communicating with the remote control unit interface 20.

The TV tuner 117 is a receiving device that receives broadcast program data carried by television (TV) broadcast signals, and is connected to the antenna terminal 19. The TV tuner 117 is implemented, for example, as a digital TV tuner capable of receiving digital broadcast program data such as terrestrial digital TV broadcasts. The TV tuner 117 also has a function of capturing video data input from an external device.

Next, the face image list display function executed by the TV application program 202 will be described with reference to FIG. 3.

As described above, the indexing processes (video indexing, audio indexing, and so on) for video content data such as broadcast program data are executed by the video processor 113 functioning as an indexing processing unit.

Under the control of the TV application program 202, which includes a face image list display processing unit 301, the video processor 113 executes the indexing process on video content data such as recorded broadcast program data specified by the user. The video processor 113 can also execute the indexing process on broadcast program data received by the TV tuner 117 in parallel with the recording process that stores that data on the HDD 111.

In the video indexing process (also called the face image indexing process), the video processor 113 analyzes the moving image data contained in the video content data frame by frame. It extracts face images of persons from each of the frames making up the moving image data and outputs time stamp information indicating when each extracted face image appears in the video content data. The time stamp information for each face image can be, for example, the elapsed time from the start of the video content data until the face image appears, or the number of the frame from which the face image was extracted.

The video processor 113 also outputs the feature amount of each extracted face image (the sizes of the main facial parts such as the eyes, nose, and mouth, the spacing between those parts, and so on) and the size (resolution) of each face image. The face detection result data output from the video processor 113 (face image, time stamp information TS, and size) is stored in a database 111A as face image indexing information. The database 111A is a storage area in the HDD 111 reserved for storing indexing data.

In the video indexing process, the video processor 113 also executes a thumbnail image acquisition process in parallel with the face image extraction process. Thumbnail images are images (reduced images) corresponding to each of a plurality of frames extracted from the video content data at, for example, equal time intervals. That is, the video processor 113 sequentially extracts frames from the video content data at, for example, predetermined equal time intervals, regardless of whether a frame contains a face image, and outputs an image (thumbnail image) corresponding to each extracted frame together with time stamp information TS indicating when that thumbnail image appears. The thumbnail image acquisition result data output from the video processor 113 (thumbnails and time stamp information TS) is also stored in the database 111A as thumbnail indexing information.
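The equal-interval sampling can be sketched as computing one sample point per section. Here each section contributes its first frame, and both the frame number and the elapsed time are returned, since the description allows either as time stamp information. The function name and the choice of the first frame of each section are assumptions.

```python
import math


def thumbnail_sample_points(duration_s, interval_s, fps):
    """Return (frame_number, timestamp_s) for the first frame of each equal-time section."""
    if interval_s <= 0:
        raise ValueError("interval_s must be positive")
    num_sections = math.ceil(duration_s / interval_s)
    points = []
    for k in range(num_sections):
        timestamp = k * interval_s
        # Convert elapsed time to the nearest frame number.
        points.append((round(timestamp * fps), timestamp))
    return points


print(thumbnail_sample_points(10, 3, 30))  # [(0, 0), (90, 3), (180, 6), (270, 9)]
```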

As shown in FIG. 4, the feature amounts described above consist of, for example, the following data detected for each extracted face image file: eye width, eye height, nose width, nose length, mouth width, mouth height, eye-to-eye distance, eye-to-nose distance, and nose-to-mouth distance. For example, for the extracted "face image 00001", the detected values are eye width = 2.54 cm, eye height = 1.22 cm, nose width = 1.54 cm, nose length = 3.02 cm, mouth width = 5.24 cm, mouth height = 2.86 cm, eye-to-eye distance = 4.59 cm, eye-to-nose distance = 3.87 cm, and nose-to-mouth distance = 2.35 cm, and these values are stored in a storage device such as the HDD 111. Feature amounts are likewise detected for each subsequent extracted face image file and stored in a storage device such as the HDD 111.

Data other than the distances between facial parts may also be used as facial feature amounts; for example, the ratios between those distances may be used. When such distance ratios are used, whether two faces belong to the same person can be determined based on whether their ratios agree within a certain error range.
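Ratios have the practical advantage of not depending on how large the face appears in the frame. The sketch below uses the example distances for "face image 00001" given above, normalizes the eye-to-nose and nose-to-mouth distances by the eye-to-eye distance, and treats two faces as the same person when every ratio agrees within a tolerance. The choice of ratios and the 0.05 tolerance are assumptions.

```python
def distance_ratios(face):
    """Normalize facial part distances by the eye-to-eye distance (assumed choice)."""
    base = face["eye_to_eye"]
    return (face["eye_to_nose"] / base, face["nose_to_mouth"] / base)


def same_person_by_ratio(a, b, tolerance=0.05):
    """True if all distance ratios of faces a and b agree within the tolerance."""
    return all(abs(x - y) <= tolerance
               for x, y in zip(distance_ratios(a), distance_ratios(b)))


face_00001 = {"eye_to_eye": 4.59, "eye_to_nose": 3.87, "nose_to_mouth": 2.35}
# The same face captured at half the size: absolute distances halve, ratios do not.
smaller = {key: value * 0.5 for key, value in face_00001.items()}
other = {"eye_to_eye": 4.59, "eye_to_nose": 4.50, "nose_to_mouth": 2.35}

print(same_person_by_ratio(face_00001, smaller))  # True
print(same_person_by_ratio(face_00001, other))    # False
```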

In the audio indexing process, the video processor 113 analyzes the audio data contained in the video content, detects a plurality of types of attribute sections (CM sections, music sections, and talk sections) contained in the video content data, and outputs section attribute information indicating the start time and end time of each detected attribute section. This section attribute information is stored in the database 111A as attribute detection result information. In the audio indexing process, the video processor 113 also executes the cheer level detection process and the excitement level detection process described above. The results of these two processes are also stored in the database 111A as part of the attribute detection result information.

As shown in FIG. 5, the attribute detection result information (section attribute information) consists of, for example, a CM section table, a music section table, a talk section table, and a cheer/excitement table.

The CM section table stores CM section attribute information indicating the start and end times of each detected CM section. The sequence from the start position to the end position of the video content data may contain several CM sections. In that case, CM section attribute information for each of them is stored in the CM section table. For each detected CM section, the table holds start time information and end time information giving that section's start and end times.

The music section table stores music section attribute information indicating the start and end times of each detected music section. The sequence from the start position to the end position of the video content data may contain several music sections. In that case, music section attribute information for each of them is stored in the music section table. For each detected music section, the table holds start time information and end time information giving that section's start and end times.

The talk section table stores talk section attribute information indicating the start and end times of each detected talk section. The sequence from the start position to the end position of the video content data may contain several talk sections. In that case, talk section attribute information for each of them is stored in the talk section table. For each detected talk section, the table holds start time information and end time information giving that section's start and end times.

The cheer/excitement table stores a cheer level and an excitement level for each fixed-length piece of partial data in the video content data (time segments T1, T2, T3, ...).
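
The FIG. 5 tables can be modeled as in the sketch below. It is only an illustration: every class, field, and method name is hypothetical, times are assumed to be in seconds, and the 10-second segment length for T1, T2, ... is an assumed value. The sketch shows how the start/end entries support the question "which attribute sections cover time t?" and how a fixed segment length maps a time to its cheer level.

```python
from dataclasses import dataclass, field


@dataclass
class Section:
    start: float  # start time information (seconds, assumed unit)
    end: float    # end time information


@dataclass
class AttributeDetectionResult:
    """Hypothetical in-memory form of the CM, music, talk, and cheer/excitement tables."""
    cm_sections: list[Section] = field(default_factory=list)
    music_sections: list[Section] = field(default_factory=list)
    talk_sections: list[Section] = field(default_factory=list)
    segment_length: float = 10.0  # length of each time segment T1, T2, ... (assumed)
    cheer_levels: list[float] = field(default_factory=list)       # one value per segment
    excitement_levels: list[float] = field(default_factory=list)  # one value per segment

    def attributes_at(self, t: float) -> set[str]:
        """Names of the section tables that contain a section covering time t."""
        tables = {"cm": self.cm_sections, "music": self.music_sections, "talk": self.talk_sections}
        return {name for name, secs in tables.items() if any(s.start <= t < s.end for s in secs)}

    def cheer_level_at(self, t: float) -> float:
        """Cheer level of the time segment containing t."""
        return self.cheer_levels[int(t // self.segment_length)]
```

Sections of different types may overlap in this model, as in the example where a CM section and a talk section both cover t = 25. The text does not say whether the real detector allows overlaps.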

As shown in FIG. 3, the TV application program 202 includes a face image list display processing unit 301 for executing the face image list display function. The face image list display processing unit 301 is implemented, for example, as an indexing viewer program. It uses the indexing information stored in the database 111A (face image indexing information, thumbnail indexing information, section attribute information, and so on) to display an indexing view screen that gives an overview of the video content data.

Specifically, the face image list display processing unit 301 reads the face image indexing information (face images, time stamp information TS, and sizes) from the database 111A. Using this information, it displays a list of the face images of the people appearing in the video content data in a two-dimensional display area on the indexing view screen (hereinafter, the face thumbnail display area). To build this list, the unit divides the total duration of the video content data into a plurality of time periods, for example at equal intervals. For each time period, it selects a predetermined number of face images that appear in that period from among the extracted face images. It then displays the selected face images side by side, period by period.

That is, the two-dimensional face thumbnail display area contains a plurality of face image display areas arranged in a matrix of rows and columns. Each column is assigned one of the time periods that make up the total duration of the video content data. For example, the total duration can be divided into as many equal parts as there are columns, so that every column receives a time period of the same length. The time periods assigned to the columns need not, however, all be of the same length.

The face image list display processing unit 301 uses the time stamp information TS of each face image to determine which time period it belongs to. In each column, it displays the face images belonging to that column's time period in the column's face image display areas, one per row. The images are arranged in an order such as descending appearance frequency, that is, by the length of time each face was detected. For example, as many face images as there are rows are selected from the column's time period in descending order of appearance frequency and arranged from top to bottom in that order. Alternatively, the face images appearing in each column's time period may be displayed in the order in which they appear rather than by frequency.
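
The column assignment and per-column selection can be sketched as follows. The sketch assumes equal-length time periods and uses each face's detected duration as its appearance frequency, as the text suggests. `FaceEntry` and `build_face_grid` are hypothetical names.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class FaceEntry:
    face_id: str
    timestamp: float          # time stamp information TS (seconds, assumed unit)
    detected_duration: float  # detection time length, used here as appearance frequency


def build_face_grid(faces: list[FaceEntry], total_duration: float,
                    columns: int, rows: int) -> list[list[str]]:
    """Return one list of face ids per column, most frequent first, at most `rows` each."""
    span = total_duration / columns  # equal-length time period per column
    buckets: dict[int, list[FaceEntry]] = defaultdict(list)
    for f in faces:
        # Clamp so a face stamped exactly at the end falls into the last column.
        col = min(int(f.timestamp // span), columns - 1)
        buckets[col].append(f)
    grid = []
    for col in range(columns):
        chosen = sorted(buckets[col], key=lambda f: f.detected_duration, reverse=True)[:rows]
        grid.append([f.face_id for f in chosen])
    return grid
```

To use the alternative ordering described above (order of appearance), change the sort key to `f.timestamp` and drop `reverse=True`.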

This face image list display function shows the user, in an easily understood way, which people appear in which time periods across the entire video content data.

Next, the functional configuration of the TV application program 202 will be described with reference to FIG. 6.

In addition to the face image list display processing unit 301 described above, the TV application program 202 includes a recording processing unit 401, an indexing control unit 402, a playback processing unit 403, and other units.

As described above, the face image list display processing unit 301 extracts a plurality of face images from the video content data, together with a feature amount for each face image, and determines whether faces belong to the same person. When the user selects a face image, the face images determined to show the same person are highlighted so as to distinguish them from the other face images. The unit also shows three thumbnails in focus (described later) in the accordion-format thumbnail display (described later). These three thumbnails correspond to the selected face image and to the same-person face images immediately before and after it.

The recording processing unit 401 records on the HDD 111 either broadcast program data received by the TV tuner 117 or video data input from an external device. It also performs scheduled recording. In this mode, it uses the TV tuner 117 to receive the broadcast program data specified by recording reservation information (channel number, date and time) set in advance by the user, and records that data on the HDD 111.

The indexing control unit 402 controls the video processor (indexing processing unit) 113 and causes it to execute the indexing process (video indexing and audio indexing). For each broadcast program to be recorded, the user can specify whether the indexing process should be executed. For example, when indexing has been requested for a program, the indexing process starts automatically once the program data has been recorded on the HDD 111. The user can also choose, from the video content data already stored on the HDD 111, which video content data should be indexed.

The playback processing unit 403 plays back video content data stored on the HDD 111. It also supports starting playback near a chosen face. Suppose one face image in the face image list of some video content data is selected and the user then issues a playback instruction event. The playback processing unit 403 starts playing the video content data from a point a predetermined time before the selected face image appears.
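
The playback start point can be expressed as a one-line calculation. In this sketch, the 5-second lead time is an assumed value (the text only says "a predetermined time"), and `playback_start_time` is a hypothetical name.

```python
def playback_start_time(face_timestamp: float, lead_time: float = 5.0) -> float:
    """Start `lead_time` seconds before the selected face appears.

    The 5-second default is an assumption; the result is clamped so playback
    never starts before the beginning of the video content data.
    """
    return max(0.0, face_timestamp - lead_time)
```

Without the clamp, a face that appears within the first few seconds would produce a negative start position.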

When the user selects a single face image, the face images that the feature amounts identify as the same person are highlighted and distinguished from the others. While viewing the list with these highlighted face images and the accordion-format display (described later), the user can decide where in the video content data to start playback.

The indexing process does not have to be executed by the video processor 113. For example, the TV application program 202 may be given a function for executing it. In that case, the indexing process is executed by the CPU 101 under the control of the TV application program 202.

Next, the specific layout of the indexing view screen will be described with reference to FIG. 7.

FIG. 7 shows an example of the indexing view screen that the face image list display processing unit 301 displays on the LCD 17. This screen is obtained by indexing a piece of video content data (for example, broadcast program data). It contains four elements, all described above: the face thumbnail display area for displaying a list of face images, the level display area, the section bars, and the accordion thumbnail display area for displaying a list of thumbnail images in accordion format.

The accordion format is a display format in which the selected thumbnail image is displayed at normal size (full size) and each of the other thumbnail images is displayed with its horizontal size reduced. In FIG. 7, the farther a thumbnail image is from the selected thumbnail image 500, the more its horizontal size is reduced.
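
One way to compute accordion-format widths is sketched below. It allows one or more focused thumbnails to be shown at full width. Each other thumbnail is halved once per position of distance from its nearest focused thumbnail, with a minimum width so that it stays visible. The pixel sizes, the halving factor, and the names are all assumptions; the text only states that farther thumbnails are narrower.

```python
from collections.abc import Iterable


def accordion_widths(count: int, focused: Iterable[int], full_width: int = 160,
                     min_width: int = 8, shrink: float = 0.5) -> list[int]:
    """Horizontal width (px) for each of `count` thumbnails in accordion format.

    Thumbnails whose index is in `focused` get `full_width`; every other thumbnail
    is scaled by `shrink` once per step of distance to the nearest focused
    thumbnail, but never drops below `min_width`.
    """
    focus = list(focused)
    if not focus:
        raise ValueError("at least one thumbnail must be focused")
    widths = []
    for i in range(count):
        d = min(abs(i - f) for f in focus)  # distance to the nearest focused thumbnail
        widths.append(full_width if d == 0 else max(min_width, int(full_width * shrink ** d)))
    return widths
```

With a single focused index, this produces the layout described for FIG. 7. Passing several indices gives the multi-focus layout used when the selected face and its neighbors are shown in focus.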

Face images determined to show the same person as the selected thumbnail image 500 are highlighted so as to distinguish them from the other face images, for example by surrounding them with a thick frame.

The level display area shows a graph of how the cheer level changes over time. The three highest cheer levels are, for example, filled with a dark color (see level displays 700, 701, and 702 in FIG. 7).
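
Picking the three highest cheer levels amounts to a top-k selection over the per-segment levels stored in the cheer/excitement table. The sketch below treats each time segment as one candidate. It is an assumption that adjacent high segments are not merged into a single peak, since the text does not specify this. `top_cheer_segments` is a hypothetical name; `heapq.nlargest` is the standard-library function.

```python
import heapq


def top_cheer_segments(levels: list[float], k: int = 3) -> list[int]:
    """Indices of the k segments with the highest cheer level, returned in time order."""
    best = heapq.nlargest(k, range(len(levels)), key=levels.__getitem__)
    return sorted(best)  # time order makes it easy to draw them left to right
```

These indices identify the segments to fill with a dark color in the level display. They can also be mapped to the nearest thumbnails, which supports the alternative described below of focusing on the three highest cheer levels.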

The section bars consist of a talk section bar, a music section bar, and a CM section bar. In the CM section bar, a bar region (the black strip in FIG. 7) is displayed at the position of each CM section (each partial CM section). In the music section bar, a bar region (the cross-hatched strip in FIG. 7) is displayed at the position of each music section (each partial music section). In the talk section bar, a bar region (the hatched strip in FIG. 7) is displayed at the position of each talk section (each partial talk section). Using the buttons and the up/down/left/right cursor keys on the remote control unit, the user can select any one of the talk section bar, the music section bar, and the CM section bar. The same controls can also select a single bar region within the selected section bar.
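
The sketch below shows how a section table's start/end times could be turned into bar regions. It maps each section linearly onto a bar of fixed pixel width that represents the total duration of the video content data. The linear scaling, the 1-pixel minimum width, and the function name are assumptions made for illustration.

```python
def bar_regions(sections: list[tuple[float, float]], total_duration: float,
                bar_width_px: int) -> list[tuple[int, int]]:
    """Convert (start, end) times into (x0, x1) pixel spans on a section bar."""
    regions = []
    for start, end in sections:
        x0 = round(start / total_duration * bar_width_px)
        x1 = round(end / total_duration * bar_width_px)
        regions.append((x0, max(x1, x0 + 1)))  # keep very short sections at least 1 px wide
    return regions
```

With this mapping, a one-second section in a one-hour program still yields a visible region that the user can select with the cursor keys.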

Next, the procedure for displaying the face image list will be described with reference to the flowchart of FIG. 8.

The video processor 113 of the computer 10 extracts face images from the video content data and extracts a feature amount from each extracted face image (step S101). The extracted face images and feature amounts are stored on the HDD 111 or a similar storage device.

The video processor 113 displays the extracted face images as face thumbnails, as shown in FIG. 7. When the video processor 113 determines that the user has selected a face image 500 (YES in step S102), it reads the feature amounts of the selected face image 500 (see FIG. 7) from the HDD 111. It then searches the HDD 111 for feature amounts similar to them (step S103). For example, with the feature parameters shown in FIG. 4, two faces are judged to be the same person when the error of every parameter is within 0.05. As shown in FIG. 7, the face images determined to show the same person as the selected face image 500 (same-person face images 501 to 506) are highlighted so as to distinguish them from the other face images, for example by surrounding them with a thick frame.

Next, the focused thumbnails are chosen. Among the highlighted face images 501 to 506, the face images 501 and 502 are the ones immediately before and after the selected face image 500. The accordion thumbnail images 600 to 602 that correspond to face images 500, 501, and 502 are shown in focus (see FIG. 7). Here, the thumbnail corresponding to a face image is the one whose time stamp information is the same as, or closest to, that of the face image. All thumbnail images other than these three are shown compressed in accordion format. The remaining highlighted face images 503 to 506 do not correspond to any focused thumbnail. For these, markings indicate the positions of their corresponding thumbnail images in the video content data (arrows A and B in FIG. 7). This informs the user that the face images 503 to 506 exist even though their thumbnails are not in focus (step S104). At the same time, based on the cheer level extracted from the video content data, the three highest cheer levels are, for example, filled with a dark color (see level displays 700, 701, and 702 in FIG. 7).

In the description above, the accordion thumbnail images 600 to 602 corresponding to the selected face image 500 and the adjacent face images 501 and 502 are shown in focus. Alternatively, the three thumbnail images corresponding to the three highest cheer levels may be shown in focus, again taking the thumbnails whose time stamp information is the same as, or closest to, those points. As a further alternative, the three thumbnail images corresponding to the three characters with the highest appearance frequency may be shown in focus.
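
The step S103/S104 logic can be sketched as follows. It finds same-person faces with a per-parameter tolerance of 0.05 applied to the raw feature values, as in the text's example. It then picks the selected face and its nearest same-person neighbors before and after it, maps each face to the thumbnail with the same or closest time stamp, and lists the remaining same-person faces as markings. Feature vectors as plain tuples, thumbnails sampled at sorted time stamps, and every function name are assumptions of this sketch, not the patent's implementation.

```python
import bisect


def is_same_person(a: tuple[float, ...], b: tuple[float, ...], tolerance: float = 0.05) -> bool:
    """Same person if every feature parameter differs by at most `tolerance`."""
    return all(abs(x - y) <= tolerance for x, y in zip(a, b))


def nearest_thumbnail(thumb_times: list[float], t: float) -> int:
    """Index of the thumbnail whose time stamp is the same as or closest to t.

    `thumb_times` must be sorted; on an exact tie the earlier thumbnail wins.
    """
    i = bisect.bisect_left(thumb_times, t)
    if i == 0:
        return 0
    if i == len(thumb_times):
        return len(thumb_times) - 1
    return i if thumb_times[i] - t < t - thumb_times[i - 1] else i - 1


def focus_thumbnails(selected: str, face_times: dict[str, float],
                     features: dict[str, tuple[float, ...]],
                     thumb_times: list[float]) -> tuple[list[int], list[int]]:
    """Return (focused thumbnail indices, marked thumbnail indices)."""
    same = [f for f, vec in features.items()
            if f != selected and is_same_person(features[selected], vec)]
    t_sel = face_times[selected]
    before = [f for f in same if face_times[f] < t_sel]
    after = [f for f in same if face_times[f] > t_sel]
    chosen = [selected]
    if before:
        chosen.append(max(before, key=face_times.__getitem__))  # nearest earlier appearance
    if after:
        chosen.append(min(after, key=face_times.__getitem__))   # nearest later appearance
    focused = sorted({nearest_thumbnail(thumb_times, face_times[f]) for f in chosen})
    others = {nearest_thumbnail(thumb_times, face_times[f]) for f in same if f not in chosen}
    return focused, sorted(others - set(focused))
```

The focused indices can be used as the full-width positions of an accordion layout. The marked indices correspond to the thumbnails flagged by arrows such as A and B in FIG. 7.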

When the user selects one of the face images surrounded by a thick frame and issues a playback instruction (YES in step S105), the video content data is played back from the position corresponding to that face image (step S106). The playback instruction can be given, for example, by right-clicking the selected face image with a mouse or similar device to display a menu and then choosing a playback command from that menu. The face images determined to show the same person as the selected face image 500 can also be distinguished in other ways than by a thick frame. For example, the other face images may be grayed out, that is, covered with a gray hatching filter so that they are harder to see.

As described above, this embodiment displays a list of face images covering the entire video content data. The face images of a particular person are highlighted so as to distinguish them from the other face images, and several of the corresponding locations are shown in focus. As a result, the user can easily play back the video content data at points where the desired person is likely to appear. In addition, before playback begins, the user can see roughly in which time periods of the entire video content data the desired person appears.

In this embodiment, the indexing information is generated by the video processor 113 acting as the indexing processing unit. If the broadcast program data already contains the corresponding indexing information, however, the indexing process does not need to be performed. In that case, the face image list display function of this embodiment can be realized with only the database 111A and the face image list display processing unit 301.

The entire face image list display procedure of this embodiment can be implemented in software. Therefore, the same effects as this embodiment can easily be obtained by installing that software on an ordinary computer via a computer-readable storage medium.

The electronic apparatus of this embodiment can be realized not only by the computer 10 but also by various consumer electronic devices such as HDD recorders, DVD recorders, and television sets. In such devices, the functions of the TV application program 202 can be implemented in hardware such as a DSP (Digital Signal Processor) or a microcomputer.

The present invention is not limited to the embodiment described above. At the implementation stage, its constituent elements may be modified and embodied without departing from the gist of the invention. Various inventions can also be formed by appropriately combining the constituent elements disclosed in the embodiment. For example, some constituent elements may be removed from the full set shown in the embodiment, and constituent elements of different embodiments may be combined as appropriate.

FIG. 1 is a perspective view showing an example of the external appearance of an information processing apparatus according to an embodiment of the present invention.
FIG. 2 is a block diagram showing an example of the system configuration of the information processing apparatus of the embodiment.
FIG. 3 is a block diagram for explaining the face image list display function of the information processing apparatus of the embodiment.
FIG. 4 is a diagram showing an example of the feature amount parameters used in the information processing apparatus of the embodiment.
FIG. 5 is a diagram showing an example of the section attribute information (attribute detection result information) used in the information processing apparatus of the embodiment.
FIG. 6 is a block diagram showing the functional configuration of a program used in the information processing apparatus of the embodiment.
FIG. 7 is a diagram showing an example of the indexing view screen displayed on a display device by the information processing apparatus of the embodiment.
FIG. 8 is a flowchart showing an example of the procedure of the face image list display process executed by the information processing apparatus of the embodiment.

Explanation of symbols

10: computer (information processing apparatus); 111A: database; 113: video processor; 117: TV tuner; 301: face image list display processing unit; 402: indexing control unit; 403: playback processing unit.

Claims

An information processing apparatus comprising:
face image extraction means for extracting a plurality of face images from video content data;
face image list display means for displaying a list of the plurality of face images in a first display area;
thumbnail image extraction means for extracting a thumbnail image of at least one frame from each of the sections obtained by dividing the video content data at equal time intervals; and
thumbnail image list display means for displaying the thumbnail images extracted by the thumbnail image extraction means side by side in chronological order in a second display area, such that a plurality of the thumbnail images are displayed at a normal size and the other thumbnail images are displayed with a horizontal size smaller than the normal size;
wherein the plurality of thumbnail images displayed at the normal size by the thumbnail image list display means are thumbnail images related to a face image selected from the list of face images displayed in the first display area.

The information processing apparatus according to claim 1, further comprising:
comparison means for, when one face image is selected from the plurality of face images displayed in the first display area, comparing the selected face image with the plurality of face images displayed in the first display area on the basis of the extracted feature amount information; and
highlighting means for highlighting, among the plurality of face images displayed in the first display area, the face images determined by the comparison means to show the same person as the selected face image, so as to distinguish them from the other face images;
wherein the face image extraction means extracts, together with each face image, time stamp information indicating the point at which the face image appears in the video content data;
the thumbnail image extraction means extracts, together with each thumbnail image, time stamp information indicating the point at which the thumbnail image appears in the video content data; and
the plurality of thumbnail images displayed at the normal size by the thumbnail image list display means are thumbnail images having time stamp information that is the same as or closest to that of the selected face image and of the face images highlighted by the highlighting means.

The information processing apparatus according to claim 2, wherein the plurality of thumbnail images displayed at the normal size by the thumbnail image list display means are the three thumbnail images whose time stamp information is the same as or closest to that of the selected face image and of the highlighted face images whose time stamp information immediately precedes and immediately follows that of the selected face image.

The information processing apparatus according to claim 3, wherein a marking is displayed for each thumbnail image that has time stamp information the same as or closest to that of a face image highlighted by the highlighting means and that is displayed with a horizontal size smaller than the normal size.

The information processing apparatus according to claim 1,
wherein, when there are a plurality of pieces of video content data, the thumbnail image list display means simultaneously displays, at the normal size, thumbnail images included in each piece of video content data.

The information processing apparatus according to claim 1,
wherein the comparison means counts the number of appearances of the same person, and the thumbnail image list display means displays at the normal size the thumbnail images having time stamp information that is the same as or closest to that of the face images of the three persons with the most appearances.

Detection means for detecting cheer levels in time series from the audio data of video content data;
thumbnail image extraction means for extracting a thumbnail image of at least one frame from each of the sections obtained by dividing the video content data at equal time intervals;
thumbnail image list display means for displaying the thumbnail images extracted by the thumbnail image extraction means side by side in chronological order in a display area, such that a plurality of the thumbnail images are displayed at a normal size and the other thumbnail images are displayed with a horizontal size smaller than the normal size;
The information processing apparatus according to claim 1, wherein the plurality of thumbnail images displayed at the normal size by the thumbnail image list display means are a plurality of thumbnail images extracted according to the cheer levels detected by the detection means.

The information processing apparatus according to claim 7, wherein the thumbnail image extraction means extracts, together with each thumbnail image, time stamp information indicating the point at which the thumbnail image appears in the video content data, and
the plurality of thumbnail images displayed at the normal size by the thumbnail image list display means are thumbnail images having time stamp information that is the same as or closest to that of the points with the highest cheer levels detected by the detection means.