JP2013207530A

JP2013207530A - Information processing device, information processing method and program

Info

Publication number: JP2013207530A
Application number: JP2012074115A
Authority: JP
Inventors: Hirotaka Suzuki; 洋貴鈴木
Original assignee: Sony Corp
Current assignee: Sony Corp
Priority date: 2012-03-28
Filing date: 2012-03-28
Publication date: 2013-10-07

Abstract

PROBLEM TO BE SOLVED: To generate a digest from which the rough content of time-series data can be grasped. SOLUTION: From each chapter obtained by dividing time-series data composed of a plurality of pieces of data arranged in time series, a chapter segment extraction unit extracts a chapter segment that represents a predetermined portion of the chapter. From those chapters, among the chapters obtained by dividing the time-series data, that have a feature segment, a feature peak segment extraction unit extracts the feature segment. An effect addition unit combines the chapter segments and the feature segments in chronological order to generate a digest that reflects the rough content of the time-series data. The present technique is applicable to, for example, a recorder for recording content.

Description

The present disclosure relates to an information processing device, an information processing method, and a program, and in particular to an information processing device, an information processing method, and a program that make it possible to easily generate a digest from which the rough content of time-series data, such as content, can be understood.

For example, there is a digest generation technique that generates a digest reflecting the rough content of a sports broadcast program, such as a live soccer broadcast, by detecting highlight scenes in the program.

In this digest generation technique, for example, a scene in a sports broadcast program in which the crowd cheers is detected as a highlight scene by using a feature amount that characterizes the excitement (cheering) in the sound (see, for example, Patent Document 1).

A digest is then generated by joining the detected highlight scenes together.

Patent Document 1: JP 2008-185626 A

However, with the above-described digest generation technique, when the digest is generated from content such as a news program or a video shot by an individual, a scene in which people cheer is not necessarily a highlight scene.

Consequently, depending on the genre of the content, it may not be possible to generate a digest that reflects what the content is roughly about.

The present disclosure has been made in view of such circumstances, and makes it possible to easily generate a digest from which the rough content of time-series data, such as content, can be understood.

An information processing apparatus according to one aspect of the present disclosure includes: a chapter segment extraction unit that extracts, from each chapter obtained by dividing time-series data composed of a plurality of pieces of data arranged in time series, a chapter segment representing a predetermined portion that represents the chapter; a feature segment extraction unit that extracts the feature segment from each chapter, among the chapters obtained by dividing the time-series data, that has a feature segment representing a characteristic portion of the chapter; and a generation unit that generates a digest reflecting the rough content of the time-series data by combining the chapter segments and the feature segments in chronological order.

The generation unit may combine the chapter segments and the feature segments in chronological order to generate a digest whose length has been set by a setting operation of the user.
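As a rough illustration of the combining step, the following Python sketch merges chapter segments and feature segments in chronological order and stops once a user-set digest length is reached. The `Segment` type, the function name `build_digest`, times in seconds, and the policy of clipping overlaps and truncating at the end are all assumptions for illustration, not the disclosed implementation.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Segment:
    start: float  # seconds from the beginning of the content (assumed unit)
    end: float
    kind: str     # "chapter" or "feature" (hypothetical labels)


def build_digest(chapter_segs, feature_segs, target_len):
    """Merge segments chronologically, clip overlaps, and stop at target_len seconds."""
    merged = sorted(chapter_segs + feature_segs, key=lambda s: (s.start, s.end))
    digest, total, last_end = [], 0.0, float("-inf")
    for seg in merged:
        start = max(seg.start, last_end)  # clip any overlap with the previous segment
        if start >= seg.end:
            continue                      # fully covered by an earlier segment
        length = min(seg.end - start, target_len - total)
        if length <= 0:
            break                         # the digest has reached the requested length
        digest.append(Segment(start, start + length, seg.kind))
        total += length
        last_end = start + length
    return digest
```

Truncating at the end favors early chapters; a real implementation might instead shorten every segment proportionally so that late chapters still appear in the digest.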

The apparatus may further include: a symbol string generation unit that, on the basis of the time-series data, generates a symbol string in which symbols, each representing an attribute of one of the plurality of pieces of data, are arranged in time series; and a dividing unit that divides the time-series data into a plurality of chapters on the basis of the dispersion of the symbols in the symbol string.

The dividing unit may divide the time-series data, on the basis of the dispersion of the symbols constituting the symbol string, into a number of chapters determined by the length set by the user's setting operation.

The apparatus may further include a feature amount extraction unit that extracts, from the time-series data, feature amounts representing features of the time-series data, and the feature segment extraction unit may extract the feature segment from a chapter having the feature segment on the basis of the feature amounts.

On the basis of the feature amount, the feature segment extraction unit may extract from the chapter the feature segment that includes a location where, within the section from the start to the end of the chapter, the feature amount reaches either its maximum or a local maximum.

On the basis of the feature amount, the feature segment extraction unit may extract from the chapter the feature segment that includes a location where, within the section from the start to the end of the chapter, the feature amount reaches either its maximum or a local maximum and is also equal to or greater than a predetermined threshold.
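The extraction rule in the two paragraphs above (a peak of the feature amount inside the chapter, optionally gated by a threshold) can be sketched as follows. This hypothetical `extract_feature_segment` only looks for the global maximum within the chapter, not a local maximum, and the fixed `half_width` window around the peak is an assumed parameter.

```python
def extract_feature_segment(feature, chapter_start, chapter_end, half_width, threshold=None):
    """Return (start, end) frame indices of a segment around the chapter's feature peak.

    feature: per-frame feature amounts for the whole content (assumed list of floats).
    Returns None when the chapter is empty or its peak is below the threshold.
    """
    window = feature[chapter_start:chapter_end]
    if not window:
        return None
    peak = chapter_start + max(range(len(window)), key=window.__getitem__)
    if threshold is not None and feature[peak] < threshold:
        return None  # this chapter is treated as having no feature segment
    # Keep the segment inside the chapter boundaries.
    start = max(chapter_start, peak - half_width)
    end = min(chapter_end, peak + half_width + 1)
    return start, end
```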

On the basis of a plurality of different feature amounts, the feature segment extraction unit may extract from the chapter the feature segment that includes the location where the feature amount peaks, for the feature amount whose maximum within the section from the start to the end of the chapter is the largest among the plurality of different feature amounts.
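When several different feature amounts are available, they are usually on different scales, so comparing their maxima requires some normalization. The sketch below assumes min-max normalization over the whole content (the excerpt does not say how the feature amounts are made comparable) and returns which feature wins within a chapter together with the frame of its peak. Feature names such as `audio_power` and `zoom` are hypothetical.

```python
def normalize(series):
    """Min-max normalize a per-frame series to [0, 1] (assumed normalization)."""
    lo, hi = min(series), max(series)
    if hi == lo:
        return [0.0] * len(series)
    return [(x - lo) / (hi - lo) for x in series]


def pick_peak(features, chapter_start, chapter_end):
    """features: dict mapping feature name -> per-frame values.

    Returns (name, frame) for the feature whose normalized maximum inside
    [chapter_start, chapter_end) is the largest.
    """
    best = None
    for name, series in features.items():
        window = normalize(series)[chapter_start:chapter_end]
        offset = max(range(len(window)), key=window.__getitem__)
        if best is None or window[offset] > best[0]:
            best = (window[offset], name, chapter_start + offset)
    return best[1], best[2]
```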

The generation unit may generate the digest in which audio prepared in advance is added to each of the chapter segments and the feature segments with a corresponding weight.

The feature segment extraction unit may extract feature segments, on the basis of a plurality of different feature amounts, from the chapters that have them, and the generation unit may generate the digest in which the audio is added, with a smaller weight than for the other feature segments, to a feature segment extracted on the basis of a feature amount representing an audio feature among the plurality of different feature amounts.

The generation unit may generate the digest in which the audio is added with a weight that switches gradually, changing continuously rather than abruptly.
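The weighting of the pre-prepared audio described in the three paragraphs above can be illustrated by a per-frame mixing weight. In this sketch the weight is lower for feature segments selected by an audio feature amount (so the original sound stays audible), and a limit on the per-frame change makes the weight vary continuously instead of jumping at segment boundaries. The weight values 0.8 and 0.2, the `ramp_frames` parameter, and the `"audio"` label are assumptions.

```python
def bgm_weights(segments, ramp_frames, default_w=0.8, audio_w=0.2):
    """Per-frame weight of the pre-prepared audio for a digest.

    segments: list of (num_frames, basis) in digest order; basis == "audio"
    marks a feature segment chosen by an audio feature amount (hypothetical label).
    """
    if ramp_frames < 1:
        raise ValueError("ramp_frames must be >= 1")
    step = abs(default_w - audio_w) / ramp_frames  # largest change allowed per frame
    weights = []
    current = None
    for length, basis in segments:
        target = audio_w if basis == "audio" else default_w
        for _ in range(length):
            if current is None:
                current = target                       # the first frame starts at its target
            elif current < target:
                current = min(target, current + step)  # ramp up toward the target
            else:
                current = max(target, current - step)  # ramp down toward the target
            weights.append(current)
    return weights
```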

An information processing method according to one aspect of the present disclosure is an information processing method of an information processing apparatus that generates a digest, the method including, by the information processing apparatus: a chapter segment extraction step of extracting, from each chapter obtained by dividing time-series data composed of a plurality of pieces of data arranged in time series, a chapter segment representing a predetermined portion of the chapter; a feature segment extraction step of extracting the feature segment from each chapter, among the chapters obtained by dividing the time-series data, that has a feature segment representing a characteristic portion of the chapter; and a generation step of generating a digest reflecting the rough content of the time-series data by combining the chapter segments and the feature segments in chronological order.

A program according to one aspect of the present disclosure causes a computer to function as: a chapter segment extraction unit that extracts, from each chapter obtained by dividing time-series data composed of a plurality of pieces of data arranged in time series, a chapter segment representing a predetermined portion of the chapter; a feature segment extraction unit that extracts the feature segment from each chapter, among the chapters obtained by dividing the time-series data, that has a feature segment representing a characteristic portion of the chapter; and a generation unit that generates a digest reflecting the rough content of the time-series data by combining the chapter segments and the feature segments in chronological order.

According to the present disclosure, a chapter segment representing a predetermined portion of the chapter is extracted from each chapter obtained by dividing time-series data composed of a plurality of pieces of data arranged in time series; the feature segment is extracted from each chapter, among the chapters obtained by dividing the time-series data, that has a feature segment representing a characteristic portion of the chapter; and the chapter segments and the feature segments are combined in chronological order, thereby generating a digest that reflects the rough content of the time-series data.

According to the present disclosure, it is possible to easily generate a digest from which the rough content of time-series data, such as content, can be understood.

FIG. 1 is a block diagram showing a configuration example of a recorder according to a first embodiment.
FIG. 2 is a diagram showing an example of a symbol string generated by the symbol string generation unit of FIG. 1.
FIG. 3 is a block diagram showing a configuration example of the content model learning unit of FIG. 1.
FIG. 4 is a diagram showing an example of a left-to-right HMM.
FIG. 5 is a diagram showing an example of an ergodic HMM.
FIG. 6 is a diagram showing an example of a two-dimensional neighborhood-constrained HMM, which is an HMM with a sparse structure.
FIG. 7 is a diagram showing an example of a sparse-structure HMM other than the two-dimensional neighborhood-constrained HMM.
FIG. 8 is a diagram showing feature amount extraction processing performed by the feature amount extraction unit of FIG. 3.
FIG. 9 is a flowchart for explaining content model learning processing performed by the content model learning unit of FIG. 3.
FIG. 10 is a block diagram showing a configuration example of the symbol string generation unit of FIG. 1.
FIG. 11 is a diagram for explaining an outline of symbol string generation processing performed by the symbol string generation unit of FIG. 1.
FIG. 12 is a flowchart for explaining the symbol string generation processing performed by the symbol string generation unit of FIG. 1.
FIG. 13 is a diagram showing an example in which the dividing unit of FIG. 1 divides content into a plurality of segments on the basis of a symbol string.
FIG. 14 is a flowchart for explaining recursive bisection processing performed by the dividing unit of FIG. 1.
FIG. 15 is a flowchart for explaining annealing division processing performed by the dividing unit of FIG. 1.
FIG. 16 is a flowchart for explaining content division processing performed by the recorder of FIG. 1.
FIG. 17 is a block diagram showing a configuration example of a recorder according to a second embodiment.
FIG. 18 is a diagram showing an example of chapter point data generated by the dividing unit of FIG. 17.
FIG. 19 is a diagram for explaining an outline of digest generation processing performed by the digest generation unit of FIG. 17.
FIG. 20 is a block diagram showing a detailed configuration example of the digest generation unit of FIG. 17.
FIG. 21 is a diagram for explaining how the feature amount extraction unit of FIG. 20 generates audio power time-series data.
FIG. 22 is a diagram showing an example of the motion vectors of a frame.
FIG. 23 is a diagram showing an example of a zoom-in template.
FIG. 24 is a diagram for explaining processing performed by the effect addition unit of FIG. 20.
FIG. 25 is a flowchart for explaining digest generation processing performed by the recorder of FIG. 17.
FIG. 26 is a block diagram showing a configuration example of a recorder according to a third embodiment.
FIG. 27 is a diagram showing an example of how chapter point data changes in response to a designation operation by the user.
FIG. 28 is a diagram showing an example of frames set as chapter points.
FIG. 29 is a diagram showing an example in which thumbnail images are displayed at intervals of 50 frames to the right of a frame set as a chapter point.
FIG. 30 is a first diagram showing an example of the display screen of a display unit.
FIG. 31 is a second diagram showing an example of the display screen of the display unit.
FIG. 32 is a third diagram showing an example of the display screen of the display unit.
FIG. 33 is a fourth diagram showing an example of the display screen of the display unit.
FIG. 34 is a block diagram showing a detailed configuration example of the presentation unit of FIG. 26.
FIG. 35 is a fifth diagram showing an example of the display screen of the display unit.
FIG. 36 is a sixth diagram showing an example of the display screen of the display unit.
FIG. 37 is a seventh diagram showing an example of the display screen of the display unit.
FIG. 38 is an eighth diagram showing an example of the display screen of the display unit.
FIG. 39 is a ninth diagram showing an example of the display screen of the display unit.
FIG. 40 is a flowchart for explaining presentation processing performed by the recorder of FIG. 26.
FIG. 41 is a flowchart showing an example of how the display mode transitions.
FIG. 42 is a block diagram showing a configuration example of a computer.

Hereinafter, embodiments of the present disclosure (hereinafter referred to as embodiments) will be described in the following order.
1. First embodiment (an example of dividing content into semantically coherent segments)
2. Second embodiment (an example of generating a digest that conveys the rough content)
3. Third embodiment (an example of displaying thumbnail images of the chapters constituting content)
4. Modifications

<1. First Embodiment>
[Configuration Example of Recorder 1]

FIG. 1 shows a configuration example of a recorder 1 according to the first embodiment.

The recorder 1 in FIG. 1 is, for example, an HD (Hard Disk) recorder, and can record (store) various kinds of content, such as television broadcast programs, content provided via a network such as the Internet, and content shot with a video camera.

As shown in FIG. 1, the recorder 1 includes a content storage unit 11, a content model learning unit 12, a model storage unit 13, a symbol string generation unit 14, a dividing unit 15, a control unit 16, and an operation unit 17.

The content storage unit 11 stores (records) content such as television broadcast programs. Storing content in the content storage unit 11 constitutes recording that content, and the recorded content (the content stored in the content storage unit 11) is played back in response to, for example, a playback operation performed by the user with the operation unit 17.

The content model learning unit 12 performs learning (statistical learning) in which, for example, content stored in the content storage unit 11 is structured in a self-organizing manner in a predetermined feature amount space, to obtain a model representing the structure (spatio-temporal structure) of the content (hereinafter also referred to as a content model). The content model learning unit 12 supplies the content model obtained as a result of the learning to the model storage unit 13.

The model storage unit 13 stores the content model supplied from the content model learning unit 12.

The symbol string generation unit 14 reads content from the content storage unit 11. It then obtains, for each frame (or field) constituting the read content, a symbol representing the attribute of that frame, generates a symbol string in which the symbols obtained for the frames are arranged in time series, and supplies the symbol string to the dividing unit 15.

Specifically, for example, the symbol string generation unit 14 generates a symbol string composed of a plurality of symbols by using the content stored in the content storage unit 11 and the content model stored in the model storage unit 13, and supplies it to the dividing unit 15.

As the symbol, for example, a cluster ID can be used: among a plurality of clusters, which are the subspaces constituting the feature amount space, the cluster ID identifies the cluster that contains the feature amount of the frame.

Each cluster ID takes a value corresponding to the cluster it represents. For example, the closer two clusters are to each other, the closer their cluster IDs are. Therefore, the more similar the feature amounts of two frames are, the closer their cluster IDs are.
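Assigning a cluster ID to each frame amounts to vector quantization: the frame's feature vector is mapped to the nearest cluster centroid. The minimal sketch below assumes Euclidean distance and centroids already obtained by cluster learning; the property that nearby clusters receive numerically close IDs is only approximated here by listing the centroids in an order that follows their positions.

```python
import math


def assign_cluster_ids(frame_features, centroids):
    """Map each frame's feature vector to the index (cluster ID) of its nearest centroid."""
    ids = []
    for f in frame_features:
        nearest = min(range(len(centroids)), key=lambda i: math.dist(f, centroids[i]))
        ids.append(nearest)
    return ids
```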

Alternatively, for example, a state ID indicating the state of the frame, among state IDs that respectively represent a plurality of different states, may be used as the symbol. Each state ID takes a value corresponding to the state it represents; for example, the closer the states of two frames are, the closer their state IDs are.

When cluster IDs are used as symbols, frames that correspond to the same symbol display similar objects.

When state IDs are used as symbols, frames that correspond to the same symbol not only display similar objects but also have a similar temporal context (what comes before and after them).

For example, when cluster IDs are used as symbols, a frame showing a train about to depart and a frame showing a train about to stop are assigned the same symbol.

This is because, with cluster IDs, a symbol is assigned to a frame solely according to whether the displayed objects are similar.

In contrast, when state IDs are used as symbols, a frame showing a train about to depart and a frame showing a train about to stop are assigned different symbols.

This is because, with state IDs, a symbol is assigned to a frame by taking into account not only whether the objects are similar but also the temporal context.

Therefore, state IDs used as symbols represent the attributes of frames in more detail than cluster IDs do.
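State IDs that account for temporal context can be obtained from an HMM by Viterbi decoding, which picks the single most likely state sequence given all frames. In the toy left-to-right model used in the test (all probabilities are made up for illustration), the same observation `"x"` is decoded as state 0 before the `"y"` frame and as state 2 after it. This mirrors the departing-train and stopping-train example: identical appearance, different symbols because of context. The disclosure's own decoding procedure is not given in this excerpt.

```python
import math


def viterbi(obs, start_p, trans_p, emit_p):
    """Most likely hidden-state sequence of a discrete-observation HMM (log domain).

    start_p[s]: initial probability, trans_p[r][s]: transition r -> s,
    emit_p[s][o]: probability of observing symbol o in state s.
    """
    def log(p):
        return math.log(p) if p > 0 else -math.inf

    n = len(start_p)
    delta = [log(start_p[s]) + log(emit_p[s][obs[0]]) for s in range(n)]
    back = []  # back[t][s]: best predecessor of state s at step t + 1
    for o in obs[1:]:
        ptr, new = [], []
        for s in range(n):
            prev = max(range(n), key=lambda r: delta[r] + log(trans_p[r][s]))
            ptr.append(prev)
            new.append(delta[prev] + log(trans_p[prev][s]) + log(emit_p[s][o]))
        back.append(ptr)
        delta = new
    state = max(range(n), key=delta.__getitem__)
    path = [state]
    for ptr in reversed(back):  # trace the best path backwards
        state = ptr[state]
        path.append(state)
    return path[::-1]
```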

The key point of the first embodiment is that the content is divided into a plurality of segments on the basis of the variation (dispersion) of the symbols in the symbol string.

Therefore, in the first embodiment, using state IDs as symbols makes it possible to divide the content into semantically coherent segments more accurately than using cluster IDs.

If a learned content model is already stored in the model storage unit 13, the recorder 1 can be configured without the content model learning unit 12.

Here, the content data stored in the content storage unit 11 is assumed to include image data, audio data, and, where necessary, text (caption) data (streams).

Here, only the image data of the content is assumed to be used in the content model learning processing and in the processing that uses the content model.

However, audio and text data can also be used in addition to image data in the content model learning processing and the processing that uses the content model, in which case the accuracy of the processing can be improved.

It is also possible to use only audio data, rather than image data, in the content model learning processing and the processing that uses the content model.

The dividing unit 15 reads from the content storage unit 11 the same content that was used to generate the symbol string supplied from the symbol string generation unit 14. The dividing unit 15 then divides (segments) the read content into a plurality of semantically coherent segments on the basis of the variation (dispersion) of the symbols in that symbol string.

For example, the dividing unit 15 divides content into semantically coherent segments such as the individual corners (sections) of a program or the individual topics of a news program.

The control unit 16 controls the content model learning unit 12, the symbol string generation unit 14, and the dividing unit 15 on the basis of, for example, operation signals from the operation unit 17.

The operation unit 17 consists of operation buttons and the like operated by the user, and supplies an operation signal corresponding to the user's operation to the control unit 16.

Next, FIG. 2 shows an example of a symbol string generated by the symbol string generation unit 14.

In FIG. 2, the horizontal axis represents time t, and the vertical axis represents the symbol of the frame at time t (frame t).

Here, time t is, for example, measured from the beginning of the content, and frame t at time t is the t-th frame counted from the beginning of the content, where the first frame of the content is frame 0.

The closer the values of two symbols are, the closer the attributes of the corresponding frames are.

In FIG. 2, the thick vertical line segments are dividing lines that divide the symbol string, composed of a plurality of symbols, into six subsequences.

As shown in FIG. 2, this symbol string is composed of first subsequences, in which relatively few kinds of symbols are observed frequently (subsequences with a "stationary" character), and second subsequences, in which relatively many kinds of symbols are observed (subsequences with a "large-dispersion" character).
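One way to quantify the difference between "stationary" and "large-dispersion" subsequences is the Shannon entropy of the symbol distribution in a window: a few frequently repeated symbol kinds give low entropy, while many kinds give high entropy. The window size, the threshold, and the use of entropy itself are assumptions in this sketch, not the measure defined in the disclosure.

```python
import math
from collections import Counter


def symbol_entropy(symbols):
    """Shannon entropy (bits) of the symbol distribution in a subsequence."""
    n = len(symbols)
    counts = Counter(symbols)
    return -sum((c / n) * math.log2(c / n) for c in counts.values())


def label_windows(symbols, window, threshold):
    """Label consecutive, non-overlapping windows as stationary or large-dispersion."""
    labels = []
    for i in range(0, len(symbols), window):
        chunk = symbols[i:i + window]
        kind = "large-dispersion" if symbol_entropy(chunk) > threshold else "stationary"
        labels.append(kind)
    return labels
```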

FIG. 2 shows four first subsequences and two second subsequences.

The inventors conducted an experiment in which a plurality of subjects were asked to draw dividing lines that divide a symbol string such as that shown in FIG. 2 into N parts (N = 6 in the case of FIG. 2), and obtained the following results.

The subjects often drew dividing lines at boundaries between a first subsequence and a second subsequence, at boundaries between two first subsequences, and at boundaries between two second subsequences.

Furthermore, when the content corresponding to the symbol string shown in FIG. 2 was divided at the positions of the dividing lines drawn by the subjects, the content was found to be divided, for the most part, into semantically coherent segments.

Accordingly, the dividing unit 15 divides content into a plurality of semantically coherent segments by drawing dividing lines, on the basis of the symbol string from the symbol string generation unit 14, at positions similar to those chosen by the subjects.
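As a stand-in for the detailed procedures (recursive bisection and annealing) that are not reproduced in this excerpt, the following hypothetical sketch performs a greedy recursive bisection. It repeatedly inserts the dividing line that most reduces the within-segment sum of squared deviations of the symbol values, until N segments are obtained. Using squared deviations relies on the property stated above that close symbol values mean similar frames; it is an assumed cost, not the disclosed one, and `min_len` is an assumed parameter.

```python
def sse(values):
    """Sum of squared deviations from the mean (segment length x variance)."""
    if not values:
        return 0.0
    mean = sum(values) / len(values)
    return sum((v - mean) ** 2 for v in values)


def best_split(symbols, lo, hi, min_len):
    """Best (cost reduction, position) for one dividing line inside [lo, hi), or None."""
    whole = sse(symbols[lo:hi])
    best = None
    for k in range(lo + min_len, hi - min_len + 1):
        gain = whole - sse(symbols[lo:k]) - sse(symbols[k:hi])
        if best is None or gain > best[0]:
            best = (gain, k)
    return best


def recursive_bisect(symbols, n_segments, min_len=2):
    """Return boundaries [0, b1, ..., len(symbols)] for up to n_segments segments."""
    bounds = [0, len(symbols)]
    while len(bounds) - 1 < n_segments:
        candidates = []
        for lo, hi in zip(bounds, bounds[1:]):
            split = best_split(symbols, lo, hi, min_len)
            if split is not None:
                candidates.append(split)
        if not candidates:
            break  # no segment is long enough to split further
        _, k = max(candidates)  # add the dividing line that lowers the cost the most
        bounds = sorted(bounds + [k])
    return bounds
```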

The specific processing performed by the dividing unit 15 will be described in detail with reference to FIGS. 13 to 15.

[Configuration Example of Content Model Learning Unit 12]
FIG. 3 shows a configuration example of the content model learning unit 12 of FIG. 1.

The content model learning unit 12 performs learning (model learning) of a state transition probability model defined by state transition probabilities, which govern how states transition, and observation probabilities, which govern how likely a predetermined observation value is to be observed in each state. The content model learning unit 12 also extracts a feature amount for each frame of the images of learning content, which is the content used for the cluster learning that yields the cluster information described later. Further, the content model learning unit 12 performs cluster learning using the feature amounts of the learning content.
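A state transition probability model of this kind is fully specified by initial probabilities, state transition probabilities, and observation probabilities. As an illustration (not the learning procedure of the disclosure), the forward algorithm below computes the probability that such a model produces a given observation sequence. This likelihood is the quantity that maximum-likelihood learning methods such as Baum-Welch try to increase. The two-state model in the test is hypothetical.

```python
def forward_likelihood(obs, start_p, trans_p, emit_p):
    """P(obs | model) for a discrete HMM, computed with the forward algorithm.

    start_p[s]    : initial probability of state s
    trans_p[r][s] : state transition probability r -> s
    emit_p[s][o]  : observation probability of symbol o in state s
    """
    n = len(start_p)
    # alpha[s]: probability of the observations so far and being in state s now
    alpha = [start_p[s] * emit_p[s][obs[0]] for s in range(n)]
    for o in obs[1:]:
        alpha = [sum(alpha[r] * trans_p[r][s] for r in range(n)) * emit_p[s][o]
                 for s in range(n)]
    return sum(alpha)
```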

Specifically, the content model learning unit 12 includes a learning content selection unit 21, a feature amount extraction unit 22, a feature amount storage unit 26, and a learning unit 27.

The learning content selection unit 21 selects, from the content stored in the content storage unit 11, the content to be used for model learning and cluster learning as learning content, and supplies it to the feature amount extraction unit 22.

ここで、学習用コンテンツ選択部２１は、コンテンツ記憶部１１に記憶されたコンテンツの中から、例えば、所定のカテゴリに属する１以上のコンテンツを、学習用コンテンツとして選択する。 Here, the learning content selection unit 21 selects, for example, one or more contents belonging to a predetermined category from the contents stored in the content storage unit 11 as learning contents.

所定のカテゴリに属するコンテンツとは、例えば、ジャンルが同一の番組や、連続番組、毎週又は毎日その他周期的に放送される番組（タイトルが同一の番組）等の、コンテンツに潜む、コンテンツの構造が共通するコンテンツを意味する。 Content belonging to a predetermined category means content whose underlying (latent) structure is common, such as programs of the same genre, serial programs, and programs broadcast weekly, daily, or at other regular intervals (programs with the same title).

ジャンルとしては、例えば、スポーツ番組やニュース番組等といった、いわば大まかな分類を採用することもできるが、例えば、サッカーの試合の番組や野球の試合の番組等といった、いわば細かい分類であることが望ましい。 As the genre, a coarse classification such as sports programs or news programs can be adopted, but a finer classification such as soccer match programs or baseball game programs is desirable.

また、例えば、サッカーの試合の番組であれば、チャンネル（放送局）が異なるごとに、異なるカテゴリに属するコンテンツに分類することもできる。 Further, soccer match programs, for example, can also be classified into different categories according to the channel (broadcast station) that airs them.

なお、コンテンツのカテゴリとして、どのようなカテゴリを採用するかは、例えば、図１のレコーダ１に、あらかじめ設定されていることとする。 It is assumed that which categories are used as content categories is set in advance in, for example, the recorder 1 of FIG. 1.

また、コンテンツ記憶部１１に記憶されたコンテンツのカテゴリは、例えば、テレビジョン放送で番組とともに送信されてくる、番組のタイトルやジャンル等のメタデータや、インターネット上のサイトが提供する番組の情報等から認識することができる。 The category of the content stored in the content storage unit 11 can be recognized from, for example, metadata such as program titles and genres transmitted together with programs in television broadcasting, or from program information provided by sites on the Internet.

特徴量抽出部２２は、学習用コンテンツ選択部２１からの学習用コンテンツを、画像と音声のデータに逆多重化（分離）し、画像の各フレームの特徴量を抽出して、特徴量記憶部２６に供給する。 The feature amount extraction unit 22 demultiplexes (separates) the learning content from the learning content selection unit 21 into image data and audio data, extracts the feature amount of each frame of the images, and supplies the feature amounts to the feature amount storage unit 26.

すなわち、特徴量抽出部２２は、フレーム分割部２３、サブ領域特徴量抽出部２４、及び、結合部２５から構成される。 That is, the feature quantity extraction unit 22 includes a frame division unit 23, a sub-region feature quantity extraction unit 24, and a combination unit 25.

フレーム分割部２３には、学習用コンテンツ選択部２１からの学習用コンテンツの画像の各フレームが、時系列に供給される。 Each frame of the learning content image from the learning content selection unit 21 is supplied to the frame dividing unit 23 in time series.

フレーム分割部２３は、学習用コンテンツ選択部２１から時系列に供給される学習用コンテンツのフレームを、順次、注目フレームとする。そして、フレーム分割部２３は、注目フレームを、複数の小領域であるサブ領域に分割し、サブ領域特徴量抽出部２４に供給する。 The frame dividing unit 23 sequentially sets each frame of the learning content, supplied in time series from the learning content selection unit 21, as the frame of interest. The frame dividing unit 23 then divides the frame of interest into a plurality of small regions (sub-regions) and supplies them to the sub-region feature amount extraction unit 24.

サブ領域特徴量抽出部２４は、フレーム分割部２３からの注目フレームの各サブ領域から、そのサブ領域の特徴量（以下、サブ領域特徴量ともいう）を抽出し、結合部２５に供給する。 The sub-region feature amount extraction unit 24 extracts the feature amount of the sub-region (hereinafter also referred to as a sub-region feature amount) from each sub-region of the frame of interest from the frame division unit 23 and supplies it to the combining unit 25.

結合部２５は、サブ領域特徴量抽出部２４からの注目フレームのサブ領域のサブ領域特徴量を結合し、その結合結果を、注目フレームの特徴量として、特徴量記憶部２６に供給する。 The combination unit 25 combines the sub-region feature amounts of the sub-regions of the target frame from the sub-region feature amount extraction unit 24, and supplies the combination result to the feature amount storage unit 26 as the feature amount of the target frame.

特徴量記憶部２６は、特徴量抽出部２２（の結合部２５）から供給される学習用コンテンツの各フレームの特徴量を時系列に記憶する。 The feature amount storage unit 26 stores the feature amount of each frame of the learning content supplied from the feature amount extraction unit 22 (the combination unit 25) in time series.

学習部２７は、特徴量記憶部２６に記憶された学習用コンテンツの各フレームの特徴量を用いて、クラスタ学習を行う。 The learning unit 27 performs cluster learning using the feature amount of each frame of the learning content stored in the feature amount storage unit 26.

すなわち、学習部２７は、特徴量記憶部２６に記憶された学習用コンテンツの各フレームの特徴量（ベクトル）を用いて、その特徴量の空間である特徴量空間を、複数のクラスタに分割するクラスタ学習を行い、クラスタの情報であるクラスタ情報を求める。 That is, using the feature amount (vector) of each frame of the learning content stored in the feature amount storage unit 26, the learning unit 27 performs cluster learning, which divides the feature amount space (the space of those feature amounts) into a plurality of clusters, and obtains cluster information, that is, information describing the clusters.

ここで、クラスタ学習としては、例えば、k-means法を採用することができる。クラスタ学習として、k-means法を採用する場合、クラスタ学習の結果得られるクラスタ情報は、特徴量空間のクラスタを代表する代表ベクトルと、その代表ベクトル（が代表するクラスタ）を表すコードとが対応付けられたコードブックとなる。 Here, the k-means method, for example, can be adopted for cluster learning. In that case, the cluster information obtained as a result of cluster learning is a codebook in which representative vectors, each representing a cluster in the feature amount space, are associated with codes identifying those representative vectors (that is, the clusters they represent).

なお、k-means法では、注目する注目クラスタの代表ベクトルは、学習用コンテンツの特徴量（ベクトル）の中で、注目クラスタに属する特徴量（コードブックの各代表ベクトルとの距離（ユークリッド距離）の中で、注目クラスタの代表ベクトルとの距離が最も短い特徴量）の平均値（ベクトル）となる。 In the k-means method, the representative vector of a cluster of interest is the mean (vector) of those feature amounts (vectors) of the learning content that belong to the cluster, that is, the feature amounts whose (Euclidean) distance to the representative vector of the cluster of interest is the shortest among their distances to all representative vectors in the codebook.
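The k-means cluster learning described above can be sketched in Python as follows. This is a minimal illustration, not the implementation in the source: feature vectors are assumed to be plain lists of floats, the initial representative vectors are assumed to be drawn at random from the training features, a fixed number of iterations is run, and the function names are hypothetical.

```python
import random


def squared_distance(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def kmeans_codebook(features, k, iterations=20, seed=0):
    """Learn a codebook {code: representative vector} with plain k-means."""
    rng = random.Random(seed)
    # Initialise the representative vectors with k training features.
    centers = [list(v) for v in rng.sample(features, k)]
    for _ in range(iterations):
        # Assignment step: each feature joins the cluster of its nearest center.
        members = [[] for _ in range(k)]
        for f in features:
            code = min(range(k), key=lambda c: squared_distance(f, centers[c]))
            members[code].append(f)
        # Update step: each representative vector becomes the mean of its members.
        for c, group in enumerate(members):
            if group:  # keep the previous center if a cluster became empty
                centers[c] = [sum(dim) / len(group) for dim in zip(*group)]
    return {code: center for code, center in enumerate(centers)}


features = [[0.0, 0.0], [0.0, 1.0], [10.0, 10.0], [10.0, 11.0]]
codebook = kmeans_codebook(features, 2)
print(sorted(codebook.values()))  # [[0.0, 0.5], [10.0, 10.5]]
```

The source suggests on the order of 100 to several hundred clusters for real frame features; two clusters are used here only to keep the toy data readable.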

学習部２７は、さらに、学習用コンテンツから得られたクラスタ情報を用いて、特徴量記憶部２６に記憶された学習用コンテンツの各フレームの特徴量を複数のクラスタのうちのいずれかのクラスタにクラスタリングすることにより、その特徴量が属するクラスタを表すコードを求めることで、学習用コンテンツの特徴量の時系列を、コード系列に変換する（学習用コンテンツのコード系列を求める）。 Further, using the cluster information obtained from the learning content, the learning unit 27 clusters the feature amount of each frame of the learning content stored in the feature amount storage unit 26 into one of the plurality of clusters and obtains the code representing the cluster to which that feature amount belongs, thereby converting the time series of feature amounts of the learning content into a code sequence (obtaining the code sequence of the learning content).

ここで、クラスタ学習として、k-means法を採用する場合、そのクラスタ学習によって得られるクラスタ情報としてのコードブックを用いて行われるクラスタリングは、ベクトル量子化となる。 Here, when the k-means method is adopted as the cluster learning, the clustering performed using the code book as the cluster information obtained by the cluster learning is vector quantization.

ベクトル量子化では、コードブックの代表ベクトルそれぞれについて、特徴量（ベクトル）との距離が計算され、その距離が最小となる代表ベクトルのコードが、ベクトル量子化結果として出力される。 In vector quantization, for each representative vector of the codebook, the distance from the feature quantity (vector) is calculated, and the code of the representative vector that minimizes the distance is output as the vector quantization result.
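The vector quantization step can be sketched as follows, assuming the codebook is stored as a Python dict from code to representative vector (a hypothetical layout; the source does not specify one). Comparing squared Euclidean distances selects the same nearest code as comparing Euclidean distances, so the square root is skipped.

```python
def squared_distance(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def vector_quantize(feature, codebook):
    """Return the code whose representative vector is nearest to `feature`."""
    return min(codebook, key=lambda code: squared_distance(feature, codebook[code]))


def to_code_sequence(feature_series, codebook):
    """Convert a time series of frame features into a code sequence."""
    return [vector_quantize(f, codebook) for f in feature_series]


codebook = {0: [0.0, 0.5], 1: [10.0, 10.5]}
print(to_code_sequence([[0, 0], [9, 9], [1, 2]], codebook))  # [0, 1, 0]
```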

学習部２７は、学習用コンテンツの特徴量の時系列をクラスタリングすることにより、コード系列に変換すると、そのコード系列を用いて、状態遷移モデルの学習であるモデル学習を行う。 After converting the time series of feature amounts of the learning content into a code sequence by clustering, the learning unit 27 uses that code sequence to perform model learning, that is, learning of the state transition model.

そして、学習部２７は、モデル学習後の状態遷移確率モデルと、クラスタ学習により得られるクラスタ情報とのセットを、コンテンツモデルとして、学習用コンテンツのカテゴリと対応付けて、モデル記憶部１３に供給する。 Then, the learning unit 27 supplies the set consisting of the state transition probability model after model learning and the cluster information obtained by cluster learning to the model storage unit 13 as a content model, associated with the category of the learning content.

したがって、コンテンツモデルは、状態遷移確率モデルと、クラスタ情報とから構成される。 Therefore, the content model is composed of a state transition probability model and cluster information.

ここで、コンテンツモデルを構成する状態遷移確率モデル（コード系列を用いて学習が行われる状態遷移確率モデル）を、以下、コードモデルともいう。 Here, a state transition probability model (a state transition probability model in which learning is performed using a code sequence) constituting the content model is also referred to as a code model.

［状態遷移確率モデル］ [State Transition Probability Model]
図４乃至図７を参照して、図３の学習部２７がモデル学習を行う状態遷移確率モデルについて説明する。 The state transition probability model on which the learning unit 27 of FIG. 3 performs model learning will be described with reference to FIGS. 4 to 7.

状態遷移確率モデルとしては、例えば、HMM(Hidden Markov Model)を採用することができる。状態遷移確率モデルとして、HMMを採用する場合、HMMの学習は、例えば、Baum-Welchの再推定法によって行われる。 For example, an HMM (Hidden Markov Model) can be adopted as the state transition probability model. When an HMM is adopted, it is learned by, for example, the Baum-Welch re-estimation method.

図４は、left-to-right型のHMMの一例を示している。 FIG. 4 shows an example of a left-to-right type HMM.

left-to-right型のHMMは、状態が、左から右方向に、一直線上に並んだHMMであり、自己遷移（ある状態から、その状態への遷移）と、ある状態から、その状態よりも右側にある状態への遷移とを行うことができる。left-to-right型のHMMは、例えば、音声認識等で用いられる。 A left-to-right HMM is an HMM whose states are arranged in a straight line from left to right, and which allows self-transitions (transitions from a state to itself) and transitions from a state to states to its right. Left-to-right HMMs are used, for example, in speech recognition.

図４のHMMは、３つの状態s1,s2,s3から構成され、状態遷移として、自己遷移と、ある状態から、その右隣の状態への遷移とが許されている。 The HMM shown in FIG. 4 includes three states s1, s2, and s3. As a state transition, a self transition and a transition from a certain state to a state on the right side thereof are permitted.

なお、HMMは、状態s_iの初期確率π_i、状態遷移確率a_ij、及び、状態s_iから、所定の観測値oが観測される観測確率b_i(o)で規定される。 The HMM is defined by the initial probability π_i of state s_i, the state transition probabilities a_ij, and the observation probability b_i(o) with which a predetermined observation value o is observed from state s_i.

ここで、初期確率π_iは、状態s_iが、初期の状態（最初の状態）である確率であり、left-to-right型のHMMでは、最も左側の状態s₁の初期確率π₁は、1.0とされ、他の状態s_iの初期確率π_iは、0.0とされる。 Here, the initial probability π_i is the probability that state s_i is the initial state (the first state). In a left-to-right HMM, the initial probability π₁ of the leftmost state s₁ is set to 1.0, and the initial probabilities π_i of the other states s_i are set to 0.0.

状態遷移確率a_ijは、状態s_iから状態s_jに遷移する確率である。 The state transition probability a_ij is the probability of a transition from state s_i to state s_j.
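As an illustration of π_i and a_ij for the left-to-right HMM of FIG. 4 (self-transitions plus moves to the right-hand neighbour), the sketch below builds the initial probability vector and the transition matrix. The self-transition probability of 0.5 is an arbitrary assumption for illustration; in practice these values would be re-estimated by learning, and the function name is hypothetical.

```python
def left_to_right_hmm(num_states, self_prob=0.5):
    """Return (pi, A) for a left-to-right HMM allowing only self-transitions
    and transitions to the right-hand neighbour, as in FIG. 4."""
    pi = [1.0] + [0.0] * (num_states - 1)  # only the leftmost state can be initial
    A = [[0.0] * num_states for _ in range(num_states)]
    for i in range(num_states - 1):
        A[i][i] = self_prob              # self-transition a_ii
        A[i][i + 1] = 1.0 - self_prob    # move to the right neighbour a_i,i+1
    A[-1][-1] = 1.0  # the rightmost state can only stay where it is
    return pi, A


pi, A = left_to_right_hmm(3)
print(pi)  # [1.0, 0.0, 0.0]
print(A)   # [[0.5, 0.5, 0.0], [0.0, 0.5, 0.5], [0.0, 0.0, 1.0]]
```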

観測確率b_i(o)は、状態s_iへの状態遷移時に、状態s_iから観測値oが観測される確率である。観測確率b_i(o)としては、観測値oが離散値である場合には、確率となる値（離散値）が用いられるが、観測値oが連続値である場合には、確率分布関数が用いられる。確率分布関数としては、例えば、平均値（平均ベクトル）と分散（共分散行列）とで定義されるガウス分布等を採用することができる。なお、本実施の形態では、観測値oとして、離散値が用いられる。 The observation probability b_i(o) is the probability that the observation value o is observed from state s_i upon a state transition to state s_i. When the observation value o is discrete, probability values (discrete values) are used as b_i(o); when o is continuous, a probability distribution function is used. As the probability distribution function, for example, a Gaussian distribution defined by a mean (mean vector) and a variance (covariance matrix) can be employed. In the present embodiment, discrete values are used as the observation value o.
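To show how π_i, a_ij, and b_i(o) jointly define the probability of an observed code sequence in the discrete case, here is a sketch of the standard forward algorithm. The list layout (B[i][o] for b_i(o)) and the function name are assumptions for illustration; the forward algorithm itself is textbook HMM material rather than something this section specifies.

```python
def forward_likelihood(pi, A, B, observations):
    """P(observations | HMM) for a discrete HMM via the forward algorithm.

    pi[i]   : initial probability pi_i of state s_i
    A[i][j] : state transition probability a_ij
    B[i][o] : observation probability b_i(o) of code o in state s_i
    """
    n = len(pi)
    # alpha[i] = P(o_1..o_t, state at time t is s_i), starting at t = 1.
    alpha = [pi[i] * B[i][observations[0]] for i in range(n)]
    for o in observations[1:]:
        # Sum over predecessor states, then emit the next observation.
        alpha = [sum(alpha[i] * A[i][j] for i in range(n)) * B[j][o] for j in range(n)]
    return sum(alpha)


pi = [1.0, 0.0]
A = [[0.5, 0.5], [0.0, 1.0]]
B = [[0.9, 0.1], [0.2, 0.8]]
print(round(forward_likelihood(pi, A, B, [0, 1]), 6))  # 0.405
```

For long code sequences, this unscaled version underflows; scaling or log-space arithmetic would be needed in practice.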

図５は、エルゴディック(Ergodic)型のHMMの一例を示している。 FIG. 5 shows an example of an Ergodic type HMM.

エルゴディック型のHMMは、状態遷移に制約がないHMM、すなわち、任意の状態s_iから任意の状態s_jへの状態遷移が可能なHMMである。 An ergodic HMM is an HMM with no restrictions on state transitions, that is, an HMM in which a transition from any state s_i to any state s_j is possible.

図５のHMMは、３つの状態s₁,s₂,s₃から構成され、任意の状態遷移が許されている。 The HMM in FIG. 5 consists of three states s₁, s₂, and s₃, and any state transition is permitted.

エルゴディック型のHMMは、状態遷移の自由度が最も高いHMMであるが、状態数が多くなると、HMMのパラメータ（初期確率π_i、状態遷移確率a_ij、及び、観測確率b_i(o)）の初期値によっては、ローカルミニマムに収束し、適切なパラメータを得られないことがある。 The ergodic HMM has the highest degree of freedom in state transitions. However, as the number of states grows, learning may, depending on the initial values of the HMM parameters (initial probabilities π_i, state transition probabilities a_ij, and observation probabilities b_i(o)), converge to a local minimum and fail to obtain appropriate parameters.

そこで、「自然界の現象の殆どや、ビデオコンテンツを生み出すカメラワークや番組構成は、スモールワールドネットワークのようなスパースな結合によって表現可能である」という仮説を採用し、学習部２７での学習には、状態遷移を、スパース(Sparse)な構造に制約したHMMを採用することとする。 Therefore, adopting the hypothesis that "most natural phenomena, as well as the camera work and program structure that produce video content, can be expressed by sparse connections such as those of a small-world network," an HMM whose state transitions are constrained to a sparse structure is used for learning in the learning unit 27.

ここで、スパースな構造とは、ある状態から任意の状態への状態遷移が可能なエルゴディック型のHMMのような密な状態遷移ではなく、ある状態から状態遷移することができる状態が非常に限定されている構造（状態遷移が疎らな構造）である。 Here, a sparse structure is not one with dense state transitions, as in an ergodic HMM where any state can transition to any other state, but one in which the states reachable from a given state are very limited (a structure with sparse state transitions).

なお、ここでは、スパースな構造であっても、他の状態への状態遷移は、少なくとも１つ存在し、また、自己遷移は存在することとする。 Note that here, even in a sparse structure, each state is assumed to have at least one transition to another state as well as a self-transition.

図６は、スパースな構造のHMMである2次元近傍拘束HMMの一例を示している。 FIG. 6 shows an example of a two-dimensional neighborhood constrained HMM that is an HMM having a sparse structure.

図６のＡ及び図６のＢのHMMには、スパースな構造であることの他、HMMを構成する状態を、２次元平面上に、格子状に配置する制約が課されている。 In addition to the sparse structure, the HMM in FIG. 6A and FIG. 6B has a constraint that the states constituting the HMM are arranged in a lattice pattern on a two-dimensional plane.

ここで、図６のＡのHMMでは、他の状態への状態遷移が、横に隣接する状態と、縦に隣接する状態とに制限されている。図６のＢのHMMでは、他の状態への状態遷移が、横に隣接する状態、縦に隣接する状態、及び、斜めに隣接する状態に制限されている。 Here, in the HMM in FIG. 6A, the state transition to another state is limited to a horizontally adjacent state and a vertically adjacent state. In the HMM of FIG. 6B, the state transition to another state is limited to a horizontally adjacent state, a vertically adjacent state, and a diagonally adjacent state.
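One common way to impose the lattice constraint of FIG. 6 (assumed here; the source does not say how the constraint is enforced) is to initialise every forbidden transition probability to zero, because Baum-Welch re-estimation never turns a zero transition probability into a nonzero one. The sketch below builds the mask of allowed transitions for states on a rows × cols lattice and a uniform initial transition matrix over the allowed transitions; the function names are hypothetical.

```python
def grid_transition_mask(rows, cols, diagonal=False):
    """Allowed transitions for states placed on a rows x cols lattice.

    Each state may make a self-transition or move to a horizontally or
    vertically adjacent state (FIG. 6A); with diagonal=True, diagonally
    adjacent states are also allowed (FIG. 6B).
    """
    n = rows * cols
    mask = [[False] * n for _ in range(n)]
    for r in range(rows):
        for c in range(cols):
            i = r * cols + c
            for dr in (-1, 0, 1):
                for dc in (-1, 0, 1):
                    if not diagonal and dr != 0 and dc != 0:
                        continue  # skip diagonal moves unless allowed
                    rr, cc = r + dr, c + dc
                    if 0 <= rr < rows and 0 <= cc < cols:
                        mask[i][rr * cols + cc] = True  # (dr, dc) = (0, 0) is the self-transition
    return mask


def uniform_sparse_transitions(mask):
    """Initial transition matrix: uniform over each state's allowed transitions, zero elsewhere."""
    return [[(1.0 / sum(row)) if allowed else 0.0 for allowed in row] for row in mask]


mask = grid_transition_mask(3, 3)
print(sum(mask[4]), sum(mask[0]))  # 5 3  (centre state, corner state)
```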

図７は、スパースな構造のHMMの、2次元近傍拘束HMM以外の一例を示している。 FIG. 7 shows an example of an HMM having a sparse structure other than the two-dimensional neighborhood constraint HMM.

すなわち、図７のＡは、３次元グリッド制約によるHMMの例を示している。図７のＢは、２次元ランダム配置制約によるHMMの例を示している。図７のＣは、スモールワールドネットワークによるHMMの例を示している。 That is, A in FIG. 7 shows an example of an HMM with a three-dimensional grid constraint. FIG. 7B shows an example of an HMM based on a two-dimensional random arrangement constraint. FIG. 7C shows an example of an HMM by a small world network.

図３の学習部２７では、状態が、例えば、100乃至数百個程度の、図６や図７に示したスパースな構造のHMMの学習が、特徴量記憶部２６に記憶された画像の（フレームから抽出された）特徴量のコード系列を用い、Baum-Welchの再推定法によって行われる。 In the learning unit 27 of FIG. 3, an HMM with a sparse structure as shown in FIGS. 6 and 7, having, for example, about 100 to several hundred states, is learned by the Baum-Welch re-estimation method using the code sequence of the feature amounts (extracted from the frames) of the images stored in the feature amount storage unit 26.
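A single Baum-Welch re-estimation step for a discrete HMM on one code sequence can be sketched as follows. This is a minimal textbook version using scaled forward and backward variables, not the applicant's implementation; training on several learning contents, convergence checks, and smoothing are omitted, and the function name is hypothetical. Because every expected transition count is multiplied by the current a_ij, a transition initialised to zero stays zero, which is what keeps a sparse structure intact during learning.

```python
import math


def baum_welch_step(pi, A, B, obs):
    """One Baum-Welch re-estimation step for a discrete HMM on one code sequence.

    Returns (new_pi, new_A, new_B, log P(obs | old parameters))."""
    n, T, m = len(pi), len(obs), len(B[0])
    # Scaled forward pass: alpha[t] sums to 1; scale[t] is the normaliser at time t.
    alpha, scale = [], []
    for t in range(T):
        if t == 0:
            a = [pi[i] * B[i][obs[0]] for i in range(n)]
        else:
            a = [sum(alpha[t - 1][i] * A[i][j] for i in range(n)) * B[j][obs[t]]
                 for j in range(n)]
        s = sum(a)
        scale.append(s)
        alpha.append([x / s for x in a])
    # Scaled backward pass using the same normalisers.
    beta = [[1.0] * n for _ in range(T)]
    for t in range(T - 2, -1, -1):
        beta[t] = [sum(A[i][j] * B[j][obs[t + 1]] * beta[t + 1][j] for j in range(n))
                   / scale[t + 1] for i in range(n)]
    # gamma[t][i]: P(state s_i at time t | obs); xi_sum[i][j]: expected number of i->j transitions.
    gamma = [[alpha[t][i] * beta[t][i] for i in range(n)] for t in range(T)]
    xi_sum = [[0.0] * n for _ in range(n)]
    for t in range(T - 1):
        for i in range(n):
            for j in range(n):
                xi_sum[i][j] += (alpha[t][i] * A[i][j] * B[j][obs[t + 1]]
                                 * beta[t + 1][j] / scale[t + 1])
    # Re-estimation; a state that is never occupied keeps its old parameters.
    new_pi = gamma[0][:]
    new_A, new_B = [], []
    for i in range(n):
        occ_trans = sum(gamma[t][i] for t in range(T - 1))
        new_A.append([xi_sum[i][j] / occ_trans if occ_trans > 0 else A[i][j]
                      for j in range(n)])
        occ_all = sum(gamma[t][i] for t in range(T))
        new_B.append([sum(gamma[t][i] for t in range(T) if obs[t] == k) / occ_all
                      if occ_all > 0 else B[i][k] for k in range(m)])
    log_likelihood = sum(math.log(s) for s in scale)
    return new_pi, new_A, new_B, log_likelihood
```

Calling the function repeatedly, feeding each step's output parameters into the next call, gives the usual iterative training loop; the returned log-likelihood should not decrease from one step to the next.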

学習部２７での学習の結果得られるコードモデルであるHMMは、コンテンツの画像(Visual)の特徴量のみを用いた学習によって得られるので、Visual HMMと呼ぶことができる。 The HMM, which is a code model obtained as a result of learning in the learning unit 27, can be called a Visual HMM because it is obtained by learning using only the feature amount of the content image (Visual).

ここで、HMMの学習（モデル学習）に用いられる、特徴量のコード系列は、離散値であり、HMMの観測確率b_i(o)としては、確率となる値が用いられる。 Here, the code sequence of feature amounts used for HMM learning (model learning) consists of discrete values, so probability values are used as the observation probabilities b_i(o) of the HMM.

なお、HMMについては、例えば、Lawrence Rabiner, Biing-Hwang Juang 共著、「音声認識の基礎（上・下）、ＮＴＴアドバンステクノロジ株式会社」や、本件出願人が先に提案した特願2008-064993号に記載されている。また、エルゴディック型のHMMや、スパースな構造のHMMの利用については、例えば、本件出願人が先に提案した特開2009-223444号公報に記載されている。 HMMs are described in, for example, Lawrence Rabiner and Biing-Hwang Juang, "Fundamentals of Speech Recognition" (Japanese edition in two volumes, NTT Advanced Technology Corporation), and in Japanese Patent Application No. 2008-064993 previously filed by the present applicant. The use of ergodic HMMs and sparse-structure HMMs is described in, for example, Japanese Unexamined Patent Application Publication No. 2009-223444 previously filed by the present applicant.

［特徴量の抽出］ [Feature Amount Extraction]
図８は、図３の特徴量抽出部２２による特徴量の抽出の処理を示している。 FIG. 8 shows the feature amount extraction process performed by the feature amount extraction unit 22 of FIG. 3.

特徴量抽出部２２において、フレーム分割部２３には、学習用コンテンツ選択部２１からの学習用コンテンツの画像の各フレームが、時系列に供給される。 In the feature amount extraction unit 22, each frame of the learning content image from the learning content selection unit 21 is supplied to the frame division unit 23 in time series.

フレーム分割部２３は、学習用コンテンツ選択部２１から時系列に供給される学習用コンテンツのフレームを、順次、注目フレームとし、注目フレームを、複数のサブ領域R_kに分割して、サブ領域特徴量抽出部２４に供給する。 The frame dividing unit 23 sequentially sets each frame of the learning content, supplied in time series from the learning content selection unit 21, as the frame of interest, divides the frame of interest into a plurality of sub-regions R_k, and supplies them to the sub-region feature amount extraction unit 24.

ここで、図８では、注目フレームが、横×縦が4×4個の１６個のサブ領域R₁，R₂，・・・，R₁₆に等分されている。 Here, in FIG. 8, the frame of interest is divided equally into 16 sub-regions R₁, R₂, ..., R₁₆, arranged 4 wide by 4 high.

なお、１フレームをサブ領域R_kに分割するときの、サブ領域R_kの数は、4×4個の16個に限定されるものではない。すなわち、１フレームは、例えば、5×4個の20個のサブ領域R_kや、5×5個の25個のサブ領域R_k等に分割することができる。 Note that the number of sub-regions R_k into which one frame is divided is not limited to 16 (4 × 4). For example, one frame can be divided into 20 (5 × 4) sub-regions R_k, 25 (5 × 5) sub-regions R_k, and so on.
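A sketch of dividing a frame into a grid of sub-regions, assuming a frame represented as a list of pixel rows (a hypothetical representation; the source does not fix one). Boundaries are computed with integer division, so a frame whose size is not a multiple of the grid is still covered completely, with sub-regions differing by at most one pixel in width or height.

```python
def split_into_subregions(frame, cols=4, rows=4):
    """Split a frame (list of pixel rows) into cols x rows sub-regions R_1..R_K,
    returned in raster order (left to right, then top to bottom)."""
    height, width = len(frame), len(frame[0])
    ys = [r * height // rows for r in range(rows + 1)]  # row boundaries
    xs = [c * width // cols for c in range(cols + 1)]   # column boundaries
    return [[row[xs[c]:xs[c + 1]] for row in frame[ys[r]:ys[r + 1]]]
            for r in range(rows) for c in range(cols)]


frame = [[y * 8 + x for x in range(8)] for y in range(8)]  # toy 8x8 "image"
regions = split_into_subregions(frame)
print(len(regions), regions[0])  # 16 [[0, 1], [8, 9]]
```

Passing `cols=5, rows=4` yields the 20-region split mentioned above. Unequal sub-region sizes (for example, smaller regions at the centre) would need explicit boundary lists instead of this uniform rule.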

また、図８では、１フレームが、同一のサイズのサブ領域R_kに分割（等分）されているが、サブ領域のサイズは、同一でなくても良い。すなわち、例えば、フレームの中央部分は、小さなサイズのサブ領域に分割し、フレームの周辺部分（画枠に隣接する部分等）は、大きなサイズのサブ領域に分割することができる。 Further, although in FIG. 8 one frame is divided (equally) into sub-regions R_k of the same size, the sub-regions need not all be the same size. For example, the central part of the frame can be divided into small sub-regions, and the peripheral part of the frame (such as the part adjacent to the picture frame) into large sub-regions.

サブ領域特徴量抽出部２４（図３）は、フレーム分割部２３からの注目フレームの各サブ領域R_kのサブ領域特徴量f_k=FeatExt(R_k)を抽出し、結合部２５に供給する。 The sub-region feature amount extraction unit 24 (FIG. 3) extracts the sub-region feature amount f_k = FeatExt(R_k) of each sub-region R_k of the frame of interest from the frame dividing unit 23 and supplies it to the combining unit 25.

すなわち、サブ領域特徴量抽出部２４は、サブ領域R_kの画素値（例えば、RGB成分や、YUV成分等）を用い、サブ領域R_kの大域的な特徴量を、サブ領域特徴量f_kとして求める。 That is, the sub-region feature amount extraction unit 24 uses the pixel values of sub-region R_k (for example, RGB or YUV components) to obtain a global feature amount of sub-region R_k as the sub-region feature amount f_k.

ここで、サブ領域R_kの大域的な特徴量とは、サブ領域R_kを構成する画素の位置の情報を用いずに、画素値だけを用いて、加法的に計算される、例えば、ヒストグラムのような特徴量をいう。 Here, a global feature amount of sub-region R_k means a feature amount, such as a histogram, that is calculated additively from the pixel values alone, without using information on the positions of the pixels constituting sub-region R_k.
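To illustrate what "global" means here, the following sketch computes a normalised intensity histogram of a single-channel sub-region. It is a simplified stand-in for the GIST, LBP, or color histogram features the source mentions: the grayscale input, 8-bit pixel range, and 8 bins are arbitrary assumptions, and the function name is hypothetical. Because each pixel simply adds one count to a bin, rearranging the pixels inside the sub-region leaves the feature unchanged.

```python
def gray_histogram(region, bins=8, max_value=256):
    """Normalised intensity histogram of one sub-region.

    Only pixel values are used, never pixel positions: each pixel adds one
    count, so any rearrangement of the pixels gives the same result.
    """
    counts = [0] * bins
    pixels = [p for row in region for p in row]
    for p in pixels:
        counts[p * bins // max_value] += 1  # assumes integer pixels in [0, max_value)
    return [c / len(pixels) for c in counts]


region = [[0, 255], [128, 130]]
shuffled = [[130, 0], [255, 128]]
print(gray_histogram(region))                               # [0.25, 0.0, 0.0, 0.0, 0.5, 0.0, 0.0, 0.25]
print(gray_histogram(region) == gray_histogram(shuffled))   # True
```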

大域的な特徴量としては、例えば、GISTと呼ばれる特徴量を採用することができる。GISTについては、例えば、A. Torralba, K. Murphy, W. Freeman, M. Rubin, "Context-based vision system for place and object recognition", IEEE Int. Conf. Computer Vision, vol. 1, no. 1, pp. 273-280, 2003.に、詳細が記載されている。 As the global feature amount, for example, a feature amount called GIST can be adopted. Details of GIST are described in, for example, A. Torralba, K. Murphy, W. Freeman, M. Rubin, "Context-based vision system for place and object recognition", IEEE Int. Conf. Computer Vision, vol. 1, no. 1, pp. 273-280, 2003.

なお、大域的な特徴量は、GISTに限定されるものではない。すなわち、大域的な特徴量は、局所的な位置、明度、視点等の見えの変化に対して頑強な（変化を吸収するような）（Robustな）特徴量であれば良い。そのような特徴量としては、例えば、HLCA（局所高次相関）や、LBP(Local Binary Patterns)、カラーヒストグラム等がある。 The global feature amount is not limited to GIST. It may be any feature amount that is robust to (that is, absorbs) changes in appearance such as changes in local position, brightness, and viewpoint. Such feature amounts include, for example, HLCA (higher-order local correlation), LBP (Local Binary Patterns), and color histograms.

HLCAについては、例えば、N. Otsu, T. Kurita, "A new scheme for practical flexible and intelligent vision systems", Proc. IAPR Workshop on Computer Vision, pp.431-435, 1988に、詳細が記載されている。LBPについては、例えば、Ojala T, Pietikainen M & Maenpaa T, "Multiresolution gray-scale and rotation invariant texture classification with Local Binary Patterns", IEEE Transactions on Pattern Analysis and Machine Intelligence 24(7):971-987に、詳細が記載されている（Pietikainen、及び、Maenpaaの"a"は、正確には、"a"の上部に、"・・"を付加した文字）。 Details of HLCA are described in, for example, N. Otsu, T. Kurita, "A new scheme for practical flexible and intelligent vision systems", Proc. IAPR Workshop on Computer Vision, pp. 431-435, 1988. Details of LBP are described in, for example, Ojala T, Pietikainen M & Maenpaa T, "Multiresolution gray-scale and rotation invariant texture classification with Local Binary Patterns", IEEE Transactions on Pattern Analysis and Machine Intelligence 24(7):971-987 (strictly, the "a" in Pietikainen and Maenpaa is written "ä", that is, "a" with a diaeresis).

ここで、上述のGISTや、LBP，HLCA、カラーヒストグラム等の大域的な特徴量は、次元数が大となる傾向があるが、次元間の相関が高い傾向もある。 Here, global feature quantities such as GIST, LBP, HLCA, and color histogram described above tend to have a large number of dimensions, but also tend to have a high correlation between dimensions.

そこで、サブ領域特徴量抽出部２４（図３）では、サブ領域R_kから、GIST等を抽出した後、そのGIST等の主成分分析(PCA(principal component analysis))を行うことができる。そして、サブ領域特徴量抽出部２４では、PCAの結果に基づき、累積寄与率が、ある程度高い値（例えば、95%等以上の値）となるように、GIST等の次元数を圧縮（制限）し、その圧縮結果を、サブ領域特徴量とすることができる。 Therefore, after extracting GIST or a similar feature from sub-region R_k, the sub-region feature amount extraction unit 24 (FIG. 3) can perform principal component analysis (PCA) on it. Based on the PCA result, the sub-region feature amount extraction unit 24 can then compress (limit) the number of dimensions of the GIST or similar feature so that the cumulative contribution ratio reaches a reasonably high value (for example, 95% or more), and use the compressed result as the sub-region feature amount.

この場合、GIST等を、次元数を圧縮したPCA空間に射影した射影ベクトルが、GIST等の次元数を圧縮した圧縮結果となる。 In this case, a projection vector obtained by projecting GIST or the like onto a PCA space in which the number of dimensions is compressed becomes a compression result obtained by compressing the number of dimensions such as GIST.
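The PCA-based compression can be sketched in pure Python as below. This is an illustrative assumption about one way to compute it, not the source's method: principal components are found by power iteration with deflation on the covariance matrix, each eigenvalue is the variance along its component, and components are added until the cumulative contribution ratio (retained variance divided by total variance) reaches the threshold, 0.95 by default following the example in the text. A production system would normally use a linear-algebra library instead; the function name is hypothetical.

```python
import math
import random


def pca_compress(vectors, ratio=0.95, iterations=1000, seed=0):
    """Project vectors onto the fewest principal components whose cumulative
    contribution ratio (explained variance / total variance) reaches `ratio`."""
    rng = random.Random(seed)
    n, d = len(vectors), len(vectors[0])
    mean = [sum(v[k] for v in vectors) / n for k in range(d)]
    centered = [[v[k] - mean[k] for k in range(d)] for v in vectors]
    cov = [[sum(c[i] * c[j] for c in centered) / n for j in range(d)] for i in range(d)]
    total = sum(cov[i][i] for i in range(d))  # total variance = sum of all eigenvalues
    components, explained = [], 0.0
    while explained < ratio * total and len(components) < d:
        # Power iteration: converges to the dominant eigenvector of the (deflated) covariance.
        v = [rng.uniform(-1.0, 1.0) for _ in range(d)]
        for _ in range(iterations):
            w = [sum(cov[i][j] * v[j] for j in range(d)) for i in range(d)]
            norm = math.sqrt(sum(x * x for x in w))
            if norm == 0.0:
                break
            v = [x / norm for x in w]
        # Rayleigh quotient: the eigenvalue, i.e. the variance along v.
        lam = sum(v[i] * sum(cov[i][j] * v[j] for j in range(d)) for i in range(d))
        if lam <= 1e-12:
            break  # no variance left to explain
        components.append(v)
        explained += lam
        # Deflation: remove the found component so the next pass finds the next one.
        cov = [[cov[i][j] - lam * v[i] * v[j] for j in range(d)] for i in range(d)]
    # Projection of each centred vector onto the retained components.
    projected = [[sum(c[k] * comp[k] for k in range(d)) for comp in components]
                 for c in centered]
    return projected, components


data = [[0.0, 0.1], [1.0, 1.9], [2.0, 4.1], [3.0, 5.9], [4.0, 8.1]]  # roughly y = 2x
projected, components = pca_compress(data)
print(len(components))  # 1
```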

結合部２５（図３）は、サブ領域特徴量抽出部２４からの注目フレームのサブ領域R₁乃至R₁₆のサブ領域特徴量f₁乃至f₁₆を結合し、その結合結果を、注目フレームの特徴量として、特徴量記憶部２６に供給する。 The combining unit 25 (FIG. 3) combines the sub-region feature amounts f₁ to f₁₆ of the sub-regions R₁ to R₁₆ of the frame of interest from the sub-region feature amount extraction unit 24, and supplies the combined result to the feature amount storage unit 26 as the feature amount of the frame of interest.

すなわち、結合部２５は、サブ領域特徴量抽出部２４からのサブ領域特徴量f₁乃至f₁₆を結合することにより、そのサブ領域特徴量f₁乃至f₁₆をコンポーネントとするベクトルを生成し、そのベクトルを、注目フレームの特徴量F_tとして、特徴量記憶部２６に供給する。 That is, by combining the sub-region feature amounts f₁ to f₁₆ from the sub-region feature amount extraction unit 24, the combining unit 25 generates a vector having f₁ to f₁₆ as its components, and supplies that vector to the feature amount storage unit 26 as the feature amount F_t of the frame of interest.
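The combination step amounts to concatenating the sub-region feature vectors in a fixed order. A minimal sketch, assuming each f_k is a list of floats and the list is ordered R_1 to R_K (the function name is hypothetical):

```python
def combine_subregion_features(subregion_features):
    """Concatenate sub-region features f_1..f_K (ordered R_1..R_K) into the frame feature F_t."""
    frame_feature = []
    for f_k in subregion_features:
        frame_feature.extend(f_k)  # fixed R_1..R_K order keeps frames comparable
    return frame_feature


# 16 sub-regions with 3-dimensional features give a 48-dimensional F_t.
F_t = combine_subregion_features([[float(k), 0.0, 1.0] for k in range(16)])
print(len(F_t))  # 48
```

Because the order is fixed, the position of each sub-region is encoded by where its values sit in F_t, even though each f_k itself ignores pixel positions.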

ここで、図８では、時刻tのフレーム（フレームt）が、注目フレームとなっている。 Here, in FIG. 8, the frame at time t (frame t) is the frame of interest.

図３の特徴量抽出部２２では、学習用コンテンツの各フレームが、先頭から順次、注目フレームとされ、上述したようにして、特徴量F_tが求められる。そして、学習用コンテンツの各フレームの特徴量F_tは、時系列に（時間的な前後関係を維持した状態で）、特徴量抽出部２２から特徴量記憶部２６に供給されて記憶される。 In the feature amount extraction unit 22 of FIG. 3, each frame of the learning content is set in turn as the frame of interest, starting from the beginning, and the feature amount F_t is obtained as described above. The feature amounts F_t of the frames of the learning content are then supplied in time series (with their temporal order preserved) from the feature amount extraction unit 22 to the feature amount storage unit 26 and stored there.

以上のように、特徴量抽出部２２では、サブ領域特徴量f_kとして、サブ領域R_kの大域的な特徴量が求められ、そのサブ領域特徴量f_kをコンポーネントとするベクトルが、フレームの特徴量F_tとして求められる。 As described above, the feature amount extraction unit 22 obtains a global feature amount of each sub-region R_k as the sub-region feature amount f_k, and obtains the vector whose components are the sub-region feature amounts f_k as the frame feature amount F_t.

したがって、フレームの特徴量F_tは、局所的な変化（サブ領域内で起こる変化）に対しては頑強であるが、フレーム全体としてのパターンの配置の変化に対してはディスクリミネイティブ（鋭敏に違いを見分ける性質）であるような特徴量となる。 Therefore, the frame feature amount F_t is robust to local changes (changes occurring within a sub-region) but discriminative (sharply sensitive to differences) with respect to changes in the arrangement of patterns across the frame as a whole.

［コンテンツモデル学習処理］ [Content Model Learning Process]
次に、図９のフローチャートを参照して、図３のコンテンツモデル学習部１２が行う処理（コンテンツモデル学習処理）を説明する。 Next, the processing (content model learning process) performed by the content model learning unit 12 of FIG. 3 will be described with reference to the flowchart of FIG. 9.

ステップＳ１１において、学習用コンテンツ選択部２１は、コンテンツ記憶部１１に記憶されたコンテンツの中から、所定のカテゴリに属する１以上のコンテンツを、学習用コンテンツとして選択する。 In step S11, the learning content selection unit 21 selects, from the contents stored in the content storage unit 11, one or more contents belonging to a predetermined category as learning contents.

すなわち、例えば、学習用コンテンツ選択部２１は、コンテンツ記憶部１１に記憶されたコンテンツの中から、まだ、学習用コンテンツとしていない任意の１つのコンテンツを、学習用コンテンツとして選択する。 That is, for example, the learning content selection unit 21 selects any one content that has not yet been set as the learning content from the content stored in the content storage unit 11 as the learning content.

さらに、学習用コンテンツ選択部２１は、学習用コンテンツとして選択した１つのコンテンツのカテゴリを認識し、そのカテゴリに属する他のコンテンツが、コンテンツ記憶部１１に記憶されている場合には、そのコンテンツ（他のコンテンツ）を、さらに、学習用コンテンツとして選択する。 Furthermore, the learning content selection unit 21 recognizes a category of one content selected as the learning content, and when other content belonging to the category is stored in the content storage unit 11, the content ( Other content) is further selected as learning content.

学習用コンテンツ選択部２１は、学習用コンテンツを、特徴量抽出部２２に供給し、処理は、ステップＳ１１からステップＳ１２に進む。 The learning content selection unit 21 supplies the learning content to the feature amount extraction unit 22, and the process proceeds from step S11 to step S12.

ステップＳ１２では、特徴量抽出部２２のフレーム分割部２３が、学習用コンテンツ選択部２１からの学習用コンテンツの中の、まだ、注目学習用コンテンツ（以下、注目コンテンツともいう）に選択していない学習用コンテンツの１つを、注目コンテンツに選択する。 In step S12, the frame dividing unit 23 of the feature amount extraction unit 22 selects, from the learning contents supplied by the learning content selection unit 21, one learning content that has not yet been selected as the learning content of interest (hereinafter also simply called the content of interest), and makes it the content of interest.

そして、処理は、ステップＳ１２からステップＳ１３に進み、フレーム分割部２３は、注目コンテンツのフレームのうちの、まだ、注目フレームとしていない、時間的に最も先行するフレームを、注目フレームに選択し、処理は、ステップＳ１４に進む。 The process then proceeds from step S12 to step S13, where the frame dividing unit 23 selects, as the frame of interest, the temporally earliest frame of the content of interest that has not yet been set as the frame of interest, and the process proceeds to step S14.

ステップＳ１４では、フレーム分割部２３は、注目フレームを、複数のサブ領域に分割し、サブ領域特徴量抽出部２４に供給して、処理は、ステップＳ１５に進む。 In step S14, the frame dividing unit 23 divides the frame of interest into a plurality of sub-regions and supplies them to the sub-region feature amount extraction unit 24, and the process proceeds to step S15.

ステップＳ１５では、サブ領域特徴量抽出部２４は、フレーム分割部２３からの複数のサブ領域それぞれのサブ領域特徴量を抽出し、結合部２５に供給して、処理は、ステップＳ１６に進む。 In step S15, the sub-region feature value extraction unit 24 extracts the sub-region feature values of each of the plurality of sub-regions from the frame division unit 23, supplies them to the combining unit 25, and the process proceeds to step S16.

ステップＳ１６では、結合部２５は、サブ領域特徴量抽出部２４からの、注目フレームを構成する複数のサブ領域それぞれのサブ領域特徴量を結合することで、注目フレームの特徴量を生成し、処理は、ステップＳ１７に進む。 In step S16, the combining unit 25 generates a feature amount of the target frame by combining the sub-region feature amounts of the plurality of sub-regions constituting the target frame from the sub-region feature amount extraction unit 24, and performs processing. Advances to step S17.

ステップＳ１７では、フレーム分割部２３は、注目コンテンツのすべてのフレームを注目フレームとしたかどうかを判定する。 In step S17, the frame dividing unit 23 determines whether all frames of the content of interest have been set as the frame of interest.

ステップＳ１７において、注目コンテンツのフレームの中に、まだ、注目フレームとしていないフレームがあると判定された場合、処理は、ステップＳ１３に戻り、以下、同様の処理が繰り返される。 If it is determined in step S17 that there is a frame that has not yet been set as the target frame among the frames of the target content, the process returns to step S13, and the same process is repeated thereafter.

また、ステップＳ１７において、注目コンテンツのすべてのフレームを注目フレームとしたと判定された場合、処理は、ステップＳ１８に進み、結合部２５は、注目コンテンツについて求めた注目コンテンツの各フレームの特徴量（の時系列）を、特徴量記憶部２６に供給して記憶させる。 If it is determined in step S17 that all frames of the content of interest have been set as the frame of interest, the process proceeds to step S18, where the combining unit 25 supplies (the time series of) the feature amounts obtained for the frames of the content of interest to the feature amount storage unit 26, which stores them.

そして、処理は、ステップＳ１８からステップＳ１９に進み、フレーム分割部２３は、学習用コンテンツ選択部２１からの学習用コンテンツのすべてを、注目コンテンツとしたかどうかを判定する。 The process then proceeds from step S18 to step S19, where the frame dividing unit 23 determines whether all of the learning contents from the learning content selection unit 21 have been set as the content of interest.

ステップＳ１９において、学習用コンテンツの中に、まだ、注目コンテンツとしていない学習用コンテンツがあると判定された場合、処理は、ステップＳ１２に戻り、以下、同様の処理が繰り返される。 If it is determined in step S19 that there is a learning content that has not yet been set as the content of interest in the learning content, the processing returns to step S12, and the same processing is repeated thereafter.

また、ステップＳ１９において、学習用コンテンツのすべてを、注目コンテンツとしたと判定された場合、処理は、ステップＳ２０に進み、学習部２７は、特徴量記憶部２６に記憶された、学習用コンテンツの特徴量（各フレームの特徴量の時系列）を用いて、コンテンツモデルの学習を行う。 If it is determined in step S19 that all of the learning content is the content of interest, the process proceeds to step S20, and the learning unit 27 stores the learning content stored in the feature amount storage unit 26. The content model is learned using the feature amount (the time series of the feature amount of each frame).

すなわち、学習部２７は、特徴量記憶部２６に記憶された学習用コンテンツの各フレームの特徴量（ベクトル）を用いて、その特徴量の空間である特徴量空間を、複数のクラスタに分割するクラスタ学習を、k-means法によって行い、既定数としての、例えば、100乃至数100のクラスタ（代表ベクトル）のコードブックを、クラスタ情報として求める。 That is, the learning unit 27 divides a feature amount space, which is a space of the feature amount, into a plurality of clusters using the feature amount (vector) of each frame of the learning content stored in the feature amount storage unit 26. Cluster learning is performed by the k-means method, and a codebook of, for example, 100 to several hundred clusters (representative vectors) as a predetermined number is obtained as cluster information.

さらに、学習部２７は、クラスタ学習によって得られたクラスタ情報としてのコードブックを用いて、特徴量記憶部２６に記憶された学習用コンテンツの各フレームの特徴量をクラスタリングするベクトル量子化を行い、学習用コンテンツの特徴量の時系列を、コード系列に変換する。 Further, the learning unit 27 performs vector quantization for clustering the feature amounts of each frame of the learning content stored in the feature amount storage unit 26 using the code book as the cluster information obtained by the cluster learning, The time series of the feature amount of the learning content is converted into a code series.

学習部２７は、学習用コンテンツの特徴量の時系列をクラスタリングすることにより、コード系列に変換すると、そのコード系列を用いて、HMM（離散HMM）の学習であるモデル学習を行う。 After converting the time series of feature amounts of the learning content into a code sequence by clustering, the learning unit 27 uses that code sequence to perform model learning, that is, learning of an HMM (a discrete HMM).

The learning unit 27 then outputs (supplies) a content model to the model storage unit 13, associated with the category of the learning content. The content model is the set of the code model (the HMM after model learning) and the codebook (the cluster information obtained by cluster learning). The content model learning process then ends.

The content model learning process can be started at any time.

Through the content model learning process described above, the HMM serving as the code model acquires, in a self-organizing manner, the content structure latent in the learning content (for example, the program structure or the structure produced by camera work).

As a result, each state of the HMM serving as the code model corresponds to an element of the content structure acquired by learning. Each state transition expresses a temporal transition between elements of that structure.

Each state of the code model collectively represents a group of frames, in other words "similar scenes". These frames are close to one another in the feature amount space (the space of the feature amounts extracted by the feature amount extraction unit 22 (FIG. 3)) and have a similar temporal context.

[Configuration Example of Symbol String Generation Unit 14]
FIG. 10 shows a configuration example of the symbol string generation unit 14 of FIG. 1.

The symbol string generation unit 14 includes a content selection unit 31, a model selection unit 32, a feature amount extraction unit 33, and a maximum likelihood state sequence estimation unit 34.

Under the control of the control unit 16, the content selection unit 31 selects, from the content stored in the content storage unit 11, the content for which a symbol string is to be generated, as the content of interest.

For example, the control unit 16 controls the content selection unit 31 on the basis of an operation signal from the operation unit 17 that corresponds to a selection operation by the user. The content selected by that operation thereby becomes the content of interest.

The content selection unit 31 also supplies the content of interest to the feature amount extraction unit 33. In addition, it recognizes the category of the content of interest and supplies that category to the model selection unit 32.

From among the content models stored in the model storage unit 13, the model selection unit 32 selects the model of interest. This is the content model whose category matches the category of the content of interest supplied from the content selection unit 31 (that is, the content model associated with that category).

The model selection unit 32 then supplies the model of interest to the maximum likelihood state sequence estimation unit 34.

The feature amount extraction unit 33 extracts the feature amount of each frame (of the images) of the content of interest supplied from the content selection unit 31, in the same manner as the feature amount extraction unit 22 of FIG. 3. It supplies the (time series of) feature amounts of the frames to the maximum likelihood state sequence estimation unit 34.

The maximum likelihood state sequence estimation unit 34 uses the cluster information of the model of interest from the model selection unit 32 to cluster the (time series of) feature amounts of the content of interest from the feature amount extraction unit 33. This yields the code sequence (of the feature amounts) of the content of interest.

Further, the maximum likelihood state sequence estimation unit 34 estimates the maximum likelihood state sequence (the sequence of states forming the so-called Viterbi path), for example in accordance with the Viterbi algorithm. This is the state sequence whose state transitions give the highest likelihood of observing the code sequence of the content of interest (from the feature amount extraction unit 33) in the code model of the model of interest (from the model selection unit 32).
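A log-domain Viterbi decoder for a discrete HMM is sketched below to show the kind of estimation described above. The parameter layout is an assumption made for illustration: `pi`, `A`, and `B` are nested lists, and codes are integer indices.

```python
import math


def viterbi(obs, pi, A, B):
    """Maximum likelihood state sequence of a discrete HMM for a code sequence.

    pi[i]: initial probability, A[i][j]: transition i->j, B[i][k]: P(code k | state i).
    Works in the log domain to avoid underflow on long sequences.
    """
    def log(p):
        return math.log(p) if p > 0 else float("-inf")

    N = len(pi)
    delta = [log(pi[i]) + log(B[i][obs[0]]) for i in range(N)]
    backptrs = []
    for code in obs[1:]:
        ptr, new_delta = [], []
        for j in range(N):
            best = max(range(N), key=lambda i: delta[i] + log(A[i][j]))
            ptr.append(best)
            new_delta.append(delta[best] + log(A[best][j]) + log(B[j][code]))
        backptrs.append(ptr)
        delta = new_delta
    state = max(range(N), key=lambda i: delta[i])
    path = [state]
    for ptr in reversed(backptrs):  # trace the Viterbi path backwards
        state = ptr[state]
        path.append(state)
    return path[::-1]               # one state ID per frame: s(1), ..., s(T)


pi = [0.5, 0.5]
A = [[0.9, 0.1], [0.1, 0.9]]
B = [[0.9, 0.1], [0.1, 0.9]]
print(viterbi([0, 0, 1, 1, 1], pi, A, B))  # [0, 0, 1, 1, 1]
```

The returned list has one state ID per frame, which is the form in which the maximum likelihood state sequence is handed on as a symbol string.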

The code model of the model of interest is hereinafter also called the code model of interest. Observing the code sequence of the content of interest in this code model yields a maximum likelihood state sequence, hereinafter also called the maximum likelihood state sequence of the code model of interest for the content of interest. The maximum likelihood state sequence estimation unit 34 supplies this sequence to the dividing unit 15 as the symbol string.

Alternatively, the maximum likelihood state sequence estimation unit 34 may supply the code sequence (sequence of cluster IDs) of the content of interest obtained by clustering to the dividing unit 15 as the symbol string. In that case it is supplied instead of the maximum likelihood state sequence of the code model of interest for the content of interest.

Here, let s(t) denote the state at time t, counted from the beginning of the maximum likelihood state sequence of the code model of interest for the content of interest (that is, the t-th state of that sequence). Let T denote the number of frames of the content of interest.

In this case, the maximum likelihood state sequence of the code model of interest for the content of interest is a sequence of T states s(1), s(2), ..., s(T). Its t-th state s(t) (the state at time t) corresponds to the frame of the content of interest at time t (frame t).

If N denotes the total number of states of the code model of interest, the state s(t) at time t is one of the N states s_1, s_2, ..., s_N.

Each of the N states s_1, s_2, ..., s_N is assigned a state ID (identification), an index that identifies the state.

If the state s(t) at time t of the maximum likelihood state sequence of the code model of interest for the content of interest is the i-th state s_i of the N states s_1 to s_N, then the frame at time t corresponds to state s_i.

Accordingly, each frame of the content of interest corresponds to one of the N states s_1 to s_N.

In substance, the maximum likelihood state sequence of the code model of interest for the content of interest is a sequence of state IDs. Each entry is the ID of the state, among the N states s_1 to s_N, to which the frame of the content of interest at that time t corresponds.

FIG. 11 shows an outline of the symbol string generation process performed by the symbol string generation unit 14 of FIG. 10.

A of FIG. 11 shows the time series of frames of the content selected as the content of interest by the content selection unit 31.

B of FIG. 11 shows the time series of feature amounts extracted by the feature amount extraction unit 33 from the time series of frames in A of FIG. 11.

C of FIG. 11 shows the code sequence that the maximum likelihood state sequence estimation unit 34 obtains by clustering the time series of feature amounts in B of FIG. 11.

D of FIG. 11 shows the maximum likelihood state sequence estimated by the maximum likelihood state sequence estimation unit 34 (the maximum likelihood state sequence of the code model of interest for the content of interest). This is the state sequence in which the code sequence of the content of interest in C of FIG. 11 is observed in the code model of interest.

When the symbol string generation unit 14 supplies the code sequence shown in C of FIG. 11 to the dividing unit 15 as the symbol string, each code (cluster ID) in the code sequence is supplied as a symbol.

When the symbol string generation unit 14 supplies the maximum likelihood state sequence shown in D of FIG. 11 to the dividing unit 15 as the symbol string, each state ID in the maximum likelihood state sequence is supplied as a symbol.

[Description of Operation of Symbol String Generation Unit 14]
Next, the symbol string generation process performed by the symbol string generation unit 14 will be described with reference to the flowchart of FIG. 12.

The symbol string generation process starts, for example, when the user performs a selection operation on the operation unit 17. That operation selects, from the content stored in the content storage unit 11, the content for which a symbol string is to be generated.

At this time, the operation unit 17 supplies an operation signal corresponding to the user's selection operation to the control unit 16. The control unit 16 controls the content selection unit 31 on the basis of that operation signal.

That is, in step S41, under the control of the control unit 16, the content selection unit 31 selects the content of interest from the content stored in the content storage unit 11. The content of interest is the content for which a symbol string is to be generated.

The content selection unit 31 then supplies the content of interest to the feature amount extraction unit 33. It also recognizes the category of the content of interest and supplies that category to the model selection unit 32.

In step S42, the model selection unit 32 selects the model of interest from among the content models stored in the model storage unit 13. This is the content model whose category matches the category of the content of interest supplied from the content selection unit 31 (the content model associated with that category).

In step S43, the feature amount extraction unit 33 extracts the feature amount of each frame (of the images) of the content of interest supplied from the content selection unit 31, in the same manner as the feature amount extraction unit 22 of FIG. 3. It supplies the (time series of) feature amounts of the frames to the maximum likelihood state sequence estimation unit 34.

In step S44, the maximum likelihood state sequence estimation unit 34 uses the cluster information of the model of interest from the model selection unit 32 to cluster the (time series of) feature amounts of the content of interest from the feature amount extraction unit 33. This yields the code sequence (of the feature amounts) of the content of interest.

The maximum likelihood state sequence estimation unit 34 may instead supply the code sequence of the content of interest obtained by clustering to the dividing unit 15 as the symbol string, in place of the maximum likelihood state sequence of the code model of interest for the content of interest. This completes the symbol string generation process.

FIG. 13 shows an example in which the dividing unit 15 divides the content into a plurality of semantically coherent segments. The division is based on the symbol string from the symbol string generation unit 14.

FIG. 13 is laid out in the same way as FIG. 2: the horizontal axis represents time t, and the vertical axis represents the symbol of frame t.

FIG. 13 also shows dividing lines (thick line segments) that divide the content into six segments S_1, S_2, S_3, S_4, S_5, and S_6. A dividing line can be placed (drawn) at any time t.

When the code sequence is adopted as the symbol string, the symbols are the codes that make up the code sequence (shown in C of FIG. 11). When the maximum likelihood state sequence is adopted as the symbol string, the symbols are the state IDs that make up the maximum likelihood state sequence (shown in D of FIG. 11).

As described with reference to FIG. 2, the dividing unit 15 divides the content by drawing dividing lines at three kinds of boundary: between a first partial sequence and a second partial sequence, between two first partial sequences, and between two second partial sequences.

For example, the dividing unit 15 places the dividing lines so as to minimize the sum Q of the entropies H(S_i) of the segments S_i (i = 1, 2, ..., 6) shown in FIG. 13. Here, the entropy of a segment S_i represents the degree of variation of the symbols within that segment.

When a dividing line is placed at time t, the content is divided with frame t as the boundary. For example, placing a dividing line at time t in content that has not yet been divided produces two segments. The first contains frames 0 through t-1, and the second contains frame t through the last frame T.

The dividing unit 15 calculates the division positions (the positions at which dividing lines should be drawn) from the variation (dispersion) of the symbols in the symbol string supplied by the symbol string generation unit 14, such as the one shown in FIG. 13.

The dividing unit 15 then reads the content corresponding to that symbol string from the content storage unit 11 and divides it into a plurality of segments at the calculated division positions.

For example, the dividing unit 15 divides the content into D segments S_i (i = 1, 2, ..., D). The total number of divisions D is specified by the user through a specifying operation on the operation unit 17.

Specifically, for example, the dividing unit 15 calculates the entropy H(S_i) of each segment S_i using equation (1).

In equation (1), the probability P^[S_i](k) is the probability that the k-th symbol appears in segment S_i. For example, the k-th symbol is the one with the k-th smallest value when the symbols are sorted in ascending order.

Further, in equation (1), P^[S_i](k) = (number of occurrences of the k-th symbol in segment S_i) / (total number of symbols in segment S_i).

The dividing unit 15 also uses equation (2) to calculate the sum Q of the entropies H(S_1) to H(S_D) of all the segments S_1 to S_D.
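Equations (1) and (2) themselves do not appear in this text. Assume that equation (1) is the ordinary Shannon entropy of the per-segment symbol distribution defined above. The base of the logarithm is not stated, but changing it only rescales Q and does not move its minimizer. Under that assumption the two quantities can be written as follows, where n_k(S_i) (the number of occurrences of the k-th symbol in S_i) and |S_i| (the number of symbols in S_i) are notation introduced here for illustration:

```latex
P^{[S_i]}(k) = \frac{n_k(S_i)}{|S_i|}, \qquad
H(S_i) = -\sum_{k \,:\, P^{[S_i]}(k) > 0} P^{[S_i]}(k)\,\log P^{[S_i]}(k), \qquad
Q = \sum_{i=1}^{D} H(S_i)
```

A segment containing a single symbol value has H(S_i) = 0. Segments that mix many symbol values in similar proportions have high entropy.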

The segments S_1, S_2, ..., S_D that minimize this sum Q are the segments delimited by the dividing lines, as shown in FIG. 13.

The dividing unit 15 therefore divides the content into the segments S_1 to S_D by solving the problem of minimizing the sum Q. It then supplies the divided content to the content storage unit 11 for storage.
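The objective Q can be evaluated as sketched below. This sketch assumes the Shannon-entropy form of equation (1) and uses the cut convention stated earlier: a dividing line at time t ends one segment at t-1 and starts the next at t. Positions are 0-based list indices, and the function names are hypothetical.

```python
import math
from collections import Counter


def entropy(symbols):
    """H(S_i): Shannon entropy (natural log) of the symbols in one segment."""
    n = len(symbols)
    return -sum((c / n) * math.log(c / n) for c in Counter(symbols).values())


def segments_from_cuts(symbols, cuts):
    """A dividing line at position t ends one segment at t-1 and starts the next at t.

    Cuts must be distinct positions in 1..len(symbols)-1 so that no segment is empty.
    """
    bounds = [0] + sorted(cuts) + [len(symbols)]
    return [symbols[a:b] for a, b in zip(bounds, bounds[1:])]


def total_entropy(symbols, cuts):
    """Q: sum of H(S_i) over the segments produced by the cuts."""
    return sum(entropy(seg) for seg in segments_from_cuts(symbols, cuts))


seq = [1, 1, 1, 2, 2, 2]
print(total_entropy(seq, []))   # one mixed segment: ln 2
print(total_entropy(seq, [3]))  # two single-symbol segments: 0
```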

The minimization problem for the sum Q can be solved with, for example, recursive bisection processing or annealing division processing. The method is not limited to these; tabu search or a genetic algorithm, for example, can also be used.

Recursive bisection processing divides the content into a plurality of segments by recursively (repeatedly) splitting it at the division position that minimizes the sum of the entropies of the resulting segments. It is described in detail with reference to FIG. 14.

Annealing division processing also divides the content into a plurality of segments. It starts from arbitrarily chosen division positions and repeatedly moves them toward positions that minimize the sum of the entropies. It is described in detail with reference to FIG. 15.

[Description of Operation of Dividing Unit 15]
Next, the recursive bisection process performed by the dividing unit 15 will be described with reference to the flowchart of FIG. 14.

The recursive bisection process starts, for example, when the user performs a specifying operation on the operation unit 17 that specifies the total number of divisions D of the content. At this time, the operation unit 17 supplies an operation signal corresponding to the specifying operation to the control unit 16.

In accordance with the operation signal from the operation unit 17, the control unit 16 controls the dividing unit 15 so that it divides the symbol string into the total number of divisions D specified by the user.

That is, in step S81, the dividing unit 15 sets to 1 the number of divisions d held in a built-in memory (not shown). The number of divisions d is the number of pieces into which the symbol string has been divided so far in the recursive bisection process. A value of d = 1 means that the symbol string has not yet been divided.

In step S82, the dividing unit 15 works from the variation of the symbols in the symbol string from the symbol string generation unit 14. An addition point L_i is a time at which a dividing line can be added. For each addition point L_i that does not yet have a dividing line, the unit calculates the entropy sum Q = Q(L_i) that results from adding a dividing line there.

Here, the addition points L_i are the times t corresponding to frames 1 to T among the frames 0 to T that make up the content.

In step S83, the dividing unit 15 sets as L* the addition point L_i whose sum Q(L_i), calculated in step S82, is smallest. In step S84, the dividing unit 15 adds a dividing line at the addition point L*, and in step S85 it increments the number of divisions d by 1.

The dividing unit 15 has thereby divided the symbol string from the symbol string generation unit 14 at the addition point L*.

In step S86, the dividing unit 15 determines whether the number of divisions d equals the total number of divisions D specified by the user's specifying operation. If d does not equal D, the process returns to step S82, and the same processing is repeated.

If, in step S86, the dividing unit 15 determines that d equals D (that is, that the symbol string has been divided into D segments S_1 to S_D), the recursive bisection process ends.

The dividing unit 15 then reads from the content storage unit 11 the same content that the symbol string generation unit 14 converted into the symbol string. It divides that content at the same division positions at which the symbol string was divided, and supplies the content divided into the segments S_1 to S_D to the content storage unit 11 for storage.
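The greedy loop of steps S81 to S86 can be sketched as below. The sketch assumes the Shannon-entropy form of H(S_i), since equation (1) is not reproduced in this text. Positions are 0-based list indices, a cut at t starts a new segment at t, and the function names are hypothetical.

The sketch departs from the flowchart in one small way. It tests d < D before adding a line, so D = 1 returns no cuts rather than adding one. It also assumes D does not exceed the number of symbols.

```python
import math
from collections import Counter


def entropy(symbols):
    """H(S_i): Shannon entropy (natural log) of one segment's symbols."""
    n = len(symbols)
    return -sum((c / n) * math.log(c / n) for c in Counter(symbols).values())


def total_entropy(symbols, cuts):
    """Q for the segmentation given by the cut positions."""
    bounds = [0] + sorted(cuts) + [len(symbols)]
    return sum(entropy(symbols[a:b]) for a, b in zip(bounds, bounds[1:]))


def recursive_bisection(symbols, D):
    """Greedily add one dividing line at a time until there are D segments."""
    cuts = []                                  # S81: d = 1, no dividing lines yet
    while len(cuts) + 1 < D:                   # S86: stop once d == D
        free = [t for t in range(1, len(symbols)) if t not in cuts]
        # S82-S83: L* is the free addition point whose new line gives the smallest Q
        l_star = min(free, key=lambda t: total_entropy(symbols, cuts + [t]))
        cuts.append(l_star)                    # S84-S85: add the line, d += 1
    return sorted(cuts)


seq = [0] * 4 + [1] * 4 + [2] * 4
print(recursive_bisection(seq, 3))  # [4, 8]
```

Each iteration evaluates Q once per free position. Because earlier cuts are never revisited, the result is a greedy approximation rather than a guaranteed global minimum of Q.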

As described above, the recursive bisection process of FIG. 14 divides the content into D segments S_1 to S_D. Each dividing line is placed so as to minimize the sum Q of the entropies H(S_i).

Therefore, the recursive bisection process of FIG. 14 can divide the content into semantically coherent segments, in the same way a test subject would.

For example, the content can be divided into segments corresponding to the corners of a program or to the individual topics of a news broadcast.

Moreover, the recursive bisection process of FIG. 14 uses a relatively simple algorithm, so it can divide content quickly with a relatively small amount of computation.

[Description of Another Operation of Dividing Unit 15]
Next, the annealing division process performed by the dividing unit 15 will be described with reference to the flowchart of FIG. 15.

The annealing division process starts, for example, when the user performs a specifying operation on the operation unit 17 that specifies the total number of divisions D of the content. At this time, the operation unit 17 supplies an operation signal corresponding to the specifying operation to the control unit 16. In accordance with that signal, the control unit 16 controls the dividing unit 15 so that it divides the symbol string into the total number of divisions D specified by the user.

In step S111, the dividing unit 15 arbitrarily selects D-1 of the addition points L_i (the times at which a dividing line can be added) and places a dividing line at each of them. The dividing unit 15 has thereby provisionally divided the symbol string from the symbol string generation unit 14 into D segments S_1 to S_D.

In step S112, the dividing unit 15 sets each of the variables t and j, held in a built-in memory (not shown), to 1. It also initializes the temperature parameter temp, held in the same memory, to a predetermined value.

In step S113, the dividing unit 15 determines whether the variable t has reached a predetermined threshold NREP. If it has not, the process proceeds to step S114.

In step S114, the dividing unit 15 determines whether the variable j has reached a predetermined threshold NIREP. If it has, the process proceeds to step S115. The threshold NIREP is preferably set to a value sufficiently larger than the threshold NREP.

In step S115, the dividing unit 15 multiplies the temperature parameter temp by 0.9 and uses the result, temp × 0.9, as the new temp. In step S116, the dividing unit 15 increments the variable t by 1, and in step S117 it resets the variable j to 1.

The process then returns to step S113, and the dividing unit 15 repeats the same processing.

If, in step S114, the dividing unit 15 determines that the variable j has not reached the threshold NIREP, the process proceeds to step S118.

In step S118, the dividing unit 15 chooses one of the D-1 addition points L_i at which dividing lines have been placed, and calculates the RNG width around it. The RNG width is the range from addition point L_{i-x} to addition point L_{i+x}, where the natural number x is set in advance in the dividing unit 15.

In step S119, the dividing unit 15 considers moving the addition point L_i chosen in step S118 to each addition point L_n within the RNG width calculated in the same step, where n is a natural number from i-x to i+x. For each such move, it calculates Q(L_n).

In step S120, the dividing unit 15 sets as L* the L_n that gives the smallest of the values Q(L_n) calculated in step S119, and obtains Q(L*). It also calculates Q(L_i), the sum before the dividing line is moved.

In step S121, the dividing unit 15 calculates the difference ΔQ = Q(L*) - Q(L_i). This subtracts Q(L_i), the sum before the dividing line is moved, from Q(L*), the sum after it is moved.

In step S122, the dividing unit 15 determines whether the difference ΔQ calculated in step S121 is less than 0. If it is, the process proceeds to step S123.

In step S123, the dividing unit 15 moves the dividing line at the addition point L_i chosen in step S118 to the addition point L* determined in step S120, and the process proceeds to step S125.

If, in step S122, the dividing unit 15 determines that ΔQ is not less than 0 (that is, ΔQ is 0 or more), the process proceeds to step S124.

In step S124, the dividing unit 15 moves the dividing line at the addition point L_i chosen in step S118 to the addition point L* determined in step S120 with probability exp(-ΔQ/temp). This is e, the base of the natural logarithm, raised to the power -ΔQ/temp; because ΔQ ≥ 0 here, the value is at most 1. The process then proceeds to step S125.

In step S125, the dividing unit 15 increments the variable j by 1 and returns the process to step S114. The same processing is then repeated.

If the dividing unit 15 determines in step S113 that the variable t has reached the predetermined threshold NREP, the annealing division process of FIG. 15 ends.
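
The move made in steps S119 to S124 can be sketched in Python as follows. This is a sketch under several assumptions:

- The caller supplies a hypothetical `total_entropy` function that returns Q for a list of dividing-line positions.
- The candidate set excludes Li itself. If Li were included, Q(L*) could never exceed Q(Li).
- Because exp(ΔQ/temp) is at least 1 whenever ΔQ is 0 or greater, the sketch uses the conventional Metropolis form exp(-ΔQ/temp) as the acceptance probability.

The temperature schedule and the choice of i (steps S113 to S118) are outside this excerpt and are left to the caller.

```python
import math
import random
from typing import Callable, List, Sequence


def annealing_move(
    lines: Sequence[int],
    i: int,
    x: int,
    n: int,
    total_entropy: Callable[[Sequence[int]], float],
    temp: float,
    rng: random.Random,
) -> List[int]:
    """Hypothetical helper: one move of the dividing line lines[i] (S119-S124).

    lines: sorted dividing-line positions inside a sequence of length n.
    x: half-width of the search window (the RNG width in the text).
    total_entropy: assumed callable returning Q for a list of line positions.
    """
    current = list(lines)
    li = current[i]
    left = current[i - 1] if i > 0 else 0                    # stay right of the previous line
    right = current[i + 1] if i + 1 < len(current) else n    # stay left of the next line
    # S119: candidate positions Ln within +/- x of Li (Li itself excluded, an assumption)
    candidates = [p for p in range(li - x, li + x + 1) if p != li and left < p < right]
    if not candidates:
        return current

    def q_at(p: int) -> float:
        return total_entropy(current[:i] + [p] + current[i + 1:])

    l_star = min(candidates, key=q_at)                       # S120: argmin of Q(Ln)
    dq = q_at(l_star) - total_entropy(current)               # S121: dQ = Q(L*) - Q(Li)
    # S122-S124: accept improvements; otherwise accept with probability
    # exp(-dQ/temp) (assumed Metropolis form, see the note above)
    if dq < 0 or rng.random() < math.exp(-dq / temp):
        current[i] = l_star
    return current
```

Seeding `random.Random` makes a run reproducible, which helps when comparing division results.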

As described above, the annealing division process of FIG. 15, like the recursive bisection process of FIG. 14, can divide content into semantically coherent segments.

So far, the dividing unit 15 has divided the content read from the content storage unit 11 into the total number of divisions D specified by a user operation. Alternatively, the dividing unit 15 may choose D automatically: among the possible numbers of divisions, it may use the D that minimizes the total entropy Q.

As another alternative, the two approaches can be combined. When the user specifies the total number of divisions D, the content is divided into D segments. When the user does not, the content is divided using the D that minimizes the total entropy Q.

[Description of operation of recorder 1]
Next, the content division process of FIG. 16 will be described with reference to its flowchart. In this process, the recorder 1 divides the content into the total number of divisions D when D is specified by a user operation. When D is not specified, it divides the content using the D that minimizes the total entropy Q.

In step S151, the content model learning unit 12 performs the content model learning process described with reference to FIG. 9.

In step S152, the symbol string generation unit 14 performs the symbol string generation process described with reference to FIG. 12.

In step S153, the control unit 16 determines, based on the operation signal from the operation unit 17, whether the user has specified the total number of divisions D within a predetermined period.

If the control unit 16 determines from the operation signal that the user has specified D, it controls the dividing unit 15 to divide the content into the specified D segments.

For example, the dividing unit 15 divides the content at the division positions (the positions where dividing lines are placed) obtained by the recursive bisection process of FIG. 14 or the annealing division process of FIG. 15. The dividing unit 15 then supplies the content, divided into D segments, to the content storage unit 11 for storage.

If the control unit 16 determines in step S153 that the user has not specified D, the process proceeds to step S155.

In step S155 and the subsequent steps, the control unit 16 controls the dividing unit 15 to find the D that minimizes the total entropy Q among the possible numbers of divisions, and to divide the target content into that many segments.

In step S155, the dividing unit 15 calculates the total entropy Q_D for dividing the symbol string into a predetermined number of divisions D (for example, D = 2). It does so with one division process, either the recursive bisection process or the annealing division process.

In step S156, the dividing unit 15 calculates the average entropy mean(Q_D) = Q_D / D from the calculated total Q_D.

In step S157, the dividing unit 15 uses the same division process as in step S155 to calculate the total entropy Q_{D+1} for dividing the symbol string into D+1 segments.

In step S158, the dividing unit 15 calculates the average entropy mean(Q_{D+1}) = Q_{D+1} / (D+1) from the calculated Q_{D+1}.

In step S159, the dividing unit 15 calculates the difference Δmean = mean(Q_{D+1}) - mean(Q_D). That is, it subtracts the average entropy from step S156 from the average entropy from step S158.

In step S160, the dividing unit 15 determines whether Δmean (from step S159) is less than a predetermined threshold TH. If Δmean is not less than TH (that is, Δmean is TH or greater), the process proceeds to step S161.

In step S161, the dividing unit 15 adds 1 to the current number of divisions D and uses D+1 as the new D. The process returns to step S157, and the same processing is repeated.

If the dividing unit 15 determines in step S160 that Δmean is less than TH, it regards the total entropy Q as minimized by dividing the symbol string into the current D segments. The process then proceeds to step S162.

In step S162, the dividing unit 15 divides the content at the same positions used to divide the symbol string. It supplies the resulting content, divided into D segments, to the content storage unit 11 for storage. The content division process of FIG. 16 then ends.
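
The choice of D in steps S153 to S161 can be sketched as a loop. The sketch assumes two things not in the text:

- `total_entropy_for(D)` is a hypothetical callable that runs the chosen division process on the symbol string and returns Q_D.
- `d_max` is a safety bound.

```python
from typing import Callable, Optional


def choose_num_divisions(
    total_entropy_for: Callable[[int], float],
    th: float,
    user_d: Optional[int] = None,
    d_start: int = 2,
    d_max: int = 100,
) -> int:
    """Hypothetical helper sketching steps S153-S161 of FIG. 16."""
    if user_d is not None:                                   # S153: D specified by the user
        return user_d
    d = d_start
    mean_d = total_entropy_for(d) / d                        # S155-S156: mean(Q_D)
    while d < d_max:                                         # d_max: assumed safety bound
        mean_next = total_entropy_for(d + 1) / (d + 1)       # S157-S158: mean(Q_{D+1})
        if mean_next - mean_d < th:                          # S159-S160: delta_mean < TH
            return d                                         # S162 divides with this D
        d, mean_d = d + 1, mean_next                         # S161: D <- D + 1
    return d
```

When the loop returns to step S157, the previous mean(Q_{D+1}) becomes the new mean(Q_D). That is why the sketch carries `mean_next` forward instead of recomputing it.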

As described above, when the user specifies the total number of divisions D, the content division process of FIG. 16 divides the content into exactly D segments. The user therefore gets the number of segments they want.

When the user does not specify D, the content division process of FIG. 16 divides the content using the D that minimizes the total entropy Q of the symbol string. This spares the user the trouble of specifying D.

In the first embodiment, the recorder 1 divides the content into a plurality of semantically coherent segments. The user of the recorder 1 can therefore select and play back a desired segment from among them, for example a particular corner (a portion of a program).

In the first embodiment, the recorder 1 divides content into segments, but the target of division is not limited to content. It may be, for example, audio data or an electroencephalogram (EEG) waveform. Any time-series data, that is, data arranged in time order, can be divided.

If a digest (summary) is generated for each segment, the user can refer to the digests and select and play back a desired segment more easily.

It is therefore desirable not only to divide content into semantically coherent segments but also to generate a digest for each segment.

Next, a recorder 51 will be described with reference to FIGS. 17 to 25. The recorder 51 divides content into semantically coherent segments and also generates a digest for each segment.

<2. Second Embodiment>
[Configuration Example of Recorder 51]
Next, FIG. 17 shows a configuration example of the recorder 51 according to the second embodiment.

In the recorder 51 of FIG. 17, components configured in the same way as in the recorder 1 of the first embodiment (FIG. 1) have the same reference numerals. Their description is omitted below where appropriate.

The recorder 51 is configured in the same way as the recorder 1 of FIG. 1, with two differences: a dividing unit 71 replaces the dividing unit 15 of FIG. 1, and a digest generation unit 72 is newly added.

The dividing unit 71 performs the same processing as the dividing unit 15 of FIG. 1. It then supplies the content divided into segments to the content storage unit 11, via the digest generation unit 72, for storage.

When dividing the content into segments, the dividing unit 71 also generates chapter point data and supplies it to the digest generation unit 72. The chapter point data consists of chapter IDs that uniquely identify the first frame of each segment, that is, the frame t at the time t where a dividing line is placed.

In the following description, a segment obtained when the dividing unit 71 divides content is also called a chapter.

Next, FIG. 18 shows an example of chapter point data generated by the dividing unit 71.

In the example of FIG. 18, dividing lines are placed at the times of frames 300, 720, 1115, and 1431 of the content.

The content is therefore divided into the following chapters (segments): frames 0 to 299, frames 300 to 719, frames 720 to 1114, frames 1115 to 1430, and so on.

Here, the frame number t uniquely identifies the t-th frame t from the beginning of the content.

Each chapter ID is associated with the first frame of its chapter (the frame with the smallest frame number). That is, chapter ID "0" is associated with frame 0, chapter ID "1" with frame 300, chapter ID "2" with frame 720, chapter ID "3" with frame 1115, and chapter ID "4" with frame 1431.

The dividing unit 71 supplies chapter IDs such as those in FIG. 18 to the digest generation unit 72 of FIG. 17 as chapter point data.
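
A minimal sketch of turning the chapter point data of FIG. 18 into per-chapter frame ranges follows. The function name is hypothetical. The `num_frames` argument (the total frame count of the content, which FIG. 18 does not give) is an assumed input.

```python
from typing import Dict, List, Tuple


def chapter_ranges(chapter_starts: List[int], num_frames: int) -> Dict[int, Tuple[int, int]]:
    """Hypothetical helper: map chapter ID -> inclusive (first_frame, last_frame).

    chapter_starts[k] is the first frame of chapter ID k, as in FIG. 18.
    """
    ranges = {}
    for chapter_id, start in enumerate(chapter_starts):
        # A chapter ends one frame before the next one starts; the last
        # chapter ends at the last frame of the content.
        if chapter_id + 1 < len(chapter_starts):
            end = chapter_starts[chapter_id + 1] - 1
        else:
            end = num_frames - 1
        ranges[chapter_id] = (start, end)
    return ranges
```

Take the frame numbers of FIG. 18 and a hypothetical 1800-frame content. `chapter_ranges([0, 300, 720, 1115, 1431], 1800)` then yields (0, 299), (300, 719), (720, 1114), (1115, 1430) and (1431, 1799).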

Returning to FIG. 17, the digest generation unit 72 reads from the content storage unit 11 the same content that the dividing unit 71 read.

The digest generation unit 72 uses the chapter point data from the dividing unit 71 to identify each chapter of the content read from the content storage unit 11. It then extracts from each identified chapter a chapter segment of a predetermined length (the basic segment length).

That is, the chapter segment is a portion representative of the chapter. For example, it is the predetermined portion spanning the basic segment length from the beginning of the chapter.

The basic segment length is, for example, 5 to 10 seconds. The user can change it through the operation unit 17.

The digest generation unit 72 also extracts feature amount time-series data from the read content. Based on that data, it extracts from each chapter a feature peak segment, which is a characteristic portion of the chapter with the basic segment length.

Feature amount time-series data is a time series of feature amounts used to extract feature peak segments. Details are given later.

The digest generation unit 72 may extract feature peak segments with a length different from that of chapter segments. In other words, the basic segment length of chapter segments and that of feature peak segments can differ.

The digest generation unit 72 may extract one feature peak segment or several from a single chapter. It also does not have to extract a feature peak segment from every chapter.

The digest generation unit 72 arranges the chapter segments and feature peak segments extracted from each chapter in time order. This produces a digest representing the rough contents of the content, which the unit supplies to the content storage unit 11 for storage.

If a significant scene change occurs within the period to be extracted as a chapter segment, the digest generation unit 72 can end the chapter segment just before the scene change.

This lets the digest generation unit 72 extract chapter segments that end at natural break points. The same applies to feature peak segments.

The digest generation unit 72 detects a significant scene change, for example, by checking whether the sum of absolute differences of pixel values between temporally adjacent frames is equal to or greater than a predetermined threshold.
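
Chapter segment extraction with the scene-change cut can be sketched as follows. The sketch makes these assumptions:

- Frames are flat lists of pixel values.
- The basic segment length has already been converted to a frame count. For example, 5 seconds at a hypothetical 30 frames per second gives 150 frames.
- The helper names and the threshold are illustrative, not taken from the text.

```python
from typing import List, Sequence, Tuple


def sad(a: Sequence[int], b: Sequence[int]) -> int:
    """Sum of absolute differences of pixel values between two frames."""
    return sum(abs(p - q) for p, q in zip(a, b))


def extract_chapter_segment(
    frames: List[Sequence[int]],
    start: int,
    end: int,
    basic_len: int,
    sad_threshold: int,
) -> Tuple[int, int]:
    """Hypothetical helper: inclusive (first, last) frames of a chapter segment.

    The segment spans basic_len frames from the chapter start, clipped to the
    chapter end (end), and stops just before the first significant scene change.
    """
    last = min(end, start + basic_len - 1)
    for t in range(start + 1, last + 1):
        if sad(frames[t - 1], frames[t]) >= sad_threshold:
            return (start, t - 1)      # end just before the scene change
    return (start, last)
```

The same function applies to feature peak segments if `start` is the first frame of the feature peak segment.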

The digest generation unit 72 may also detect utterance intervals (stretches in which someone is speaking) from the audio data of an identified chapter.

In that case, the digest generation unit 72 can be configured to extend the chapter segment until the utterance ends, if speech is still going on when the chapter segment period has elapsed. The same applies to feature peak segments.

If the utterance interval is much longer than the basic segment length, for example twice the basic segment length or more, the digest generation unit 72 may instead extract a chapter segment that is cut off mid-utterance. The same applies to feature peak segments.

In this case, it is desirable to add an effect to the chapter segment so that the mid-utterance cutoff does not feel abrupt to the user.

For example, the digest generation unit 72 should preferably fade out the speech in the extracted chapter segment as the segment ends, gradually lowering its volume.
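
The rule for segment ends near speech can be sketched as below. The sketch assumes utterance intervals have already been detected as inclusive frame ranges; the detection method is not specified here. The factor of 2 follows the example in the text. The function name and return convention are hypothetical, and clipping to the chapter end is omitted for brevity.

```python
from typing import List, Tuple


def adjust_segment_end(
    start: int,
    basic_len: int,
    utterances: List[Tuple[int, int]],
) -> Tuple[int, bool]:
    """Hypothetical helper returning (last_frame, needs_fade_out) for a segment.

    If speech is still going on after basic_len frames, extend the segment to
    the end of that utterance, unless the utterance is at least twice basic_len
    long. In that case cut it at basic_len and request a fade-out instead.
    """
    nominal_end = start + basic_len - 1
    for u_start, u_end in utterances:
        if u_start <= nominal_end < u_end:              # speech continues past the cut
            if u_end - u_start + 1 >= 2 * basic_len:
                return nominal_end, True                # cut mid-utterance, fade out
            return u_end, False                         # extend to the utterance end
    return nominal_end, False
```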

In the description above, the digest generation unit 72 extracts chapter segments and feature peak segments from content divided by the dividing unit 71.

However, chapter segments and feature peak segments can also be extracted from content that the user has divided into chapters, for example with editing software. In that case, the editing software or a similar tool generates the chapter point data when the user divides the content.

The following description assumes that the digest generation unit 72 extracts one chapter segment and one feature peak segment from each chapter, and adds only BGM to the generated digest.

Next, FIG. 19 outlines the digest generation process performed by the digest generation unit 72.

FIG. 19 shows the dividing lines that split the content from which the digest is generated into chapters. The corresponding chapter ID is shown above each dividing line.

FIG. 19 also shows two examples of feature amount time-series data: audio power time-series data 91 and face area time-series data 92.

The audio power time-series data 91 takes a larger value the louder the audio of frame t is. The face area time-series data 92 takes a larger value the larger the proportion of frame t occupied by faces is.
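
The exact definitions of these feature amounts are given with FIGS. 21 to 23, outside this excerpt. Purely as an illustration, one plausible way to obtain series with the stated behavior is sketched below:

- audio power as the mean squared amplitude of the audio samples that fall in each frame;
- face area as the fraction of the frame covered by face bounding boxes.

Both formulas, the box format, and the helper names are assumptions. The face boxes would come from a face detector not shown here.

```python
from typing import List, Sequence, Tuple


def audio_power_series(samples: Sequence[float], samples_per_frame: int) -> List[float]:
    """Hypothetical helper: mean squared amplitude of the samples in each video frame.

    A trailing partial window (fewer than samples_per_frame samples) is dropped.
    """
    powers = []
    for i in range(0, len(samples) - samples_per_frame + 1, samples_per_frame):
        window = samples[i:i + samples_per_frame]
        powers.append(sum(s * s for s in window) / samples_per_frame)
    return powers


def face_area_ratio(boxes: Sequence[Tuple[int, int]], width: int, height: int) -> float:
    """Hypothetical helper: fraction of a width x height frame covered by face boxes
    given as (w, h). Overlaps between boxes are ignored and the result is capped at 1."""
    return min(1.0, sum(w * h for w, h in boxes) / (width * height))
```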

In FIG. 19, the horizontal axis represents the playback time t of the content, and the vertical axis represents the value of the feature amount time-series data.

In FIG. 19, the rectangles have the following meanings:
- White rectangles are chapter segments, which are the head portions of the chapters.
- Hatched rectangles are feature peak segments extracted based on the audio power time-series data 91.
- Black rectangles are feature peak segments extracted based on the face area time-series data 92.

The digest generation unit 72 uses the chapter point data (chapter IDs) from the dividing unit 71 to identify each chapter of the content read from the content storage unit 11. It then extracts a chapter segment from each identified chapter.

The digest generation unit 72 also extracts feature data, such as the audio power time-series data 91 shown in FIG. 19, from the content read from the content storage unit 11.

In each identified chapter, the digest generation unit 72 then takes as the peak feature frame the frame at which the audio power time-series data 91 reaches its maximum.

The digest generation unit 72 then extracts from the chapter a feature peak segment that contains the peak feature frame, for example one that starts at the peak feature frame.

Alternatively, the digest generation unit 72 may set extraction points for peak feature frames at regular intervals. Around each extraction point it defines a range, and takes as a peak feature frame the frame in that range at which the audio power time-series data 91 reaches its maximum.

The digest generation unit 72 may also skip peak feature frame extraction when the maximum of the audio power time-series data 91 is at or below a predetermined threshold. In that case, no feature peak segment is extracted.

The digest generation unit 72 may also use a local maximum instead of the overall maximum. In that case, it takes as a peak feature frame a frame at which the audio power time-series data 91 reaches a local maximum.
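
Peak feature frame and feature peak segment extraction for one chapter might look like the sketch below. The threshold check and the segment that starts at the peak follow the text. The function names, the inclusive-range convention, and the strict-neighbour definition of a local maximum are assumptions.

```python
from typing import List, Optional, Sequence, Tuple


def feature_peak_segment(
    feature: Sequence[float],
    start: int,
    end: int,
    seg_len: int,
    min_peak: float,
) -> Optional[Tuple[int, int]]:
    """Hypothetical helper: feature peak segment of the chapter spanning start..end.

    The peak feature frame is where the feature is largest in the chapter; the
    segment starts there and lasts seg_len frames, clipped to the chapter.
    Returns None when the maximum is at or below min_peak.
    """
    peak = max(range(start, end + 1), key=lambda t: feature[t])
    if feature[peak] <= min_peak:
        return None
    return (peak, min(end, peak + seg_len - 1))


def local_maxima(feature: Sequence[float], start: int, end: int) -> List[int]:
    """Frames in start..end that are strictly greater than both neighbours."""
    return [t for t in range(start + 1, end)
            if feature[t - 1] < feature[t] > feature[t + 1]]
```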

The digest generation unit 72 need not rely on a single series such as the audio power time-series data 91. It may also extract feature peak segments using several feature amount time series.

For example, the digest generation unit 72 extracts the face area time-series data 92 as well as the audio power time-series data 91 from the content read from the content storage unit 11.

The digest generation unit 72 then compares the audio power time-series data 91 and the face area time-series data 92 and selects the one with the larger maximum within the chapter.

It then takes as the peak feature frame the frame in the chapter at which the selected series reaches its maximum. Finally, it extracts from the chapter a feature peak segment containing that frame.
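
Selecting, per chapter, the feature series with the larger maximum can be sketched as follows. Comparing maxima of different features only makes sense if they share a scale. The sketch therefore assumes each series has been normalized beforehand (for example, to 0..1). That normalization, the dictionary input, and the names used are assumptions.

```python
from typing import Dict, Optional, Sequence, Tuple


def pick_feature(
    features: Dict[str, Sequence[float]], start: int, end: int
) -> Tuple[Optional[str], int]:
    """Hypothetical helper: among normalized feature series, pick the one whose
    maximum within frames start..end is largest, and return (name, peak frame)."""
    best_name, best_frame, best_value = None, -1, float("-inf")
    for name, series in features.items():
        frame = max(range(start, end + 1), key=lambda t: series[t])
        if series[frame] > best_value:
            best_name, best_frame, best_value = name, frame, series[frame]
    return best_name, best_frame
```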

As a result, the digest generation unit 72 may extract a loud portion as the feature peak segment in one chapter, and a portion with a large proportion of faces in another.

This avoids a monotonous digest, such as one in which only loud portions are extracted as feature peak segments.

In effect, the digest generation unit 72 can produce a digest with some variety, as if the feature peak segments had been chosen at random.

This keeps the generated digests from becoming formulaic and boring the user who watches them.

Alternatively, the digest generation unit 72 may extract one feature peak segment for each of several feature amount time series.

For example, in each identified chapter, the digest generation unit 72 extracts two feature peak segments:
- one whose peak feature frame is where the audio power time-series data 91 reaches its maximum;
- one whose peak feature frame is where the face area time-series data 92 reaches its maximum.

As shown at the lower right of FIG. 19, a chapter segment and a feature peak segment can overlap. In the chapter between the dividing lines for chapter ID = 4 and chapter ID = 5, the chapter segment (white rectangle) and the feature peak segment (hatched rectangle) are extracted so that they overlap.

In this case, the digest generation unit 72 treats the chapter segment and the feature peak segment as a single segment.

The digest generation unit 72 generates the digest by joining the extracted chapter segments and feature peak segments in time order, as shown in FIG. 19.
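
Joining segments in time order and treating overlapping ones as a single segment can be sketched as an interval merge. An example of such an overlap is the chapter segment and feature peak segment of chapter ID 4 in FIG. 19. Representing segments as inclusive frame ranges is an assumption. Segments that only touch without overlapping are kept separate here.

```python
from typing import List, Tuple


def assemble_digest(segments: List[Tuple[int, int]]) -> List[Tuple[int, int]]:
    """Hypothetical helper: sort segments (inclusive frame ranges) in time order
    and merge overlapping ones into a single segment."""
    merged: List[Tuple[int, int]] = []
    for start, end in sorted(segments):
        if merged and start <= merged[-1][1]:        # overlaps the previous segment
            merged[-1] = (merged[-1][0], max(merged[-1][1], end))
        else:
            merged.append((start, end))
    return merged
```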

The digest generation unit 72 then adds BGM (background music) or the like to the generated digest. It supplies the digest with BGM added to the content storage unit 11 for storage.

[Details of digest generation unit 72]
Next, FIG. 20 shows a detailed configuration example of the digest generation unit 72.

The digest generation unit 72 includes a chapter segment extraction unit 111, a feature amount extraction unit 112, a feature peak segment extraction unit 113, and an effect addition unit 114.

The content storage unit 11 supplies content to the chapter segment extraction unit 111 and the feature amount extraction unit 112.

The dividing unit 71 supplies chapter point data to the chapter segment extraction unit 111 and the feature peak segment extraction unit 113.

The chapter segment extraction unit 111 uses the chapter point data from the dividing unit 71 to identify each chapter of the content supplied from the content storage unit 11. It then extracts a chapter segment from each identified chapter and supplies it to the effect addition unit 114.

The feature amount extraction unit 112 extracts, for example, several feature amount time series from the content supplied by the content storage unit 11. It supplies them to the feature peak segment extraction unit 113. The feature amount time-series data is described in detail with reference to FIGS. 21 to 23.

The feature amount extraction unit 112 may also remove noise from the extracted feature amount time-series data by smoothing it, for example with a smoothing filter. It then supplies the smoothed data to the feature peak segment extraction unit 113.
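
The text does not say which smoothing filter is used. A centered moving average is one common choice and is sketched below as an assumption. The window size is a free parameter; an odd window is assumed so the average is centered.

```python
from typing import List, Sequence


def moving_average(series: Sequence[float], window: int) -> List[float]:
    """Hypothetical helper: centered moving average. Near the edges only the
    samples that exist are averaged, so the output has the input's length."""
    half = window // 2
    out = []
    for t in range(len(series)):
        lo, hi = max(0, t - half), min(len(series), t + half + 1)
        out.append(sum(series[lo:hi]) / (hi - lo))
    return out
```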

In addition, the feature amount extraction unit 112 passes the content from the content storage unit 11 to the feature peak segment extraction unit 113 unchanged.

The feature peak segment extraction unit 113 identifies each chapter of the content, supplied from the content storage unit 11 via the feature amount extraction unit 112, on the basis of the chapter point data from the dividing unit 71.

Then, as described with reference to FIG. 19, the feature peak segment extraction unit 113 extracts feature peak segments from each identified chapter on the basis of the plurality of pieces of feature amount time-series data supplied from the feature amount extraction unit 112, and supplies them to the effect adding unit 114.

The effect adding unit 114 generates a digest by, for example, joining the chapter segments and feature peak segments extracted as shown in FIG. 19 in chronological order.

The effect adding unit 114 also adds BGM or the like to the generated digest and supplies the digest to the content storage unit 11 for storage. The process of adding BGM or the like to the digest will be described in detail with reference to FIG. 24.

Furthermore, the effect adding unit 114 can add effects such as fading out the last frames of each segment (chapter segment or feature peak segment) that makes up the digest, or fading in the first frames of each segment.
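A fade-in and fade-out can be modelled as a per-frame gain applied to a segment's pixels or audio. The sketch below assumes a linear ramp over a hypothetical `fade_frames` count at both ends of a segment. The text specifies neither the fade shape nor its duration.

```python
def fade_gains(n_frames, fade_frames):
    """Per-frame gain for one digest segment.

    The gain ramps up linearly over the first `fade_frames` frames (fade-in)
    and down over the last `fade_frames` frames (fade-out); in between it is 1.
    """
    gains = []
    for i in range(n_frames):
        fade_in = min(1.0, (i + 1) / fade_frames)
        fade_out = min(1.0, (n_frames - i) / fade_frames)
        gains.append(min(fade_in, fade_out))
    return gains


print(fade_gains(6, 2))  # [0.5, 1.0, 1.0, 1.0, 1.0, 0.5]
```

Each frame (or its audio) of the segment would then be multiplied by the matching gain before the segments are joined.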

[Example of Feature Amount Time-Series Data]
Next, a method by which the feature amount extraction unit 112 in FIG. 20 extracts (generates) feature amount time-series data from the content will be described with reference to FIGS. 21 to 23.

As feature amount time-series data, the feature amount extraction unit 112 extracts from the content at least one of, for example, face area time-series data, audio power time-series data, zoom-in intensity time-series data, and zoom-out intensity time-series data.

The face area time-series data is used by the feature peak segment extraction unit 113 to extract from a chapter, as a feature peak segment, a segment containing frames in which the proportion of the frame occupied by human faces (the face area) increases.

In each frame t constituting the content, the feature amount extraction unit 112 detects the face area, that is, the region (number of pixels) in which a human face is present. On the basis of the detection result, it calculates a face area feature value $f_1(t) = R_t - \mathrm{ave}(R_{t'})$ for each frame t. It then generates face area time-series data by arranging the values $f_1(t)$ in the time order of the frames t.

Here, the ratio $R_t$ = (number of pixels in the face area) / (total number of pixels in the frame), and $\mathrm{ave}(R_{t'})$ is the average of the ratios $R_{t'}$ obtained from the frames t' in the interval $[t - W_L, t + W_L]$. The time t is the time at which frame t is displayed, and $W_L$ (> 0) is a preset value.
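The following is a minimal sketch of $f_1(t) = R_t - \mathrm{ave}(R_{t'})$. It makes several assumptions. Face detection has already produced a face pixel count per frame. $W_L$ is measured in frames rather than seconds. The averaging window is truncated at the start and end of the content. The function name `face_area_feature` is hypothetical.

```python
def face_area_feature(face_pixels, total_pixels, w_l):
    """f1(t) = R_t - ave(R_t') for t' in [t - w_l, t + w_l].

    face_pixels[t] is the number of face pixels detected in frame t
    (face detection itself is assumed to have been done already).
    """
    ratios = [p / total_pixels for p in face_pixels]  # R_t
    n = len(ratios)
    f1 = []
    for t in range(n):
        lo = max(0, t - w_l)
        hi = min(n, t + w_l + 1)
        window = ratios[lo:hi]
        f1.append(ratios[t] - sum(window) / len(window))
    return f1


f1 = face_area_feature([0, 0, 50, 0, 0], total_pixels=100, w_l=1)
print(max(range(len(f1)), key=f1.__getitem__))  # 2: the close-up frame peaks
```

Subtracting the local average makes $f_1(t)$ respond to frames whose face area rises above its surroundings, rather than to content that simply shows faces throughout.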

FIG. 21 shows an example in which the feature amount extraction unit 112 generates audio power time-series data as feature amount time-series data.

In FIG. 21, the audio data x(t) represents the audio reproduced over the entire interval $[t_s, t_e]$ from time $t_s$ to time $t_e$. The horizontal axis represents time t, and the vertical axis represents the audio data x(t).

The audio power time-series data is used by the feature peak segment extraction unit 113 to extract from a chapter, as a feature peak segment, a segment containing frames in which the audio (volume) becomes loud.

The feature amount extraction unit 112 calculates the audio power P(t) of each frame t constituting the content by the following equation (3).

$$P(t) = \sqrt{\sum_{\tau = t-W}^{t+W} x(\tau)^2} \qquad (3)$$

In equation (3), the audio power P(t) is the square root of the sum of squares of the audio data x(τ) in the interval [t − W, t + W]. Here τ ranges from t − W to t + W, and W is a preset value.

The feature amount extraction unit 112 then calculates the audio power feature value $f_2(t)$ of frame t. This value is the difference obtained by subtracting the average of the audio power P(t) over the entire interval $[t_s, t_e]$ from the average of P(t) over the interval [t − W, t + W], that is, $f_2(t) = \mathrm{ave}_{[t-W,\,t+W]}(P) - \mathrm{ave}_{[t_s,\,t_e]}(P)$.

By calculating $f_2(t)$ for every frame t, the feature amount extraction unit 112 generates audio power time-series data in which the values $f_2(t)$ are arranged in the time order of the frames t.
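Equation (3) and $f_2(t)$ can be sketched as follows. As a simplification, this treats each index of `x` as both an audio sample and a frame time. A real implementation would map each frame t to its range of audio samples. W is measured in samples here, and windows are truncated at the ends of the content. Both choices are assumptions.

```python
import math


def window(n, t, w):
    """Index range [t - w, t + w] clipped to [0, n)."""
    return range(max(0, t - w), min(n, t + w + 1))


def audio_power(x, w):
    """Equation (3): P(t) = sqrt(sum of x(tau)^2 for tau in [t - w, t + w])."""
    return [math.sqrt(sum(x[i] ** 2 for i in window(len(x), t, w)))
            for t in range(len(x))]


def audio_power_feature(x, w):
    """f2(t) = mean of P over [t - w, t + w] minus mean of P over all t."""
    p = audio_power(x, w)
    overall = sum(p) / len(p)
    f2 = []
    for t in range(len(p)):
        idx = window(len(p), t, w)
        f2.append(sum(p[i] for i in idx) / len(idx) - overall)
    return f2


x = [0, 0, 3, 4, 0, 0]
print(audio_power(x, 1))  # [0.0, 3.0, 5.0, 5.0, 4.0, 0.0]
```

Positive values of $f_2(t)$ mark stretches that are louder than the content as a whole, which is what the feature peak segment extraction unit 113 looks for.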

Next, a method by which the feature amount extraction unit 112 generates zoom-in intensity time-series data as feature amount time-series data will be described with reference to FIGS. 22 and 23.

The zoom-in intensity time-series data is used by the feature peak segment extraction unit 113 to extract from a chapter, as a feature peak segment, a segment containing frames captured during a zoom-in (zoom-up).

FIG. 22 shows an example of the motion vectors of frame t.

In FIG. 22, frame t is divided into a plurality of blocks, and the motion vector of each block is drawn in that block.

The feature amount extraction unit 112 divides each frame t constituting the content into a plurality of blocks as shown in FIG. 22. Using the frames t constituting the content, it then detects the motion vector of each block by block matching or the like.

Here, the motion vector of a block in frame t is, for example, a vector representing the motion of the block from frame t to frame t + 1.

FIG. 23 shows an example of a zoom-in template. The template consists of the motion vectors whose inner products with the motion vectors of the blocks of frame t are calculated.

As shown in FIG. 23, the zoom-in template consists of motion vectors representing the motion of each block during a zoom-in.

For each block, the feature amount extraction unit 112 calculates the inner product $a_t \cdot b$ of the block's motion vector $a_t$ in frame t (FIG. 22) and the motion vector b of the corresponding block of the zoom-in template (FIG. 23). It then calculates the sum of these inner products, $\mathrm{sum}(a_t \cdot b)$.

The feature amount extraction unit 112 also calculates the average $\mathrm{ave}(\mathrm{sum}(a_{t'} \cdot b))$ of the sums $\mathrm{sum}(a_{t'} \cdot b)$ calculated for the frames t' included in the interval [t − W, t + W].

The feature amount extraction unit 112 then calculates the zoom-in feature value of frame t as the difference $f_3(t) = \mathrm{sum}(a_t \cdot b) - \mathrm{ave}(\mathrm{sum}(a_{t'} \cdot b))$. The zoom-in feature value $f_3(t)$ is proportional to the magnitude of the zoom-in at frame t.

By calculating $f_3(t)$ for every frame t, the feature amount extraction unit 112 generates zoom-in intensity time-series data in which the values $f_3(t)$ are arranged in the time order of the frames t.

The zoom-out intensity time-series data is used by the feature peak segment extraction unit 113 to extract from a chapter, as a feature peak segment, a segment containing frames captured during a zoom-out.

To generate the zoom-out intensity time-series data, the feature amount extraction unit 112 replaces the zoom-in template shown in FIG. 23 with a zoom-out template. The motion vectors of the zoom-out template point in the opposite direction to those of the template in FIG. 23.

That is, the feature amount extraction unit 112 generates the zoom-out intensity time-series data with the zoom-out template, in the same manner as it generates the zoom-in intensity time-series data.
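The template scoring for both zoom-in and zoom-out can be sketched as follows. The sketch assumes motion vectors are already available per block as (dx, dy) in image coordinates (y pointing down). In place of FIG. 23, which is not reproduced here, the zoom-in template is built from unit vectors pointing outward from the frame centre, since image content moves away from the centre during a zoom-in. The zoom-out template reverses every vector. The grid size, function names, and sample data are hypothetical.

```python
import math


def zoom_in_template(rows, cols):
    """Unit vectors (dx, dy) pointing from the frame centre to each block."""
    cy, cx = (rows - 1) / 2, (cols - 1) / 2
    template = []
    for r in range(rows):
        row = []
        for c in range(cols):
            dx, dy = c - cx, r - cy
            norm = math.hypot(dx, dy)
            row.append((dx / norm, dy / norm) if norm else (0.0, 0.0))
        template.append(row)
    return template


def zoom_out_template(rows, cols):
    """Zoom-out template: every zoom-in vector reversed."""
    return [[(-dx, -dy) for dx, dy in row] for row in zoom_in_template(rows, cols)]


def template_score(motion, template):
    """sum(a_t . b) over all blocks of one frame."""
    return sum(a[0] * b[0] + a[1] * b[1]
               for row_a, row_b in zip(motion, template)
               for a, b in zip(row_a, row_b))


def zoom_feature(motion_fields, template, w):
    """f3(t) = sum(a_t . b) - ave(sum(a_t' . b)) for t' in [t - w, t + w]."""
    scores = [template_score(m, template) for m in motion_fields]
    n = len(scores)
    out = []
    for t in range(n):
        win = scores[max(0, t - w):min(n, t + w + 1)]
        out.append(scores[t] - sum(win) / len(win))
    return out


# Three frames on a 3x3 block grid: still, zooming in, still.
still = [[(0.0, 0.0)] * 3 for _ in range(3)]
zooming = [[(2 * dx, 2 * dy) for dx, dy in row] for row in zoom_in_template(3, 3)]
frames = [still, zooming, still]
print(zoom_feature(frames, zoom_in_template(3, 3), 1))   # middle value is largest
print(zoom_feature(frames, zoom_out_template(3, 3), 1))  # middle value is smallest
```

Because only the template changes, the same scoring code serves both time series. A zoom-out would score high with the reversed template in the same way a zoom-in scores high with the original one.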

Next, the details of how the effect adding unit 114 adds BGM to the generated digest will be described with reference to FIG. 24.

The upper part of FIG. 24 shows the volume weight of each segment (chapter segment or feature peak segment) constituting the digest.

The lower part of FIG. 24 shows the digest obtained by combining the chapter segments and feature peak segments shown in FIG. 19.

As shown in the lower part of FIG. 24, the effect adding unit 114 joins, in chronological order, the chapter segments from the chapter segment extraction unit 111 and the feature peak segments from the feature peak segment extraction unit 113. This produces a digest of approximately L seconds.

The digest length L is determined by the number and lengths of the chapter segments extracted by the chapter segment extraction unit 111 and of the feature peak segments extracted by the feature peak segment extraction unit 113.

The user can also, for example, set the digest length L using the operation unit 17. In that case, the operation unit 17 supplies the control unit 16 with an operation signal corresponding to the user's setting of the length L. On the basis of this operation signal, the control unit 16 controls the digest generation unit 72 so that it generates a digest of the length L set by the user.

The digest generation unit 72 keeps extracting chapter segments and feature peak segments until the total length of the extracted segments reaches L.

In this case, the digest generation unit 72 should preferably extract chapter segments from the chapters first and only then extract feature peak segments, so that at least a chapter segment is extracted from every chapter.

For example, once the chapter segments have been extracted, the digest generation unit 72 extracts feature peak segments in descending order of the local maxima of the one or more pieces of feature amount time-series data.
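The extraction order described above (chapter segments first, then feature peak segments by descending local maximum, until the total reaches L) can be sketched as a greedy selection. The `Segment` class and `select_segments` function are hypothetical. Two further choices are assumptions: a segment that would push the total past L is skipped, not truncated, and overlaps between chosen segments are not deducted from the total.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Segment:
    start: float        # seconds from the start of the content
    end: float          # seconds
    kind: str           # "chapter" or "peak"
    peak: float = 0.0   # local-maximum feature value (feature peak segments only)


def select_segments(chapter_segs, peak_segs, target_len):
    """Pick segments for a digest of about target_len seconds.

    Chapter segments are taken first so every chapter is represented; feature
    peak segments follow in descending order of their local-maximum value.
    A segment that would push the total past target_len is skipped.
    """
    ordered = list(chapter_segs) + sorted(peak_segs, key=lambda s: s.peak, reverse=True)
    chosen, total = [], 0.0
    for seg in ordered:
        length = seg.end - seg.start
        if total + length > target_len:
            continue
        chosen.append(seg)
        total += length
    return sorted(chosen, key=lambda s: s.start)  # chronological order for joining


chapters = [Segment(0, 5, "chapter"), Segment(60, 65, "chapter")]
peaks = [Segment(20, 25, "peak", 0.3), Segment(80, 85, "peak", 0.9),
         Segment(40, 45, "peak", 0.5)]
print([s.start for s in select_segments(chapters, peaks, 15)])  # [0, 60, 80]
```

With L = 15, both chapter segments fit (10 s), and only the strongest feature peak (local maximum 0.9) fills the remaining 5 s.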

As another example, the user can use the operation unit 17 to set both the digest length L and the total length S of the segments to be extracted from one chapter. The digest generation unit 72 then generates a digest of the desired length L.

In this case, the operation unit 17 supplies the control unit 16 with an operation signal corresponding to the user's setting operation. On the basis of this signal, the control unit 16 identifies the L and S set by the user and works backward from them to calculate the total number of divisions D.

That is, the total number of divisions D is the integer closest to L/S (for example, L/S rounded to the nearest integer). Consider, for example, a case in which the user sets L = 30 (seconds) and also specifies that a 7.5-second chapter segment and a 7.5-second feature peak segment be extracted from each chapter, so that S = 15 (7.5 + 7.5).

In this case, the control unit 16 calculates L/S = 30/15 = 2 from L = 30 and S = 15, and takes 2, the integer closest to L/S = 2, as the total number of divisions D.
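The calculation of D can be sketched in a few lines. The text says only "the integer closest to L/S (for example, L/S rounded)". The sketch therefore assumes halves round up. It uses `math.floor(ratio + 0.5)` because Python's built-in `round()` rounds halves to the nearest even integer. The lower bound of one chapter is also an assumption.

```python
import math


def total_divisions(digest_len, per_chapter_len):
    """D = integer closest to L / S, rounding halves up (assumed tie rule)."""
    ratio = digest_len / per_chapter_len
    return max(1, math.floor(ratio + 0.5))


print(total_divisions(30, 7.5 + 7.5))  # 2, as in the worked example
```

For instance, L = 30 and S = 12 give L/S = 2.5, which this sketch rounds up to D = 3.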

The control unit 16 controls the dividing unit 71 so that it generates chapter point data corresponding to the calculated total number of divisions D. Under this control, the dividing unit 71 generates chapter point data corresponding to D and supplies it to the digest generation unit 72.

On the basis of the chapter point data from the dividing unit 71 and the content read from the content storage unit 11, the digest generation unit 72 generates a digest of the length L set by the user. It then supplies the digest to the content storage unit 11 for storage.

The effect adding unit 114 also weights the audio data of each segment (chapter segment or feature peak segment) constituting the digest by the weight α shown in the upper part of FIG. 24, and weights the BGM (data) by 1 − α.

The effect adding unit 114 then mixes the weighted audio data with the weighted BGM. It associates the resulting mixed audio data, as the audio data of each segment of the digest, with the corresponding frames of the digest.

It is assumed here that the effect adding unit 114 holds BGM (data) in advance in a built-in memory (not shown), and that the user designates which BGM to add.

For example, when adding BGM to a chapter segment (shown as a white rectangle), the effect adding unit 114 multiplies the audio data of the chapter segment by a weight smaller than 0.5 so that the BGM is relatively loud.

Specifically, in FIG. 24, the effect adding unit 114 weights the audio data of the chapter segment by 0.2 and the added BGM (data) by 0.8.

Likewise, when adding BGM to a feature peak segment extracted on the basis of feature amount time-series data other than the audio power time-series data, the effect adding unit 114 uses the same weighting as for chapter segments.

Specifically, in FIG. 24, the effect adding unit 114 weights the audio data of a feature peak segment extracted on the basis of the face area time-series data (shown as a black rectangle) by 0.2 and the added BGM by 0.8.

On the other hand, consider a feature peak segment extracted on the basis of the audio power time-series data (shown as a hatched rectangle). When adding BGM to such a segment, the effect adding unit 114 multiplies its audio data by a weight larger than 0.5 so that the BGM is relatively quiet.

Specifically, in FIG. 24, the effect adding unit 114 weights the audio data of the feature peak segment extracted on the basis of the audio power time-series data by 0.8 and the added BGM by 0.2.
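The weighting rule and the mix can be sketched as follows, using the example values from FIG. 24 (0.2 for chapter segments and for non-audio-power peaks, 0.8 for audio-power peaks). The string labels and function names are hypothetical. Audio is modelled as plain lists of samples of equal length, with any resampling and level normalisation assumed to be done already.

```python
def audio_weight(kind, feature=None):
    """Weight alpha for a segment's own audio, using the FIG. 24 example values.

    kind: "chapter" or "peak"; feature: which feature amount time series
    produced a peak segment (e.g. "face", "audio_power", "zoom_in").
    """
    if kind == "peak" and feature == "audio_power":
        return 0.8  # keep the loud original audio, BGM quieter
    return 0.2      # chapter segments and other peaks: BGM louder


def mix_with_bgm(segment_audio, bgm, alpha):
    """Mixed sample = alpha * segment audio + (1 - alpha) * BGM."""
    return [alpha * s + (1 - alpha) * b for s, b in zip(segment_audio, bgm)]


alpha = audio_weight("peak", "audio_power")
print(mix_with_bgm([1.0, -1.0], [0.5, 0.5], alpha))
```

Because the two weights always sum to 1, the mixed level stays within the range of the louder of the two inputs.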

As shown in FIG. 19, when a chapter segment and a feature peak segment overlap, they are extracted as a single segment.

In this case, for the audio data of this combined segment, the effect adding unit 114 uses the weight that would apply to the feature peak segment, whose first frame comes later in time.
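Merging overlapping segments and choosing the merged weight can be sketched as follows. Segments are modelled as hypothetical `(start, end, weight)` tuples in seconds. The merged segment takes the weight of the component that starts later, in line with the rule above. Merging only strictly overlapping segments, and not ones that merely touch, is an assumption.

```python
def merge_overlaps(segments):
    """Merge overlapping (start, end, weight) segments in time order.

    A merged segment keeps the weight of the component that starts later.
    """
    merged = []
    for start, end, weight in sorted(segments):
        if merged and start < merged[-1][1]:  # overlaps the previous segment
            prev_start, prev_end, _ = merged[-1]
            merged[-1] = (prev_start, max(prev_end, end), weight)
        else:
            merged.append((start, end, weight))
    return merged


# Chapter segment 0-5 s (alpha 0.2) overlaps an audio-power peak 3-8 s (alpha 0.8).
print(merge_overlaps([(0, 5, 0.2), (3, 8, 0.8), (20, 25, 0.2)]))
# [(0, 8, 0.8), (20, 25, 0.2)]
```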

As shown in the upper part of FIG. 24, the effect adding unit 114 changes the weight continuously rather than abruptly when it switches weights.

For example, rather than switching the weight for the digest audio data from 0.2 to 0.8 in one step, the effect adding unit 114 changes it linearly from 0.2 to 0.8 over a predetermined time (for example, 500 milliseconds). Instead of a linear change, the effect adding unit 114 may change the weight non-linearly (for example, in proportion to the square of the elapsed time).

This prevents the digest audio or the BGM from suddenly becoming loud when the weighting switches, so the user is not annoyed by abrupt changes in volume.
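The linear weight transition can be sketched as a function of time. It assumes the ramp starts exactly at the segment boundary `t_switch` and lasts `ramp` seconds (500 ms by default, matching the example above). The text does not say whether the ramp is centred on the boundary. The function name is hypothetical.

```python
def ramped_weight(t, t_switch, w_from, w_to, ramp=0.5):
    """Weight at time t (seconds) for a switch from w_from to w_to at t_switch.

    The weight moves linearly over `ramp` seconds instead of jumping.
    """
    if t <= t_switch:
        return w_from
    if t >= t_switch + ramp:
        return w_to
    progress = (t - t_switch) / ramp
    return w_from + (w_to - w_from) * progress


print([round(ramped_weight(10 + k * 0.125, 10, 0.2, 0.8), 3) for k in range(5)])
# [0.2, 0.35, 0.5, 0.65, 0.8]
```

The non-linear variant mentioned above would replace `progress` with `progress ** 2`.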

[Description of Operation of Recorder 51]
Next, the digest generation process performed by the recorder 51 (in particular, by the dividing unit 71 and the digest generation unit 72) will be described with reference to the flowchart of FIG. 25.

In step S191, the dividing unit 71 performs the same processing as the dividing unit 15 in FIG. 1. The dividing unit 71 divides the content into a plurality of segments (chapters). It then generates, as chapter point data, chapter IDs that uniquely identify the first frame of each segment.

The dividing unit 71 supplies the generated chapter point data to the chapter segment extraction unit 111 and the feature peak segment extraction unit 113 of the digest generation unit 72.

In step S192, the chapter segment extraction unit 111 identifies each chapter of the content supplied from the content storage unit 11 on the basis of the chapter point data from the dividing unit 71. It then extracts from each identified chapter a chapter segment representing the beginning of the chapter, and supplies it to the effect adding unit 114.

In step S193, the feature amount extraction unit 112 extracts, for example, a plurality of pieces of feature amount time-series data from the content supplied from the content storage unit 11 and supplies them to the feature peak segment extraction unit 113.

The feature amount extraction unit 112 may smooth the extracted feature amount time-series data with a smoothing filter or the like to remove noise before supplying it to the feature peak segment extraction unit 113.

In step S194, the feature peak segment extraction unit 113 identifies each chapter of the content, supplied from the content storage unit 11 via the feature amount extraction unit 112, on the basis of the chapter point data from the dividing unit 71.

The feature peak segment extraction unit 113 then extracts feature peak segments from each identified chapter on the basis of the plurality of pieces of feature amount time-series data supplied from the feature amount extraction unit 112, and supplies them to the effect adding unit 114.

In step S195, the effect adding unit 114 generates a digest by, for example, joining the chapter segments and feature peak segments extracted as shown in FIG. 19 in chronological order.

The effect adding unit 114 then adds BGM (background music) or the like to the generated digest and supplies the digest to the content storage unit 11 for storage. This completes the digest generation process of FIG. 25.

As described above, in the digest generation process, the chapter segment extraction unit 111 extracts a chapter segment from each chapter, and the effect adding unit 114 generates a digest that contains at least the extracted chapter segments.

Thus, by playing back the digest, the user can view the chapter segment at the beginning of each chapter of the content. The user can therefore easily grasp the rough content (outline) of the content.

In addition, the feature peak segment extraction unit 113 extracts feature peak segments on the basis of, for example, a plurality of pieces of feature amount time-series data.

This makes it possible to generate a digest that includes, as feature peak segments, scenes such as the climaxes of the content being summarized.

Scenes extracted as feature peak segments include, for example, scenes with loud audio, scenes in which the camera zooms in or out, and scenes in which human faces occupy a large proportion of the frame.

The effect adding unit 114 also generates a digest to which effects such as BGM are added. As a result, the digest generation process produces a digest from which the content is easier to understand.

Furthermore, the effect adding unit 114 switches the weights used for mixing in BGM gradually. This prevents the BGM or the original digest audio from suddenly becoming loud when the weighting switches.

When playing back content stored in the content storage unit 11, it is desirable for the user to be able to start playback easily from any desired position.

Next, a recorder 131 will be described with reference to FIGS. 26 to 41. The recorder 131 displays a screen on which the user can easily search for a desired playback position.

<3. Third Embodiment>
[Configuration Example of Recorder 131]
FIG. 26 shows a configuration example of a recorder 131 according to the third embodiment.

In the recorder 131 of FIG. 26, components configured in the same way as in the recorder 1 of the first embodiment (FIG. 1) are given the same reference numerals, and their description is omitted below where appropriate.

That is, the recorder 131 is configured in the same way as the recorder 1 of FIG. 1, except that a dividing unit 151 replaces the dividing unit 15 of FIG. 1 and a presentation unit 152 is newly provided.

A display unit 132 that displays images is connected to the recorder 131. The recorder 131 omits the digest generation unit 72 of FIG. 17. However, the digest generation unit 72 may be provided, as in FIG. 17.

The dividing unit 151 performs the same processing as the dividing unit 15 of FIG. 1. Like the dividing unit 71 of FIG. 17, it also generates chapter point data (chapter IDs) and supplies the data to the presentation unit 152.

Furthermore, the dividing unit 151 associates each symbol of the symbol string supplied from the symbol string generation unit 14 with the corresponding frame of the content, and supplies the symbols to the presentation unit 152.

The dividing unit 151 also supplies the content read from the content storage unit 11 to the presentation unit 152.

On the basis of the chapter point data from the dividing unit 151, the presentation unit 152 causes the display unit 132 to display the chapters of the content, also supplied from the dividing unit 151, arranged in a row.

That is, the presentation unit 152 causes the display unit 132 to display the chapters arranged in a row. The number of chapters is the total number of divisions D, which changes according to the user's designation operation on the operation unit 17.

Specifically, when the user's designation operation changes the total number of divisions D, the dividing unit 151 generates new chapter point data corresponding to the new D and supplies it to the presentation unit 152.

On the basis of the new chapter point data supplied from the dividing unit 151, the presentation unit 152 causes the display unit 132 to display the D chapters designated by the user's operation.

In addition, as shown in FIG. 39 described later, the presentation unit 152 uses the symbols from the dividing unit 151 to display, as tiles, the frames that have the same symbol as the frame selected by the user.

Next, FIG. 27 shows an example of how the corresponding chapter point data changes as the user's designation operation changes the total number of divisions D.

FIG. 27A shows example combinations of a total number of divisions D and the chapter point data for that D.

FIG. 27B shows an example of chapter points arranged on the time axis of the content. Here, a chapter point is the position of the first frame among the frames that make up a chapter.

As shown in FIG. 27A, when the total number of divisions D = 2, the frame with frame number 720 is a chapter point in addition to the frame with frame number 0.

When D = 2, the content is therefore divided into a chapter starting at frame 0 and a chapter starting at frame 720, as shown in the first row of FIG. 27B.

Since the frame with frame number 0 is always a chapter point, frame number 0 is omitted from FIGS. 27A and 27B.

When D is increased from 2 to 3, the frame with frame number 300 becomes a new chapter point.

When D = 3, the content is divided into chapters starting at frames 0, 300, and 720, as shown in the second row of FIG. 27B.

When D is increased from 3 to 4, the frame with frame number 1431 becomes a new chapter point.

When D = 4, the content is divided into chapters starting at frames 0, 300, 720, and 1431, as shown in the third row of FIG. 27B.

Further, when D is increased from 4 to 5, the frame with frame number 1115 becomes a new chapter point.

When D = 5, the content is divided into chapters starting at frames 0, 300, 720, 1115, and 1431, as shown in the fourth row of FIG. 27B.
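In the FIG. 27 example, each increase of D keeps every existing chapter point and adds exactly one new one. Assuming this nesting holds in general (the source shows it only for D = 2 to 5), the chapter point data for every D can be encoded as a single list that records the order in which points are added. The names `ADDED_ORDER` and `chapter_points` below are hypothetical and serve only as an illustration:

```python
# Hypothetical encoding of FIG. 27A: the order in which chapter points are
# added as the total number of divisions D grows from 2 to 5.
ADDED_ORDER = [720, 300, 1431, 1115]


def chapter_points(total_divisions: int) -> list[int]:
    """Return the sorted start frames of the chapters for a given D."""
    if not 1 <= total_divisions <= len(ADDED_ORDER) + 1:
        raise ValueError("D is outside the range covered by this example")
    # Frame 0 always starts the first chapter; the first D - 1 added points follow.
    return sorted([0] + ADDED_ORDER[: total_divisions - 1])


for d in range(2, 6):
    print(d, chapter_points(d))
```

Running it prints the four rows of FIG. 27B, ending with `5 [0, 300, 720, 1115, 1431]`.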

Next, with reference to FIGS. 28 to 30, a process is described in which the presentation unit 152 generates display data for the display unit 132. FIGS. 28 to 30 illustrate how the presentation unit 152 generates the display data when the total number of divisions D = 5.

FIG. 28 shows an example of the frames set as chapter points.

In FIG. 28, each rectangle represents a frame, and the number written inside the rectangle is its frame number.

Based on the chapter point data from the dividing unit 151, the presentation unit 152 extracts the chapter-point frames with frame numbers 0, 300, 720, 1115, and 1431 from the content supplied from the dividing unit 151.

Here, the chapter point data corresponds to D = 5, so the frames with frame numbers 0, 300, 720, 1115, and 1431 are the chapter points.

The presentation unit 152 reduces each extracted frame to a thumbnail image. It then displays the thumbnails from top to bottom on the display screen of the display unit 132 in the order of frame numbers 0, 300, 720, 1115, and 1431, as shown in FIG. 28.

The presentation unit 152 then displays the frames that make up each chapter as thumbnail images. These appear from left to right on the display screen of the display unit 132 at intervals of, for example, 50 frames.

Next, FIG. 29 shows an example in which thumbnail images are displayed at 50-frame intervals to the right of each chapter-point frame.

Based on the chapter point data from the dividing unit 151, the presentation unit 152 extracts the frames with frame numbers 50, 100, 150, 200, and 250 from the content supplied from the dividing unit 151. It extracts these in addition to the chapter-point frame with frame number 0.

The presentation unit 152 then reduces each extracted frame to a thumbnail image and displays the thumbnails to the right of frame 0 in the order of frame numbers 50, 100, 150, 200, and 250.

Likewise, the presentation unit 152 displays the frames with frame numbers 350, 400, 450, 500, 550, 600, 650, and 700 as thumbnail images to the right of frame 300, in ascending order of frame number.

In the same way, the presentation unit 152 displays three more rows of thumbnails, each in ascending order of frame number:

- frames 770, 820, 870, 920, 970, 1020, and 1070 to the right of frame 720;
- frames 1165, 1215, 1265, 1315, 1365, and 1415 to the right of frame 1115;
- frames 1481, 1531, 1581, 1631, ... to the right of frame 1431.

As a result, as shown in FIG. 30, the presentation unit 152 can cause the display unit 132 to show each chapter's thumbnail images arranged in a row.
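The row layout described above can be sketched as a small function. It assumes that a chapter runs from its chapter point up to, but not including, the next chapter point. Under that assumption, each row is simply every 50th frame of the chapter. The content length `total_frames=1800` is a hypothetical value, since the source does not state the length of the content:

```python
def thumbnail_rows(chapter_starts: list[int], total_frames: int,
                   interval: int = 50) -> list[list[int]]:
    """One row per chapter: the frame numbers shown as thumbnails, every `interval` frames."""
    bounds = list(chapter_starts) + [total_frames]
    rows = []
    for start, end in zip(bounds[:-1], bounds[1:]):
        # A chapter runs from its chapter point up to (not including) the next one.
        rows.append(list(range(start, end, interval)))
    return rows


rows = thumbnail_rows([0, 300, 720, 1115, 1431], total_frames=1800)
for number, row in enumerate(rows, start=1):
    print(f"chapter {number}: {row[:5]} ...")
```

The rows reproduce the frame numbers listed for FIG. 29. For example, chapter 2 ends at frame 700 and chapter 3 ends at frame 1070, because the next thumbnail would fall past the next chapter point.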

Besides arranging each chapter's thumbnail images in a row, the presentation unit 152 may also stack other thumbnail images beneath the displayed thumbnails.

Specifically, for example, the presentation unit 152 may display the frame with frame number 300 as a thumbnail image and place the thumbnails of frames 301 to 349 so that they are hidden behind it.
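The set of frames hidden behind one thumbnail can be computed directly from the 50-frame interval. The sketch below also assumes something the source does not state: a scene at the end of a chapter is cut off at the next chapter point. For example, under this assumption the thumbnail for frame 700 in chapter 2 would cover only frames 700 to 719. The function name `folded_frames` is hypothetical:

```python
def folded_frames(thumbnail_frame: int, chapter_end: int, interval: int = 50) -> list[int]:
    """Frames stacked behind the thumbnail of `thumbnail_frame` (its scene minus itself)."""
    # Assumed: the scene ends at the next thumbnail or at the next chapter point,
    # whichever comes first.
    scene_end = min(thumbnail_frame + interval, chapter_end)
    return list(range(thumbnail_frame + 1, scene_end))


hidden = folded_frames(300, chapter_end=720)
print(hidden[0], hidden[-1], len(hidden))  # 301 349 49
```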

Next, FIG. 30 shows an example of the display screen of the display unit 132.

As shown in FIG. 30, each chapter has its own chapter display area on this screen: the horizontally long rectangles labeled with chapter numbers 1, 2, 3, 4, and 5. The thumbnail images of each chapter are displayed in a row within that chapter's area.

That is, the first row holds the thumbnail images of chapter 1, the first chapter from the beginning of the content. Frames 0, 50, 100, 150, 200, ... are arranged in that order from left to right in the figure.

In other words, the display unit 132 displays each thumbnail image as a representative image of one scene of chapter 1.

Specifically, for example, the display unit 132 displays the thumbnail corresponding to frame 0 as the representative image of the scene made up of frames 0 to 49. The same applies to chapters 2 to 5 shown in FIG. 30.

The second row holds the thumbnail images of chapter 2, the second chapter from the beginning of the content. Frames 300, 350, 400, 450, 500, ... are arranged in that order from left to right in the figure.

The third row holds the thumbnail images of chapter 3, with frames 720, 770, 820, 870, 920, ... arranged in that order from left to right. The fourth row holds the thumbnail images of chapter 4, with frames 1115, 1165, 1215, 1265, 1315, ... arranged in that order from left to right.

The fifth row holds the thumbnail images of chapter 5, the fifth chapter from the beginning of the content. Frames 1431, 1481, 1531, 1581, 1631, ... are arranged in that order from left to right in the figure.

As shown in FIG. 30, a slider 171 can also be displayed on the display screen of the display unit 132. The slider 171 is moved (slid) left and right in the figure to set the total number of divisions D, which changes according to the position of the slider 171.

For example, the farther the slider 171 is moved to the left in the figure, the smaller D becomes; the farther it is moved to the right, the larger D becomes.
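The source states only the direction of the slider mapping: moving left decreases D, and moving right increases it. The sketch below assumes a linear mapping from a normalized slider position to a hypothetical range of D from 1 to 10. The function name and both bounds are illustrative choices, not values from the source:

```python
def divisions_from_slider(position: float, d_min: int = 1, d_max: int = 10) -> int:
    """Map a slider position (0.0 = far left, 1.0 = far right) to a total number of divisions D."""
    position = min(max(position, 0.0), 1.0)  # clamp out-of-range drags
    return d_min + round(position * (d_max - d_min))


print([divisions_from_slider(p) for p in (0.0, 0.25, 0.5, 1.0)])
```

Any monotonic mapping would satisfy the behavior described above. A linear mapping is just the simplest one.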

Accordingly, when the user operates the operation unit 17 to move the slider 171 on the display screen of FIG. 30 to the left, the display unit 132 displays a screen such as the one shown in FIG. 31.

In response to the user's slide operation on the slider 171, the dividing unit 151 generates chapter point data for the total number of divisions D that corresponds to that operation. It then supplies this data to the presentation unit 152.

Based on the chapter point data from the dividing unit 151, the presentation unit 152 generates a display screen such as the one shown in FIG. 31 and causes the display unit 132 to display it.

The dividing unit 151 may generate the chapter point data for the corresponding D each time the user performs a slide operation. Alternatively, it may generate chapter point data in advance for each of a plurality of different values of D.

When the dividing unit 151 generates the chapter point data for the different values of D in advance, it supplies all of that data to the presentation unit 152.

In this case, the presentation unit 152 selects the chapter point data for the D that corresponds to the user's slide operation on the slider 171. It makes this selection from the data supplied by the dividing unit 151 for the different values of D. Based on the selected chapter point data, the presentation unit 152 then generates a display screen and supplies it to the display unit 132 for display.
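The precompute-then-select variant can be sketched as a lookup table keyed by D. Two assumptions apply. First, `uniform_split` is a placeholder generator that simply cuts the content into equal parts; it is not how the dividing unit 151 actually divides content. Second, `select_chapter_points` falls back to the nearest precomputed D, which is a hypothetical policy that the source does not specify:

```python
from typing import Callable


def precompute_chapter_points(generate: Callable[[int], list[int]],
                              d_values) -> dict[int, list[int]]:
    """Dividing-unit side: build chapter point data for every candidate D in advance."""
    return {d: generate(d) for d in d_values}


def select_chapter_points(table: dict[int, list[int]], requested_d: int) -> list[int]:
    """Presentation-unit side: pick the precomputed entry whose D is closest to the slider's D."""
    nearest = min(table, key=lambda d: abs(d - requested_d))
    return table[nearest]


def uniform_split(d: int, total_frames: int = 1800) -> list[int]:
    """Placeholder generator that cuts the content into d equal parts (not the patent's method)."""
    return [i * total_frames // d for i in range(d)]


table = precompute_chapter_points(uniform_split, range(2, 6))
print(select_chapter_points(table, 3))  # [0, 600, 1200]
```

Precomputing trades memory for responsiveness: the slider only triggers a dictionary lookup instead of a new division.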

Next, FIG. 31 shows an example of the display screen shown on the display unit 132 when the slider 171 is moved in the direction that decreases D.

Compared with the display screen of FIG. 30, the display screen of FIG. 31 shows fewer chapters: the total number of divisions D has decreased from five to three.

In addition, for example, the presentation unit 152 may extract feature amount time-series data from the content supplied from the dividing unit 151. It does so in the same manner as the feature amount extraction unit 112 of FIG. 20. The presentation unit 152 may then decorate the thumbnail images shown on the display unit 132 according to the intensity (magnitude) of the extracted feature amount time-series data.

Next, FIG. 32 shows another example of the display screen of the display unit 132. In this example, the thumbnail images are decorated according to the intensity of the feature amount time-series data.

Some of the thumbnail images in FIG. 32 carry a band display. A band is added according to the characteristics of the scene that contains the thumbnail's frame (for example, the 50 frames starting from that frame).

The band displays 191a to 191f are each added to a thumbnail image representing a scene in which the proportion of face area is relatively high.

Here, the band displays 191a to 191f are added to the thumbnail images of frames 100, 150, 350, 400, 450, and 1581, respectively.

The band displays 192a to 192d are each added to a thumbnail image representing a scene in which both the proportion of face area and the audio power are relatively high.

The band displays 193a and 193b are each added to a thumbnail image representing a scene in which the audio power is relatively high.

The band displays 191a to 191f are added according to a count of qualifying frames. For example, a band is added to a scene's representative thumbnail when enough of the scene's frames have a face-area proportion equal to or greater than a predetermined threshold. Here, "enough" means that the number of such frames is equal to or greater than a predetermined count threshold.

Alternatively, the band displays 191a to 191f may be colored more darkly as the number of frames in the scene whose face-area proportion meets the predetermined threshold increases.

The same applies to the band displays 192a to 192d and to the band displays 193a and 193b.
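The band rules above can be sketched per scene as follows. The sketch applies the same count-based test to face-area ratio and to audio power, and maps the combination to the three band families 191, 192, and 193. It also scales darkness with the number of qualifying frames. The threshold values (0.3, 0.5, 10) are hypothetical placeholders, since the source gives no numbers. Using the larger of the two counts for a 192 band is likewise an assumption:

```python
from typing import Optional, Tuple


def qualifying_frames(values: list[float], value_threshold: float, count_threshold: int) -> int:
    """Count frames meeting value_threshold; return 0 if the count is below count_threshold."""
    hits = sum(1 for v in values if v >= value_threshold)
    return hits if hits >= count_threshold else 0


def scene_band(face_ratios: list[float], audio_powers: list[float],
               face_th: float = 0.3, audio_th: float = 0.5,
               count_th: int = 10) -> Optional[Tuple[str, float]]:
    """Return (band family, darkness in [0, 1]) for one scene, or None if no band is added."""
    face_hits = qualifying_frames(face_ratios, face_th, count_th)
    audio_hits = qualifying_frames(audio_powers, audio_th, count_th)
    if face_hits and audio_hits:
        kind = "192"  # high face-area ratio and high audio power
    elif face_hits:
        kind = "191"  # high face-area ratio only
    elif audio_hits:
        kind = "193"  # high audio power only
    else:
        return None
    # Darker band as more frames in the scene qualify.
    return kind, max(face_hits, audio_hits) / len(face_ratios)


faces = [0.5] * 20 + [0.0] * 30  # 50-frame scene, 20 frames with a large face area
print(scene_band(faces, [0.0] * 50))  # ('191', 0.4)
```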

In FIG. 32, band displays are added to the thumbnail images. Instead of the band displays 191a to 191f, however, an image of a human face, for example, may be added. In other words, any display method may be used as long as it represents the characteristics of the scene.

In FIG. 32, frame numbers are attached to identify each thumbnail image. In practice, the display screen of the display unit 132 looks, for example, like FIG. 33.

[Details of Presentation Unit 152]
Next, FIG. 34 shows a detailed configuration example of the presentation unit 152 of FIG. 26.

The presentation unit 152 includes a feature amount extraction unit 211, a display data generation unit 212, and a display control unit 213.

The content is supplied from the dividing unit 151 to the feature amount extraction unit 211. In the same manner as the feature amount extraction unit 112 of FIG. 20, the feature amount extraction unit 211 extracts feature amount time-series data and supplies it to the display data generation unit 212.

That is, for example, the feature amount extraction unit 211 extracts at least one of the following from the content supplied from the dividing unit 151: face area time-series data, audio power time-series data, zoom-in intensity time-series data, or zoom-out intensity time-series data. It then supplies the extracted data to the display data generation unit 212.
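As one concrete illustration of a feature amount time series, audio power can be computed per video frame as the mean square of the audio samples that fall within that frame. This is an assumed definition for illustration only; the actual computation of the feature amount extraction unit 112 is described elsewhere in the document. The name `audio_power_series` is hypothetical:

```python
def audio_power_series(samples: list[float], samples_per_frame: int) -> list[float]:
    """Mean-square audio power for each video frame (a trailing partial frame is dropped)."""
    powers = []
    for start in range(0, len(samples) - samples_per_frame + 1, samples_per_frame):
        chunk = samples[start:start + samples_per_frame]
        powers.append(sum(s * s for s in chunk) / samples_per_frame)
    return powers


# With hypothetical rates of 48 kHz audio and 30 frames/s, samples_per_frame would be 1600.
print(audio_power_series([1.0, -1.0, 2.0, -2.0, 0.5], samples_per_frame=2))  # [1.0, 4.0]
```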

The display data generation unit 212 receives two inputs: the feature amount time-series data from the feature amount extraction unit 211 and the chapter point data from the dividing unit 151.

Based on these two inputs, the display data generation unit 212 generates display data for producing displays such as those shown in FIGS. 31 to 33 on the display screen of the display unit 132. It then supplies the display data to the display control unit 213.

Based on the display data from the display data generation unit 212, the display control unit 213 causes the display screen of the display unit 132 to show displays such as those shown in FIGS. 31 to 33.

The display data generation unit 212 also generates display data in response to user operations and supplies it to the display control unit 213.

Based on that display data, the display control unit 213 changes the display screen of the display unit 132 in response to the user's operation.

Specifically, the display control unit 213 controls the display of the content's chapters in one of three display modes: a layer 0 mode, a layer 1 mode, and a layer 2 mode.

In the layer 0 mode, the display unit 132 shows displays such as those shown in FIGS. 31 to 33.

Next, FIG. 35 shows an example of what happens when the user points to a position on the display screen of the display unit 132 in the layer 0 mode.

In the following, to simplify the explanation, a mouse is assumed to serve as the operation unit 17. With this mouse, the user can, for example, single-click or double-click. The operation unit 17 is not limited to a mouse, however.

In the layer 0 mode, suppose the user operates the mouse to move the pointer (cursor) 231 onto the fifth thumbnail image from the left in chapter 4 in FIG. 35. The display control unit 213 then changes the display screen of the display unit 132 to the display shown in FIG. 35.

That is, in the layer 0 mode, the thumbnail image 232 indicated by the pointer 231 is highlighted. In the example of FIG. 35, the thumbnail image 232 indicated by the pointer 231 is displayed larger than the other thumbnails and surrounded by, for example, a black frame.

This lets the user easily see which thumbnail image 232 the pointer 231 is indicating.

Next, FIG. 36 shows an example of what happens when the user double-clicks while the pointer 231 indicates the thumbnail image 232 in the layer 0 mode.

When the user double-clicks while the pointer 231 indicates the thumbnail image 232, the content is played back starting from the frame corresponding to the thumbnail image 232.

That is, for example, as shown in FIG. 36, the display control unit 213 places a window 233 at the upper left of the display screen of the display unit 132. The window 233 shows the content 233a played back from the frame corresponding to the thumbnail image 232.

In the window 233, above the content 233a, four elements are arranged from left to right in the figure: a clock mark 233b, a timeline bar 233c, a playback position indicator 233d, and a volume button 233e.

The clock mark 233b is an icon whose clock hand shows the playback position (playback time) of the content 233a within its total playback time. In the clock mark 233b, the total playback time of the content 233a is assigned, for example, to one full revolution of the clock hand (one hour, from 0 to 60 minutes).

The timeline bar 233c is a horizontally long bar that, like the clock mark 233b, shows the playback position of the content 233a. The total playback time of the content 233a is mapped onto the span from the left end to the right end of the timeline bar 233c. The playback position indicator 233d is placed at the position corresponding to the current playback position of the content 233a.

In FIG. 36, the playback position indicator 233d may be configured to be movable as a slider. In this case, the user can use the operation unit 17 to drag the playback position indicator 233d like a slider. The content 233a then plays back from the position to which the indicator has been moved.
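The clock mark and the timeline bar both map the playback position proportionally onto a fixed range: one full revolution for the clock, and the bar's width for the timeline. The draggable indicator needs the inverse mapping. A minimal sketch follows. The function names, the pixel coordinates, and the choice of 12 o'clock as 0 degrees are illustrative assumptions:

```python
def clock_hand_angle(position_s: float, total_s: float) -> float:
    """Clock mark: the whole content maps to one revolution; 0 degrees = 12 o'clock, clockwise."""
    return 360.0 * position_s / total_s


def timeline_x(position_s: float, total_s: float, bar_left: float, bar_width: float) -> float:
    """Timeline bar: the whole content maps from the bar's left end to its right end."""
    return bar_left + bar_width * position_s / total_s


def position_from_x(x: float, total_s: float, bar_left: float, bar_width: float) -> float:
    """Inverse mapping for dragging the indicator like a slider (clamped to the bar)."""
    fraction = min(max((x - bar_left) / bar_width, 0.0), 1.0)
    return fraction * total_s


print(clock_hand_angle(900, 3600), timeline_x(1800, 3600, bar_left=10, bar_width=200))  # 90.0 110.0
```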

The volume button 233e is an icon operated to mute the content 233a being played back or to change its volume.

For example, when the user uses the operation unit 17 to move the pointer 231 onto the volume button 233e and single-clicks, the content 233a being played back is muted.

When the user instead moves the pointer 231 onto the volume button 233e and double-clicks, a new window for changing the volume of the content 233a being played back is displayed.

Next, FIG. 37 shows an example of what happens when the user single-clicks while the pointer 231 indicates the thumbnail image 232 in the layer 0 mode.

In the layer 0 mode, when the user single-clicks while the pointer 231 indicates the thumbnail image 232 (FIG. 35), the display control unit 213 switches the display mode from the layer 0 mode to the layer 1 mode.

Then, for example, as shown in FIG. 37, the display control unit 213 places a window 251 in the lower part of the display screen of the display unit 132. The window 251 contains a tile image 251a, a clock mark 251b, a timeline bar 251c, and a playback position indicator 251d.

The tile image 251a lists the thumbnail images folded into the thumbnail image 232, that is, the thumbnails of the scene represented by the thumbnail image 232.

For example, when the thumbnail image 232 corresponds to the frame with frame number 300, the thumbnails of frames 301 to 349 are folded into the thumbnail image 232, as shown in FIG. 29.

If the window 251 cannot display every thumbnail folded into the thumbnail image 232 as the tile image 251a, some of the thumbnails are, for example, thinned out before display.

Alternatively, for example, a scroll bar may be displayed in the window 251, so that the user can view every thumbnail folded into the thumbnail image 232 by moving the scroll bar.
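One simple way to thin out the folded thumbnails when they do not fit is to keep an evenly spaced subset that always includes the first and last frames. The source does not say how thumbnails are thinned, so this even-spacing rule and the name `thin_out` are assumptions:

```python
def thin_out(frames: list[int], capacity: int) -> list[int]:
    """Pick at most `capacity` evenly spaced thumbnails, always keeping the first and last."""
    if capacity <= 0:
        return []
    if len(frames) <= capacity:
        return list(frames)
    if capacity == 1:
        return [frames[0]]
    step = (len(frames) - 1) / (capacity - 1)  # > 1 here, so indices never repeat
    return [frames[round(i * step)] for i in range(capacity)]


shown = thin_out(list(range(300, 350)), capacity=5)
print(shown)
```

The scroll-bar alternative described above avoids discarding any thumbnails, at the cost of requiring the user to scroll.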

時計マーク２５１bは、コンテンツ２３３aの総再生時間のうち、シングルクリックされたサムネイル画像２３２に対応するフレームが再生される再生位置を、時計の針で表示するアイコンであり、図３６の時計マーク２３３bと同様に構成される。 The clock mark 251b is an icon for displaying the playback position at which the frame corresponding to the single-clicked thumbnail image 232 is played out of the total playback time of the content 233a with a clock hand. It is comprised similarly.

タイムラインバー２５１cは、コンテンツ２３３aの総再生時間のうち、シングルクリックされたサムネイル画像２３２に対応するフレームが再生される再生位置を、再生位置表示２５１dで表示するものであり、図３６のタイムラインバー２３３cと同様に構成される。 The timeline bar 251c displays the playback position at which the frame corresponding to the single-clicked thumbnail image 232 is played out of the total playback time of the content 233a as the playback position display 251d. The configuration is the same as that of the bar 233c.

さらに、タイムラインバー２５１cは、タイル画像２５１aを構成するサムネイル画像（サムネイル画像２３２以外）にそれぞれ対応する各フレームの再生位置も、再生位置表示２５１dと同様の再生位置表示を用いて表示する。 Further, the timeline bar 251c also displays the playback position of each frame corresponding to the thumbnail image (other than the thumbnail image 232) constituting the tile image 251a using the playback position display similar to the playback position display 251d.

図３７では、図面が煩雑になるのをさけるため、サムネイル画像２３２の再生位置表示２５１dのみを記載し、他の再生位置表示は記載していない。 In FIG. 37, only the reproduction position display 251d of the thumbnail image 232 is described, and the other reproduction position displays are not described, in order to avoid making the drawing complicated.

また、ユーザは、操作部１７を用いて、タイル画像２５１aを構成する複数のサムネイル画像のうち、所定のサムネイル画像を、ポインタ２３１で指示するマウスオン操作を行うと、ポインタ２３１で指示された所定のサムネイル画像が強調して表示される。 In addition, when the user performs a mouse-on operation for designating a predetermined thumbnail image with a pointer 231 among a plurality of thumbnail images constituting the tile image 251 a using the operation unit 17, a predetermined instructed by the pointer 231 is performed. The thumbnail image is displayed with emphasis.

すなわち、例えば、ユーザが、操作部１７を用いて、タイル画像２５１a内のサムネイル画像２７１を、ポインタ２３１で指示するマウスオン操作を行うと、サムネイル画像２７１を強調したサムネイル画像２７１'が表示される。 That is, for example, when the user performs a mouse-on operation for pointing the thumbnail image 271 in the tile image 251a with the pointer 231 using the operation unit 17, a thumbnail image 271 ′ highlighting the thumbnail image 271 is displayed.

このとき、タイムラインバー２５１cにおいて、サムネイル画像２７１'の再生位置表示は、サムネイル画像２７１'と同様に強調して表示される。すなわち、例えば、サムネイル画像２７１'の再生位置表示は、他の再生位置表示とは異なる色等とされ、強調して表示される。 At this time, in the timeline bar 251c, the playback position display of the thumbnail image 271 ′ is displayed with emphasis in the same manner as the thumbnail image 271 ′. That is, for example, the playback position display of the thumbnail image 271 ′ is displayed in an emphasized manner such as a color different from other playback position displays.

また、タイムラインバー２５１cにおいて、強調して表示された再生位置表示を、スライダとして移動可能とするように構成することができる。 Further, the playback position display highlighted in the timeline bar 251c can be configured to be movable as a slider.

この場合、ユーザは、操作部１７を用いて、強調して表示された再生位置表示をスライダとして移動させる移動操作を行うことにより、例えば、移動後の再生位置表示に対応するサムネイル画像により代表されるシーンを、タイル画像２５１aとして表示させることができる。 In this case, the user performs a moving operation for moving the highlighted reproduction position display as a slider using the operation unit 17, for example, by a thumbnail image corresponding to the reproduction position display after the movement. Can be displayed as a tile image 251a.

Note that, instead of displaying the emphasized thumbnail image 271′, the thumbnail image 271 may also be emphasized in the same manner as the thumbnail image 232 described with reference to FIG. 35.

When the user double-clicks using the operation unit 17 while the pointer 231 is on the highlighted thumbnail image 271′, the content 233a is played back from the frame corresponding to the thumbnail image 271′ (271), as shown in FIG. 38.

FIG. 38 shows an example of what happens when the user double-clicks in layer 1 mode while the pointer 231 is on the thumbnail image 271′.

In layer 1 mode, when the user double-clicks while the pointer 231 is on the thumbnail image 271′ (FIG. 37), the display control unit 213 switches the display mode from layer 1 mode to layer 0 mode.

Then, as shown in FIG. 38 for example, the display control unit 213 places a window 233 at the upper left of the display screen of the display unit 132. The window 233 shows the content 233a being played back from the frame corresponding to the thumbnail image 271′ (271).

FIG. 39 shows an example of what happens when the user single-clicks in layer 1 mode while the pointer 231 is on the thumbnail image 271′.

In layer 1 mode, when the user single-clicks while the pointer 231 is on the thumbnail image 271′ (FIG. 37), the display control unit 213 switches the display mode from layer 1 mode to layer 2 mode.

Then, as shown in FIG. 39 for example, the display control unit 213 places a window 291 on the display screen of the display unit 132. The window 291 contains a tile image 291a, a clock mark 291b, and a timeline bar 291c.

The tile image 291a is a list of thumbnail images whose content is similar to that shown in the thumbnail image 271′ (271).

More precisely, the tile image 291a is a list of the thumbnail images of those frames of the content 233a that have the same symbol as the frame corresponding to the thumbnail image 271′.

Here, the display data generation unit 212 receives from the dividing unit 151 not only the chapter point data but also the content 233a and the symbol sequence of the content 233a.

Based on the symbol sequence from the dividing unit 151, the display data generation unit 212 extracts, from the content 233a supplied by the dividing unit 151, the frames that have the same symbol as the frame corresponding to the thumbnail image 271′.

The display data generation unit 212 then turns each extracted frame into a thumbnail image, generates the tile image 291a as a list of those thumbnail images, and supplies display data including the generated tile image 291a to the display control unit 213.
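The symbol-based selection behind the tile image 291a amounts to a filter over the per-frame symbol sequence. A minimal Python sketch, assuming the symbols are integers aligned one-to-one with frame indices and that the content has a constant frame rate `fps` (both assumptions; the function names are hypothetical):

```python
from typing import List, Sequence, Tuple


def frames_with_same_symbol(symbols: Sequence[int], selected_frame: int) -> List[int]:
    """Indices of all frames whose symbol equals the symbol of selected_frame."""
    target = symbols[selected_frame]
    return [i for i, s in enumerate(symbols) if s == target]


def build_tile(symbols: Sequence[int], selected_frame: int,
               fps: float = 30.0) -> List[Tuple[int, float]]:
    """One (frame index, playback time in seconds) entry per thumbnail in the tile."""
    return [(i, i / fps) for i in frames_with_same_symbol(symbols, selected_frame)]


symbols = [3, 3, 7, 1, 7, 7, 2, 7]
print(frames_with_same_symbol(symbols, 2))  # [2, 4, 5, 7]
```

The playback times returned by `build_tile` are what the clock mark and the timeline bar of the window 291 would need.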

Based on the display data from the display data generation unit 212, the display control unit 213 controls the display unit 132 so that the window 291 containing the tile image 291a is displayed on its display screen.

If the window 291 cannot show all of the thumbnail images making up the tile image 291a, a scroll bar or the like is added to the window 291. Alternatively, for example, some of the thumbnail images may be omitted so that the tile image 291a fits in the window 291.
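One possible way to omit thumbnails so that the tile image fits is to keep an evenly spaced subset. The sketch below assumes a fixed grid of `rows` x `cols` slots and even spacing over the list; the text does not say which thumbnails are dropped, so this policy and the name `fit_thumbnails` are illustrative only.

```python
from typing import List, Sequence, TypeVar

T = TypeVar("T")


def fit_thumbnails(thumbs: Sequence[T], rows: int, cols: int) -> List[T]:
    """Keep at most rows * cols thumbnails, evenly spaced, first and last included."""
    capacity = rows * cols
    n = len(thumbs)
    if n <= capacity:
        return list(thumbs)  # everything fits; nothing to omit
    if capacity <= 0:
        return []
    if capacity == 1:
        return [thumbs[0]]
    # Spacing is > 1 whenever n > capacity, so the rounded indices are distinct.
    step = (n - 1) / (capacity - 1)
    return [thumbs[round(k * step)] for k in range(capacity)]


print(fit_thumbnails(list(range(10)), 1, 4))  # [0, 3, 6, 9]
```

Keeping the first and last thumbnails preserves the overall time span covered by the tile.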

The clock mark 291b is an icon that uses clock hands to show where, within the total playback time of the content 233a, the frame corresponding to the single-clicked thumbnail image 271′ is played back. It is configured in the same way as the clock mark 233b in FIG. 36.
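The clock mark's hand position can be computed from the fraction of the total playback time. A minimal sketch, assuming (not stated in the text) that one full revolution of the hand corresponds to the total playback time, that the hand starts at 12 o'clock and turns clockwise, and that screen y grows downward; the function names are hypothetical.

```python
import math
from typing import Tuple


def clock_hand_angle(position_s: float, total_s: float) -> float:
    """Hand angle in degrees, clockwise from 12 o'clock; one revolution = total_s."""
    if total_s <= 0:
        raise ValueError("total_s must be positive")
    frac = min(max(position_s / total_s, 0.0), 1.0)
    return 360.0 * frac


def clock_hand_tip(cx: float, cy: float, radius: float,
                   angle_deg: float) -> Tuple[float, float]:
    """Screen coordinates of the hand tip; 12 o'clock is (cx, cy - radius)."""
    rad = math.radians(angle_deg)
    return cx + radius * math.sin(rad), cy - radius * math.cos(rad)


print(clock_hand_angle(900, 3600))  # 90.0
```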

The timeline bar 291c shows where, within the total playback time of the content 233a, each frame corresponding to the thumbnail images of the tile image 291a is played back. It is configured in the same way as the timeline bar 233c in FIG. 36.

Accordingly, the timeline bar 291c displays, for example, as many playback positions as there are thumbnail images in the tile image 291a.
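The timeline bar's marks can be sketched as a linear mapping from playback time to horizontal pixel position, with one mark per thumbnail and an optional highlighted mark (as when a thumbnail is pointed at). The pixel mapping and the name `timeline_marks` are assumptions for illustration.

```python
from typing import List, Optional, Sequence, Tuple


def timeline_marks(times_s: Sequence[float], total_s: float, bar_width_px: int,
                   highlighted: Optional[int] = None) -> List[Tuple[int, bool]]:
    """One (x pixel, is_highlighted) mark per thumbnail playback time."""
    if total_s <= 0 or bar_width_px <= 0:
        raise ValueError("total_s and bar_width_px must be positive")
    marks = []
    for i, t in enumerate(times_s):
        frac = min(max(t / total_s, 0.0), 1.0)
        x = round(frac * (bar_width_px - 1))  # 0 .. bar_width_px - 1
        marks.append((x, i == highlighted))
    return marks


print(timeline_marks([0, 30, 60], 60, 101, highlighted=1))
# [(0, False), (50, True), (100, False)]
```

The number of marks always equals the number of thumbnails, matching the one-to-one correspondence described above.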

In addition, when the user uses the operation unit 17 to perform a mouse-on operation that points the pointer 231 at one of the thumbnail images making up the tile image 291a, the thumbnail image indicated by the pointer 231 is displayed with emphasis.

At this time, on the timeline bar 291c, the playback position of the thumbnail image indicated by the pointer 231 is emphasized, for example by being drawn in a color different from that of the other playback positions.

In FIG. 39, the thumbnail image is emphasized in the same manner as when the emphasized thumbnail image 271′ is displayed in response to a mouse-on operation that points the pointer 231 at the thumbnail image 271 (FIG. 37).

Then, when the user double-clicks using the operation unit 17 while the pointer 231 is on the highlighted thumbnail image, the content 233a is played back from the frame corresponding to that thumbnail image, in the same way as described with reference to FIG. 38.

[Description of operation of recorder 131]
Next, the presentation process performed by the recorder 131 of FIG. 26 (in particular, by the presentation unit 152) will be described with reference to the flowchart of FIG. 40.

In step S221, the dividing unit 151 performs the same processing as the dividing unit 15 of FIG. 1. In addition, the dividing unit 151 generates chapter point data (chapter IDs) in the same manner as the dividing unit 71 of FIG. 17 and supplies it to the display data generation unit 212 of the presentation unit 152.

Furthermore, the dividing unit 151 associates each symbol in the symbol sequence from the symbol sequence generation unit 14 with the corresponding frame of the content, and supplies the symbols to the display data generation unit 212 of the presentation unit 152.

The dividing unit 151 also supplies the content read from the content storage unit 11 to the feature amount extraction unit 211 of the presentation unit 152.

In step S222, the feature amount extraction unit 211 extracts feature amount time-series data in the same manner as the feature amount extraction unit 112 of FIG. 20, and supplies it to the display data generation unit 212.

In step S223, the display data generation unit 212 generates display data for producing displays such as those shown in FIGS. 31 to 33, based on the feature amount time-series data from the feature amount extraction unit 211 and the chapter point data from the dividing unit 151, and supplies the display data to the display control unit 213.

Also, for example, under the control of the control unit 16, the display data generation unit 212 generates display data to be shown on the display screen of the display unit 132 in response to user operations, and supplies it to the display control unit 213.

That is, for example, when the user single-clicks while the pointer 231 is on the thumbnail image 271′, as shown in FIG. 39, the display data generation unit 212 uses the symbols from the dividing unit 151 to generate display data for displaying the window 291 containing the tile image 291a, and supplies it to the display control unit 213.

In step S224, the display control unit 213 causes the display screen of the display unit 132 to show the display corresponding to the display data from the display data generation unit 212. This completes the presentation process of FIG. 40.
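The data flow of steps S221 to S224 can be summarised as a short pipeline. In this sketch the four stages are passed in as callables standing in for the dividing unit 151, the feature amount extraction unit 211, the display data generation unit 212 and the display control unit 213; `DividedContent`, `presentation_process` and all parameter names are hypothetical, and what each stage computes internally is out of scope.

```python
from dataclasses import dataclass
from typing import Any, Callable, List, Sequence


@dataclass
class DividedContent:
    chapter_points: List[int]  # start frame of each chapter (chapter point data)
    symbols: List[int]         # one symbol per frame, aligned with frames
    frames: Sequence[Any]      # the content itself


def presentation_process(
    content: Sequence[Any],
    divide: Callable[[Sequence[Any]], DividedContent],
    extract_features: Callable[[Sequence[Any]], List[float]],
    generate_display_data: Callable[[List[float], List[int], List[int]], Any],
    show: Callable[[Any], None],
) -> Any:
    divided = divide(content)                          # S221: dividing unit 151
    features = extract_features(divided.frames)        # S222: feature amount extraction unit 211
    display_data = generate_display_data(              # S223: display data generation unit 212
        features, divided.chapter_points, divided.symbols)
    show(display_data)                                 # S224: display control unit 213
    return display_data
```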

As described above, in the presentation process of FIG. 40, the display control unit 213 displays thumbnail images for each chapter of the content on the display screen of the display unit 132.

The user can therefore refer to the display screen of the display unit 132 to play back the content from a desired playback position within a given chapter.

Furthermore, in the presentation process of FIG. 40, for example, the display control unit 213 displays thumbnail images to which a band display has been added. The features of the scene corresponding to each thumbnail image can therefore be recognized easily from the band display.

In particular, since a thumbnail image conveys no information about the audio, adding to it a band display indicating that the audio is loud lets the user recognize this feature of the scene easily without playing the scene back.

Furthermore, in the presentation process of FIG. 40, as shown in FIG. 37 for example, the display unit 132 displays thumbnail images of the scene represented by the thumbnail image 232, together with their playback positions, as the tile image 251a.

In addition, in the presentation process of FIG. 40, as shown in FIG. 39 for example, the display control unit 213 displays, as the tile image 291a, the thumbnail images of the frames that have the same symbol as the frame corresponding to the thumbnail image 271′, together with their playback positions.

This allows the user to easily find, among the many frames of the content 233a, the playback position of the frame from which playback should start, and thus to play back the content 233a easily from the desired start position.

Next, FIG. 41 shows an example of how the display mode of the display control unit 213 transitions.

In step ST1, the display mode of the display control unit 213 is layer 0 mode. The display control unit 213 therefore controls the display unit 132 so that its display screen shows a display such as that of FIG. 33.

For example, when the control unit 16 determines, based on the operation signal from the operation unit 17, that the user has double-clicked using the operation unit 17 while the pointer 231 is not on any thumbnail image, the process proceeds from step ST1 to step ST2.

In step ST2, if there is a window 233 in which the content 233a is being played back, the control unit 16 controls the display data generation unit 212 to generate display data for bringing that window 233 to the front and to supply it to the display control unit 213.

Based on the display data from the display data generation unit 212, the display control unit 213 changes the display screen of the display unit 132 to one in which the window 233 is shown in front, and the process returns from step ST2 to step ST1.

From step ST1, the control unit 16 also advances the process to step ST3 as appropriate.

In step ST3, the control unit 16 determines, based on the operation signal from the operation unit 17, whether the user has performed an operation such as sliding the slider 171. When the control unit 16 determines that such an operation has been performed, it causes the display data generation unit 212 to generate display data corresponding to the user's operation and to supply it to the display control unit 213.

Based on the display data from the display data generation unit 212, the display control unit 213 changes the display screen of the display unit 132 to one that reflects the user's slide operation or the like. The display screen of the display unit 132 thereby changes, for example, from the screen shown in FIG. 30 to the screen shown in FIG. 31. The process then returns from step ST3 to step ST1.

From step ST1, the control unit 16 also advances the process to step ST4 as appropriate.

In step ST4, the control unit 16 determines, based on the operation signal from the operation unit 17, whether there is a thumbnail image 232 whose distance from the pointer 231 is equal to or less than a predetermined threshold. If it determines that there is no such thumbnail image 232, it returns the process to step ST1.

If, in step ST4, the control unit 16 determines based on the operation signal from the operation unit 17 that there is a thumbnail image 232 whose distance from the pointer 231 is equal to or less than the predetermined threshold, it advances the process to step ST5.

Here, the distance between the pointer 231 and the thumbnail image 232 is, for example, the distance between the center of gravity of the pointer 231 (or the tip of the arrow-shaped pointer 231) and the center of gravity of the thumbnail image 232.
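The threshold test of step ST4 can be sketched as follows, measuring from the pointer tip to each thumbnail's center of gravity (one of the two options given above) and returning the nearest thumbnail within the threshold. Axis-aligned rectangles `(left, top, width, height)`, Euclidean distance and the function names are assumptions; the same test would apply in steps ST9 and ST15 to the thumbnails of the windows 251 and 291.

```python
import math
from typing import Optional, Sequence, Tuple

Rect = Tuple[float, float, float, float]  # (left, top, width, height)


def centroid(rect: Rect) -> Tuple[float, float]:
    """Center of gravity of an axis-aligned rectangle."""
    x, y, w, h = rect
    return x + w / 2.0, y + h / 2.0


def thumbnail_under_pointer(pointer_tip: Tuple[float, float],
                            thumbs: Sequence[Rect],
                            threshold: float) -> Optional[int]:
    """Index of the nearest thumbnail whose centroid is within threshold, else None."""
    best: Optional[int] = None
    best_d = math.inf
    for i, rect in enumerate(thumbs):
        cx, cy = centroid(rect)
        d = math.hypot(pointer_tip[0] - cx, pointer_tip[1] - cy)
        if d <= threshold and d < best_d:
            best, best_d = i, d
    return best


thumbs = [(0, 0, 100, 60), (120, 0, 100, 60)]
print(thumbnail_under_pointer((160, 35), thumbs, 40))   # 1
print(thumbnail_under_pointer((110, 100), thumbs, 40))  # None
```

A `None` result corresponds to returning from step ST4 to step ST1; an index corresponds to proceeding to step ST5 with that thumbnail emphasized.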

In step ST5, the control unit 16 causes the display data generation unit 212 to generate display data for displaying the thumbnail image 232 with emphasis and to supply it to the display control unit 213.

Based on the display data from the display data generation unit 212, the display control unit 213 changes the display screen of the display unit 132 to a screen such as that shown in FIG. 35.

Also in step ST5, the control unit 16 determines, based on the operation signal from the operation unit 17, whether the user has double-clicked or single-clicked using the operation unit 17 while the distance between the pointer 231 and the thumbnail image 232 is equal to or less than the threshold.

If, in step ST5, the control unit 16 determines based on the operation signal from the operation unit 17 that the user has neither double-clicked nor single-clicked, it returns the process to step ST4 as appropriate.

If, in step ST5, the control unit 16 determines based on the operation signal from the operation unit 17 that the user has double-clicked using the operation unit 17 while the distance between the pointer 231 and the thumbnail image 232 is equal to or less than the threshold, it advances the process to step ST6.

In step ST6, the control unit 16 causes the display data generation unit 212 to generate display data for playing back the content 233a from the playback position of the frame corresponding to the thumbnail image 232 and to supply it to the display control unit 213.

Based on the display data from the display data generation unit 212, the display control unit 213 changes the display screen of the display unit 132 to a screen such as that shown in FIG. 36, and the process returns to step ST1.

If, in step ST5, the control unit 16 determines based on the operation signal from the operation unit 17 that the user has single-clicked using the operation unit 17 while the distance between the pointer 231 and the thumbnail image 232 is equal to or less than the threshold, it advances the process to step ST7.

In step ST7, the control unit 16 controls the display control unit 213 to switch its display mode from layer 0 mode to layer 1 mode. Under the control of the control unit 16, the display control unit 213 changes the display screen of the display unit 132 to, for example, the screen of FIG. 33 with the window 251 of FIG. 37 added to it.

Also in step ST7, the control unit 16 determines, based on the operation signal from the operation unit 17, whether the user has double-clicked using the operation unit 17. If it determines that the user has double-clicked, it advances the process to step ST8.

In step ST8, the control unit 16 causes the display data generation unit 212 to generate display data for playing back the content 233a from the playback position of the frame corresponding to the thumbnail image closest to the pointer 231 and to supply it to the display control unit 213.

Based on the display data from the display data generation unit 212, the display control unit 213 causes the display unit 132 to show a screen such as that of FIG. 36, and the process returns to step ST1.

Furthermore, if, in step ST7, the control unit 16 determines based on the operation signal from the operation unit 17 that the user has not double-clicked, it advances the process to step ST9 as appropriate.

In step ST9, the control unit 16 determines, based on the operation signal from the operation unit 17, whether there is, for example in the window 251, a thumbnail image 271 whose distance from the pointer 231 is equal to or less than a predetermined threshold. If it determines that there is no such thumbnail image 271, it advances the process to step ST10.

In step ST10, the control unit 16 determines, based on the operation signal from the operation unit 17, whether the pointer 231 has moved outside the area of the window 251 displayed in layer 1 mode. If it determines that the pointer 231 has moved outside the area of the window 251, it returns the process to step ST1.

In step ST1, the control unit 16 causes the display data generation unit 212 to generate display data for the display corresponding to layer 0 mode and to supply it to the display control unit 213.

Based on the display data from the display data generation unit 212, the display control unit 213 changes the display screen of the display unit 132 to, for example, a screen such as that shown in FIG. 33. In this case, the display control unit 213 switches the display mode from layer 1 mode to layer 0 mode.

If, in step ST10, the control unit 16 determines based on the operation signal from the operation unit 17 that the pointer 231 has not moved outside the area of the window 251, it returns the process to step ST7.

If, in step ST9, the control unit 16 determines based on the operation signal from the operation unit 17 that there is, for example in the window 251, a thumbnail image 271 whose distance from the pointer 231 is equal to or less than the predetermined threshold, it advances the process to step ST11.

In step ST11, the control unit 16 causes the display data generation unit 212 to generate display data for displaying the thumbnail image 271 with emphasis and to supply it to the display control unit 213.

Based on the display data from the display data generation unit 212, the display control unit 213 changes the display screen of the display unit 132 to one showing the thumbnail image 271′, an emphasized version of the thumbnail image 271, as shown in FIG. 37.

Also in step ST11, the control unit 16 determines, based on the operation signal from the operation unit 17, whether the user has double-clicked or single-clicked using the operation unit 17 while the distance between the pointer 231 and the thumbnail image 271′ is equal to or less than the threshold.

If, in step ST11, the control unit 16 determines based on the operation signal from the operation unit 17 that the user has neither double-clicked nor single-clicked, it returns the process to step ST9 as appropriate.

If, in step ST11, the control unit 16 determines based on the operation signal from the operation unit 17 that the user has double-clicked using the operation unit 17 while the distance between the pointer 231 and the thumbnail image 271′ is equal to or less than the threshold, it advances the process to step ST12.

In step ST12, the control unit 16 causes the display data generation unit 212 to generate display data for playing back the content 233a from the playback position of the frame corresponding to the thumbnail image 271′ and to supply it to the display control unit 213.

Based on the display data from the display data generation unit 212, the display control unit 213 changes the display screen of the display unit 132 to a screen such as that shown in FIG. 38, and the process returns to step ST7.

If, in step ST11, the control unit 16 determines based on the operation signal from the operation unit 17 that the user has single-clicked using the operation unit 17 while the distance between the pointer 231 and the thumbnail image 271′ is equal to or less than the threshold, it advances the process to step ST13.

In step ST13, the control unit 16 controls the display control unit 213 to switch its display mode from layer 1 mode to layer 2 mode. Under the control of the control unit 16, the display control unit 213 changes the display screen of the display unit 132 to one showing the window 291, as shown in FIG. 39 for example.

また、ステップST１３では、制御部１６は、操作部１７からの操作信号に基づいて、ユーザによる操作部１７を用いたダブルクリックが行われたか否かを判別し、ユーザによるダブルクリックが行われたと判別した場合、処理をステップST１４に進める。 In step ST13, the control unit 16 determines whether or not the user has performed a double click using the operation unit 17 based on the operation signal from the operation unit 17, and the user has performed the double click. If so, the process proceeds to step ST14.

ステップST１４では、制御部１６は、表示データ生成部２１２に、ポインタ２３１に最も近いサムネイル画像に対応するフレームの再生位置から、コンテンツ２３３aを再生させる際の表示データを生成させ、表示制御部２１３に供給させる。 In step ST14, the control unit 16 causes the display data generation unit 212 to generate display data for reproducing the content 233a from the reproduction position of the frame corresponding to the thumbnail image closest to the pointer 231, and causes the display control unit 213 to generate the display data. Supply.

さらに、ステップST１３では、制御部１６は、操作部１７からの操作信号に基づいて、ユーザによる操作部１７を用いたダブルクリックが行われていないと判別した場合、適宜、処理をステップST１５に進める。 Furthermore, in step ST13, when it is determined that the double click using the operation unit 17 by the user has not been performed based on the operation signal from the operation unit 17, the control unit 16 appropriately proceeds to step ST15. .

ステップST１５では、制御部１６は、操作部１７からの操作信号に基づいて、例えば、ウインドウ２９１において、ポインタ２３１との距離が予め決められた閾値以下となる所定のサムネイル画像（タイル画像２９１aに含まれる画像）が存在するか否かを判別する。制御部１６は、そのような所定のサムネイル画像が存在すると判別した場合、処理を、ステップST１６に進める。 In step ST15, based on the operation signal from the operation unit 17, for example, the control unit 16 includes a predetermined thumbnail image (included in the tile image 291a) whose distance from the pointer 231 is equal to or less than a predetermined threshold in the window 291. Image) is present. If the control unit 16 determines that such a predetermined thumbnail image exists, the control unit 16 advances the processing to step ST16.

ステップST１６では、制御部１６は、表示データ生成部２１２に、ウインドウ２９１において、ポインタ２３１との距離が閾値以下となる所定のサムネイル画像を強調して表示させるための表示データを生成させ、表示制御部２１３に供給させる。 In step ST16, the control unit 16 causes the display data generation unit 212 to generate display data for emphasizing and displaying a predetermined thumbnail image whose distance from the pointer 231 is equal to or less than a threshold in the window 291 and performing display control. To the unit 213.

表示制御部２１３は、表示データ生成部２１２からの表示データに基づいて、表示部１３２の表示画面を、所定のサムネイル画像が強調して表示される表示画面に変更させる。 Based on the display data from the display data generation unit 212, the display control unit 213 changes the display screen of the display unit 132 to a display screen on which a predetermined thumbnail image is displayed with emphasis.

また、ステップST１６では、制御部１６は、操作部１７からの操作信号に基づいて、ポインタ２３１とサムネイル画像との距離が閾値以下の状態で、ユーザによる操作部１７を用いたダブルクリックが行われたか否かを判別する。そして、制御部１６は、ダブルクリックが行われたと判定した場合、処理をステップST１７に進める。 In step ST16, the control unit 16 performs a double-click using the operation unit 17 by the user in a state where the distance between the pointer 231 and the thumbnail image is equal to or smaller than the threshold value based on the operation signal from the operation unit 17. It is determined whether or not. If the control unit 16 determines that the double-click has been performed, the control unit 16 advances the process to step ST17.

In step ST17, the control unit 16 causes the display data generation unit 212 to generate display data for playing back the content 233a from the playback position of the frame corresponding to the thumbnail image, and to supply that display data to the display control unit 213.

If, in step ST15, the control unit 16 determines, based on the operation signal from the operation unit 17, that the window 291 contains no thumbnail image (image included in the tile image 291a) whose distance from the pointer 231 is equal to or less than the predetermined threshold, it advances the processing to step ST18.

In step ST18, the control unit 16 determines, based on the operation signal from the operation unit 17, whether the pointer 231 has moved outside the area of the window 291 displayed in the layer 2 mode. If it determines that the pointer 231 has moved outside that area, it returns the processing to step ST1.

In step ST1, the control unit 16 controls the display control unit 213 to switch the display mode from the layer 2 mode to the layer 0 mode, and the same processing is performed thereafter.

If, in step ST18, the control unit 16 determines, based on the operation signal from the operation unit 17, that the pointer 231 has not moved outside the area of the window 291 displayed in the layer 2 mode, it returns the processing to step ST13, and the same processing is performed thereafter.
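The branch from step ST15 through step ST18 amounts to a small event handler for the layer 2 window. The following Python sketch shows one pass of that branch under stated assumptions: the names `Thumbnail`, `find_near_thumbnail`, and `handle_layer2_event` are hypothetical, pointer and thumbnail positions are plain 2-D screen coordinates, and when several thumbnails lie within the threshold the closest one is chosen (the text does not say how that case is resolved).

```python
import math
from dataclasses import dataclass
from typing import Optional, Sequence, Tuple

Point = Tuple[float, float]


@dataclass
class Thumbnail:
    frame_index: int  # frame of content 233a that the thumbnail represents
    center: Point     # on-screen position inside window 291 (hypothetical field)


def find_near_thumbnail(pointer: Point, thumbnails: Sequence[Thumbnail],
                        threshold: float) -> Optional[Thumbnail]:
    """Step ST15: closest thumbnail within `threshold` of the pointer, or None."""
    best: Optional[Thumbnail] = None
    best_dist = math.inf
    for thumb in thumbnails:
        d = math.dist(pointer, thumb.center)
        if d <= threshold and d < best_dist:
            best, best_dist = thumb, d
    return best


def handle_layer2_event(pointer: Point, double_clicked: bool, inside_window: bool,
                        thumbnails: Sequence[Thumbnail], threshold: float):
    """One pass of steps ST15 to ST18; returns (action, frame_index or None)."""
    thumb = find_near_thumbnail(pointer, thumbnails, threshold)
    if thumb is not None:
        if double_clicked:
            return ("play_from", thumb.frame_index)  # ST17: play from that frame
        return ("highlight", thumb.frame_index)      # ST16: emphasize thumbnail
    if not inside_window:
        return ("to_layer0", None)  # ST18 -> ST1: switch to layer 0 mode
    return ("wait", None)           # ST18 -> ST13: keep watching for input
```

Returning an action tag instead of drawing directly keeps the decision logic separate from rendering, in the same way that the control unit 16 delegates display to the display data generation unit 212 and the display control unit 213.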

<4. Modifications>
The present technology can also be configured as follows.
(1) An information processing apparatus including: a chapter segment extraction unit that extracts, from each chapter obtained by dividing time-series data composed of a plurality of pieces of data arranged in time series, a chapter segment representing a predetermined portion that represents the chapter; a feature segment extraction unit that extracts, from those chapters obtained by dividing the time-series data that have a feature segment representing a characteristic portion of the chapter, the feature segment; and a generation unit that generates a digest reflecting the rough content of the time-series data by combining the chapter segments and the feature segments in time-series order.
(2) The information processing apparatus according to (1), wherein the generation unit generates the digest with a length set by a user's setting operation by combining the chapter segments and the feature segments in time-series order.
(3) The information processing apparatus according to (2), further including: a symbol string generation unit that creates, based on the time-series data, a symbol string in which symbols each representing an attribute of the plurality of pieces of data are arranged in time series; and a division unit that divides the time-series data into a plurality of chapters based on the dispersion of the symbols in the symbol string.
(4) The information processing apparatus according to (3), wherein the division unit divides the time-series data, based on the dispersion of the symbols constituting the symbol string, into a number of chapters determined from the length set by the user's setting operation.
(5) The information processing apparatus according to any one of (1) to (4), further including a feature amount extraction unit that extracts, from the time-series data, a feature amount representing a feature of the time-series data, wherein the feature segment extraction unit extracts the feature segment from the chapter having the feature segment based on the feature amount.
(6) The information processing apparatus according to (5), wherein the feature segment extraction unit extracts from the chapter, based on the feature amount, the feature segment including a point at which the feature amount reaches either its maximum or a local maximum within the section from the start to the end of the chapter.
(7) The information processing apparatus according to (6), wherein the feature segment extraction unit extracts from the chapter, based on the feature amount, the feature segment including a point at which the feature amount is either the maximum or a local maximum within the section from the start to the end of the chapter and is equal to or greater than a predetermined threshold.
(8) The information processing apparatus according to (7), wherein, based on a plurality of different feature amounts, the feature segment extraction unit extracts from the chapter the feature segment including the point at which the feature amount that is largest within the section from the start to the end of the chapter, among the plurality of different feature amounts, reaches its maximum.
(9) The information processing apparatus according to any one of (5) to (8), wherein the generation unit generates the digest in which audio prepared in advance is added to each of the chapter segments and the feature segments with a corresponding weight.
(10) The information processing apparatus according to (9), wherein the feature segment extraction unit extracts the feature segment, based on a plurality of different feature amounts, from the chapter having the feature segment, and the generation unit generates the digest in which the audio is added, with a smaller weight than for the other feature segments, to the feature segment extracted based on the feature amount representing an audio feature among the plurality of different feature amounts.
(11) The information processing apparatus according to (10), wherein the generation unit generates the digest in which the audio is added with a weight that switches while changing continuously.
(12) An information processing method of an information processing apparatus that generates a digest, the method including, by the information processing apparatus: a chapter segment extraction step of extracting, from each chapter obtained by dividing time-series data composed of a plurality of pieces of data arranged in time series, a chapter segment representing a predetermined portion of the chapter; a feature segment extraction step of extracting, from those chapters obtained by dividing the time-series data that have a feature segment representing a characteristic portion of the chapter, the feature segment; and a generation step of generating a digest reflecting the rough content of the time-series data by combining the chapter segments and the feature segments in time-series order.
(13) A program for causing a computer to function as: a chapter segment extraction unit that extracts, from each chapter obtained by dividing time-series data composed of a plurality of pieces of data arranged in time series, a chapter segment representing a predetermined portion of the chapter; a feature segment extraction unit that extracts, from those chapters obtained by dividing the time-series data that have a feature segment representing a characteristic portion of the chapter, the feature segment; and a generation unit that generates a digest reflecting the rough content of the time-series data by combining the chapter segments and the feature segments in time-series order.
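Configurations (1) and (6) to (8) can be illustrated with a short Python sketch of digest assembly. It is not the disclosed implementation and rests on several assumptions: chapters are given as half-open frame ranges, the "predetermined portion" of each chapter is taken to be its opening frames, the several feature amounts are per-frame lists already normalized to a common scale, overlapping or touching segments are merged, and only the global maximum in each chapter is used (not a local maximum). All function names are hypothetical, and `chap_len` and `feat_len` stand in for values that, per configuration (2), would be derived from the user-set digest length.

```python
from typing import Dict, List, Optional, Sequence, Tuple

Segment = Tuple[int, int]  # half-open frame range [start, end)


def chapter_segment(chapter: Segment, length: int) -> Segment:
    """Assumed 'predetermined portion': the first `length` frames of the chapter."""
    start, end = chapter
    return (start, min(start + length, end))


def feature_peak_segment(chapter: Segment, features: Dict[str, List[float]],
                         length: int, threshold: float) -> Optional[Segment]:
    """Segment around the highest peak of any feature in the chapter, or None."""
    start, end = chapter
    best_value: Optional[float] = None
    best_pos = start
    # Configuration (8): among several feature amounts, use the one whose
    # maximum inside this chapter is largest.
    for values in features.values():
        window = values[start:end]
        if not window:
            continue
        peak = max(window)
        if best_value is None or peak > best_value:
            best_value, best_pos = peak, start + window.index(peak)
    # Configuration (7): the peak must reach the threshold; otherwise the
    # chapter has no feature segment.
    if best_value is None or best_value < threshold:
        return None
    seg_start = max(start, best_pos - length // 2)
    return (seg_start, min(end, seg_start + length))


def build_digest(chapters: Sequence[Segment], features: Dict[str, List[float]],
                 chap_len: int, feat_len: int, threshold: float) -> List[Segment]:
    """Configuration (1): chapter and feature segments joined in time order."""
    segments: List[Segment] = []
    for chapter in chapters:
        segments.append(chapter_segment(chapter, chap_len))
        peak = feature_peak_segment(chapter, features, feat_len, threshold)
        if peak is not None:
            segments.append(peak)
    segments.sort()  # time-series order
    merged: List[Segment] = []
    for s, e in segments:
        if merged and s <= merged[-1][1]:  # overlapping or touching: merge
            merged[-1] = (merged[-1][0], max(merged[-1][1], e))
        else:
            merged.append((s, e))
    return merged
```

For two 100-frame chapters where only the second contains a strong audio peak, the sketch keeps the opening of each chapter and adds a window around that peak; a chapter whose strongest peak stays below the threshold contributes only its chapter segment.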

[Configuration example of a computer to which the present technology is applied]
The series of processes described above can be performed by hardware or by software. When the series of processes is performed by software, a program constituting the software is installed on a general-purpose computer or the like.

FIG. 42 shows a configuration example of an embodiment of a computer on which a program that executes the series of processes described above is installed.

The program can be recorded in advance on the hard disk 305 or in the ROM 303, which serve as recording media built into the computer.

Alternatively, the program can be stored (recorded) on a removable recording medium 311 mounted in the drive 309. Such a removable recording medium 311 can be provided as so-called packaged software. Examples of the removable recording medium 311 include a flexible disk, a CD-ROM (Compact Disc Read Only Memory), an MO (Magneto Optical) disk, a DVD (Digital Versatile Disc), a magnetic disk, and a semiconductor memory.

Besides being installed on the computer from the removable recording medium 311 described above, the program can be downloaded to the computer via a communication network or a broadcast network and installed on the built-in hard disk 305. That is, the program can, for example, be transferred wirelessly from a download site to the computer via an artificial satellite for digital satellite broadcasting, or transferred by wire to the computer via a network such as a LAN (Local Area Network) or the Internet.

The computer incorporates a CPU (Central Processing Unit) 302, and an input/output interface 310 is connected to the CPU 302 via a bus 301.

When a command is input via the input/output interface 310, for example by the user operating the input unit 307, the CPU 302 executes a program stored in the ROM (Read Only Memory) 303 accordingly. Alternatively, the CPU 302 loads a program stored on the hard disk 305 into the RAM (Random Access Memory) 304 and executes it.

The CPU 302 thereby performs the processing according to the flowcharts described above, or the processing performed by the configurations in the block diagrams described above. Then, as necessary, the CPU 302 outputs the processing result from the output unit 306, transmits it from the communication unit 308, or records it on the hard disk 305, for example via the input/output interface 310.

The input unit 307 includes a keyboard, a mouse, a microphone, and the like. The output unit 306 includes an LCD (Liquid Crystal Display), a speaker, and the like.

In this specification, the processing that the computer performs according to the program does not necessarily have to be performed in time series in the order described in the flowcharts. That is, it also includes processing executed in parallel or individually (for example, parallel processing or object-based processing).

The program may be processed by a single computer (processor) or processed in a distributed manner by a plurality of computers. Furthermore, the program may be transferred to a remote computer and executed there.

Embodiments of the present disclosure are not limited to the embodiment described above, and various modifications can be made without departing from the gist of the present disclosure.

Description of Symbols
1 recorder, 11 content storage unit, 12 content model learning unit, 13 model storage unit, 14 symbol string generation unit, 15 division unit, 16 control unit, 17 operation unit, 21 learning content selection unit, 22 feature amount extraction unit, 23 frame division unit, 24 sub-region feature amount extraction unit, 25 combining unit, 26 feature amount storage unit, 27 learning unit, 31 content selection unit, 32 model selection unit, 33 feature amount extraction unit, 34 maximum likelihood state sequence estimation unit, 51 recorder, 71 division unit, 72 digest generation unit, 111 chapter segment extraction unit, 112 feature amount extraction unit, 113 feature peak segment extraction unit, 114 effect addition unit, 131 recorder, 132 display unit, 151 division unit, 152 presentation unit, 211 feature amount extraction unit, 212 display data generation unit, 213 display control unit

Claims

1. An information processing apparatus comprising:
a chapter segment extraction unit that extracts, from each chapter obtained by dividing time-series data composed of a plurality of pieces of data arranged in time series, a chapter segment representing a predetermined portion that represents the chapter;
a feature segment extraction unit that extracts, from those chapters obtained by dividing the time-series data that have a feature segment representing a characteristic portion of the chapter, the feature segment; and
a generation unit that generates a digest reflecting the rough content of the time-series data by combining the chapter segments and the feature segments in time-series order.

2. The information processing apparatus according to claim 1, wherein the generation unit generates the digest with a length set by a user's setting operation by combining the chapter segments and the feature segments in time-series order.

3. The information processing apparatus according to claim 2, further comprising:
a symbol string generation unit that creates, based on the time-series data, a symbol string in which symbols each representing an attribute of the plurality of pieces of data are arranged in time series; and
a division unit that divides the time-series data into a plurality of chapters based on the dispersion of the symbols in the symbol string.

4. The information processing apparatus according to claim 3, wherein the division unit divides the time-series data, based on the dispersion of the symbols constituting the symbol string, into a number of chapters determined from the length set by the user's setting operation.

5. The information processing apparatus according to claim 4, further comprising:
a feature amount extraction unit that extracts, from the time-series data, a feature amount representing a feature of the time-series data,
wherein the feature segment extraction unit extracts the feature segment from the chapter having the feature segment based on the feature amount.

6. The information processing apparatus according to claim 5, wherein the feature segment extraction unit extracts from the chapter, based on the feature amount, the feature segment including a point at which the feature amount reaches either its maximum or a local maximum within the section from the start to the end of the chapter.

7. The information processing apparatus according to claim 6, wherein the feature segment extraction unit extracts from the chapter, based on the feature amount, the feature segment including a point at which the feature amount is either the maximum or a local maximum within the section from the start to the end of the chapter and is equal to or greater than a predetermined threshold.

8. The information processing apparatus according to claim 7, wherein, based on a plurality of different feature amounts, the feature segment extraction unit extracts from the chapter the feature segment including the point at which the feature amount that is largest within the section from the start to the end of the chapter, among the plurality of different feature amounts, reaches its maximum.

9. The information processing apparatus according to claim 5, wherein the generation unit generates the digest in which audio prepared in advance is added to each of the chapter segments and the feature segments with a corresponding weight.

10. The information processing apparatus according to claim 9, wherein
the feature segment extraction unit extracts the feature segment, based on a plurality of different feature amounts, from the chapter having the feature segment, and
the generation unit generates the digest in which the audio is added, with a smaller weight than for the other feature segments, to the feature segment extracted based on the feature amount representing an audio feature among the plurality of different feature amounts.

11. The information processing apparatus according to claim 10, wherein the generation unit generates the digest in which the audio is added with a weight that switches while changing continuously.

12. An information processing method of an information processing apparatus that generates a digest, the method comprising, by the information processing apparatus:
a chapter segment extraction step of extracting, from each chapter obtained by dividing time-series data composed of a plurality of pieces of data arranged in time series, a chapter segment representing a predetermined portion of the chapter;
a feature segment extraction step of extracting, from those chapters obtained by dividing the time-series data that have a feature segment representing a characteristic portion of the chapter, the feature segment; and
a generation step of generating a digest reflecting the rough content of the time-series data by combining the chapter segments and the feature segments in time-series order.

13. A program for causing a computer to function as:
a chapter segment extraction unit that extracts, from each chapter obtained by dividing time-series data composed of a plurality of pieces of data arranged in time series, a chapter segment representing a predetermined portion of the chapter;
a feature segment extraction unit that extracts, from those chapters obtained by dividing the time-series data that have a feature segment representing a characteristic portion of the chapter, the feature segment; and
a generation unit that generates a digest reflecting the rough content of the time-series data by combining the chapter segments and the feature segments in time-series order.
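Claims 9 to 11 describe adding prepared audio to the digest with a per-segment weight that switches while changing continuously. The following Python sketch is one possible reading, not the disclosed method. It assumes per-frame gains, linear ramps lasting `ramp` frames at each segment boundary, and hypothetical segment kinds `"chapter"`, `"feature_audio"`, and `"feature_other"`. The weight values in the usage example are illustrative only.

```python
from typing import Dict, List, Sequence, Tuple


def bgm_gain_envelope(segments: Sequence[Tuple[int, str]],
                      weights: Dict[str, float], ramp: int) -> List[float]:
    """Per-frame gain for the prepared audio over the digest timeline.

    segments: (length_in_frames, kind) in digest order.
    weights: target gain per segment kind.
    ramp: frames over which the gain moves linearly to a new target.
    """
    envelope: List[float] = []
    for length, kind in segments:
        target = weights[kind]
        # Start the ramp from the gain actually reached so far, so the
        # envelope stays continuous even after a segment shorter than `ramp`.
        begin = envelope[-1] if envelope else target
        for i in range(length):
            if i < ramp:
                envelope.append(begin + (target - begin) * (i + 1) / ramp)
            else:
                envelope.append(target)
    return envelope


def mix(original: Sequence[float], prepared: Sequence[float],
        envelope: Sequence[float]) -> List[float]:
    """Add the prepared audio, scaled by the envelope, to the digest's own audio."""
    return [o + g * p for o, p, g in zip(original, prepared, envelope)]
```

Following claim 10, a weight table such as `{"chapter": 1.0, "feature_other": 1.0, "feature_audio": 0.25}` gives segments extracted on the basis of an audio feature a smaller weight than the other feature segments. A ramp of 0 reduces the sketch to abrupt switching.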