JP5424306B2

JP5424306B2 - Information processing apparatus and method, program, and recording medium

Info

Publication number: JP5424306B2
Application number: JP2009084935A
Authority: JP
Inventors: 俊司吉村; 裕成岡本; 立也楢原; デュイ・ディン・レー; 真一佐藤
Original assignee: Sony Corp; Inter University Research Institute Corp Research Organization of Information and Systems
Current assignee: Sony Corp; Inter University Research Institute Corp Research Organization of Information and Systems
Priority date: 2009-03-31
Filing date: 2009-03-31
Publication date: 2014-02-26
Anticipated expiration: 2029-03-31
Also published as: JP2010237946A

Description

The present invention relates to an information processing apparatus and method, a program, and a recording medium. In particular, it relates to an information processing apparatus and method, a program, and a recording medium that, when predetermined processing is performed on content according to the classification of that content, can identify the content classification best suited to the processing and execute the processing by a method corresponding to the identified classification.

With the spread of digital television broadcasting, the use of the electronic program guide (EPG) has become common. In addition to program titles and broadcast dates and times, the data constituting the EPG includes information indicating the classification of a program, such as information indicating its genre, for example whether the program is a news program or a soccer program (hereinafter referred to as genre information).

The program genre information obtained from the EPG is used by various functions of recording/playback apparatuses. For example, there is a so-called digest playback function that creates and plays back a summary video from recorded video so that the contents of a large amount of recorded video can be grasped easily. In digest playback, program genre information is used to improve accuracy (see, for example, Patent Document 1).

In Patent Document 1, predetermined feature quantities detected from recorded video are weighted based on the genre information of the program acquired from the EPG, and the video to be played back in digest playback is determined based on the result.
Patent Document 1: JP 2003-283993 A

However, the EPG is created, for example by the broadcasting station, mainly for the convenience of viewers selecting programs, and the genre information it contains is likewise a classification geared to viewers' program selection. That classification is therefore not necessarily suited to processing on the device side, such as digest playback.

The present invention has been made in view of this situation, and makes it possible to identify the content classification best suited to processing on the device side.

An information processing apparatus according to one aspect of the present invention includes: storage means for storing a multi-dimensional vector composed of first feature quantities, the first feature quantities having been extracted from each of a predetermined number of frames extracted from a plurality of first contents; extraction means for extracting a predetermined number of frames from second content and extracting a second feature quantity from each frame; calculation means for calculating the distance between each of the plurality of first feature quantities constituting the multi-dimensional vector and the second feature quantity extracted from the frame being processed among the predetermined number of frames extracted from the second content; vector generation means for retaining only the minimum of the distances calculated by the calculation means for each second feature quantity, and generating a feature vector composed of those minimum distances; and parameter generation means for performing processing based on a predetermined algorithm using the feature vector generated by the vector generation means, thereby generating parameters for classifying content.

The extraction means may extract the second feature quantity from a predetermined portion of the second content.

The algorithm may be any one of the steepest descent method, a support vector machine, and backpropagation.

An information processing method according to one aspect of the present invention includes the steps of: storing in storage means a multi-dimensional vector composed of first feature quantities, the first feature quantities having been extracted from each of a predetermined number of frames extracted from a plurality of first contents; extracting a predetermined number of frames from second content and extracting a second feature quantity from each frame; calculating the distance between each of the plurality of first feature quantities constituting the multi-dimensional vector and the second feature quantity extracted from the frame being processed among the predetermined number of frames extracted from the second content; retaining only the minimum of the distances calculated for each second feature quantity and generating a feature vector composed of those minimum distances; and performing processing based on a predetermined algorithm using the generated feature vector, thereby generating parameters for classifying content.

A program according to one aspect of the present invention is a computer-readable program that causes a computer to execute processing including the steps of: storing in storage means a multi-dimensional vector composed of first feature quantities, the first feature quantities having been extracted from each of a predetermined number of frames extracted from a plurality of first contents; extracting a predetermined number of frames from second content and extracting a second feature quantity from each frame; calculating the distance between each of the plurality of first feature quantities constituting the multi-dimensional vector and the second feature quantity extracted from the frame being processed among the predetermined number of frames extracted from the second content; retaining only the minimum of the distances calculated for each second feature quantity and generating a feature vector composed of those minimum distances; and performing processing based on a predetermined algorithm using the generated feature vector, thereby generating parameters for classifying content.

A recording medium according to one aspect of the present invention has the program recorded on it.

In the information processing apparatus, method, and program according to one aspect of the present invention, a predetermined number of frames are extracted from a plurality of first contents, a feature quantity is extracted from each extracted frame, and a multi-dimensional vector composed of the extracted first feature quantities is stored. A predetermined number of frames are extracted from second content, and a second feature quantity is extracted from each frame. The distance is calculated between each of the plurality of first feature quantities constituting the multi-dimensional vector and the second feature quantity extracted from the frame being processed among the predetermined number of frames extracted from the second content. Of the distances calculated for each second feature quantity, only the minimum distance is retained, and a feature vector composed of those minimum distances is generated.

According to one aspect of the present invention, the content classification best suited to the predetermined processing to be executed can be identified, and the predetermined processing can be executed on the content by a method corresponding to that classification.

The drawings show, in order:
A diagram showing the configuration of an embodiment of a recording/playback apparatus to which the present invention is applied.
A diagram showing an example of teacher data.
A diagram showing another example of teacher data.
A diagram showing the configuration of an embodiment of a learning device to which the present invention is applied.
A flowchart explaining processing related to the acquisition of reference data.
A flowchart explaining processing related to the generation of identification parameters.
A diagram explaining the generation of a feature vector.
A flowchart explaining classification identification processing.
A diagram showing an example of a program subject to chapter information detection.
A diagram showing another example of a program subject to chapter information detection.
A diagram showing another example of a program subject to chapter information detection.
A block diagram showing a configuration example of a personal computer.

Embodiments of the present invention are described below with reference to the drawings.

[Configuration of the recording/playback apparatus]
FIG. 1 shows a configuration example of a recording/playback apparatus 1 to which the present invention is applied. The recording/playback apparatus 1 has a function of recording received digital television broadcast programs and performing digest playback of the recorded programs. For digest playback, the recording/playback apparatus 1 detects chapter break points and assigns scores representing the priority used to select the video to be played back in digest playback.

As described later, the chapter break points and scores (hereinafter referred to as chapter information as appropriate) are obtained by identifying the program classification suited to detecting chapter information and then detecting the chapter information by a method corresponding to the identified classification.

Digital data, for example that of a digital television broadcast wave, received by a receiving unit (not shown) is supplied from that receiving unit and input to the data separation unit 11. The data separation unit 11 separates the input digital data into EPG (electronic program guide) data, audio data, and video data. Hereinafter, audio data and video data are collectively referred to as AV data as appropriate.

The data separation unit 11 supplies the separated EPG data to the holding unit 22, which holds it, and supplies the separated AV data to the input control unit 12. When a received broadcast program is recorded, the input control unit 12 supplies the AV data supplied from the data separation unit 11 to the holding unit 20, which holds it. The input control unit 12 also supplies the AV data supplied from the data separation unit 11 to the decoder 13 as the target of chapter information detection.

The decoder 13 separates the AV data supplied from the input control unit 12 as the target of chapter information detection, or the AV data read from the holding unit 20 as the target of chapter information detection, into audio data and video data. It supplies the audio data to the audio feature extraction unit 14 and the video data to the video feature extraction unit 15.

The audio feature extraction unit 14 extracts volume, frequency spectrum, left/right channel correlation values, and the like from the audio data supplied from the decoder 13 as audio feature quantities, and supplies them to the feature vector generation unit 16 and the chapter information detection unit 18.

The video feature extraction unit 15 extracts color histograms, color moments, difference images, reduced images, and the like from the video data supplied from the decoder 13 as video feature quantities, and supplies them to the feature vector generation unit 16 and the chapter information detection unit 18. When frame images are used as the source of feature quantities, a frame can also be divided into small regions and the feature quantities of the regions concatenated to form the feature quantity of that single frame. This is particularly effective for feature quantities that carry no position or shape information by themselves, such as color histograms and color moments.
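The region-wise feature described above can be sketched as follows. This is a minimal illustration, not the patent's implementation: the frame representation (a 2-D list of 8-bit RGB tuples), the grid size, the number of quantization levels, and the function names `color_histogram` and `block_histogram` are all assumptions introduced here.

```python
def color_histogram(pixels, levels=2):
    """Normalized color histogram; each RGB channel is quantized to `levels` bins."""
    hist = [0.0] * (levels ** 3)
    for r, g, b in pixels:
        idx = ((r * levels // 256) * levels * levels
               + (g * levels // 256) * levels
               + (b * levels // 256))
        hist[idx] += 1.0
    n = len(pixels)
    return [h / n for h in hist] if n else hist


def block_histogram(frame, rows=2, cols=2, levels=2):
    """Split the frame into rows x cols regions and concatenate the
    per-region histograms into one feature for the whole frame."""
    height, width = len(frame), len(frame[0])
    feature = []
    for by in range(rows):
        for bx in range(cols):
            y0, y1 = by * height // rows, (by + 1) * height // rows
            x0, x1 = bx * width // cols, (bx + 1) * width // cols
            pixels = [frame[y][x] for y in range(y0, y1) for x in range(x0, x1)]
            feature.extend(color_histogram(pixels, levels))
    return feature


# Hypothetical 2x2 frame: one green pixel at top-left, the rest black.
frame = [[(0, 255, 0), (0, 0, 0)],
         [(0, 0, 0), (0, 0, 0)]]
feature = block_histogram(frame)
print(len(feature))  # 4 regions x 8 bins = 32
```

Because each region keeps its own histogram, a green area at the top of the frame and one at the bottom produce different features, which a single whole-frame histogram would not distinguish.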

From the feature quantities supplied from the audio feature extraction unit 14 or the video feature extraction unit 15, the feature vector generation unit 16 selects the predetermined feature quantities that the identification unit 17 uses to identify the classification of the program to which chapter information is to be attached, and generates a vector whose elements are the selected feature quantities (hereinafter referred to as a feature vector). The feature vector generation unit 16 supplies the generated feature vector to the identification unit 17.

Based on the feature vector supplied from the feature vector generation unit 16, the identification unit 17 identifies the program classification (in this example, the program classification suited to detecting chapter information). For example, the identification unit 17 is composed of a discriminator such as a linear discriminator, a nonlinear discriminator, or a neural network. It places the elements constituting the feature vector in a predetermined feature space that is divided by straight lines, curves, or the like generated from the identification parameters set by the learning device (described later), and identifies the program classification based on the divided region of the feature space to which the distribution of the placed elements belongs.
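For the linear discriminator case, identification amounts to checking on which side of a dividing line (hyperplane) the feature vector falls. The sketch below assumes a two-class setup with the identification parameters represented as a weight vector and a bias; the function name `classify`, the parameter values, and the label strings are illustrative assumptions, not values from the source.

```python
def classify(feature_vector, weights, bias,
             labels=("news program", "soccer program")):
    """Return labels[1] if the vector lies on the positive side of the
    hyperplane weights . x + bias = 0, otherwise labels[0]."""
    score = sum(w * x for w, x in zip(weights, feature_vector)) + bias
    return labels[1] if score > 0 else labels[0]


# Hypothetical identification parameters supplied by a learning device.
weights, bias = [1.0, -1.0], 0.0

print(classify([0.8, 0.1], weights, bias))  # soccer program
print(classify([0.1, 0.8], weights, bias))  # news program
```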

As the identification result, the identification unit 17 supplies information indicating the program classification (hereinafter referred to as classification information) to the chapter information detection unit 18. The chapter information detection unit 18 detects chapter information according to the program classification indicated by the classification information supplied from the identification unit 17, and supplies it to the holding unit 19, which holds it.

For example, the chapter information detection unit 18 selects, from the feature quantities supplied from the audio feature extraction unit 14 or the video feature extraction unit 15, the feature quantities corresponding to the program classification, and executes arithmetic processing corresponding to the program classification.

That is, in this case, the chapter information detection unit 18 holds, for each program classification, execution data (for example, a program including parameters and algorithms) for selecting and computing on the feature quantities corresponding to that classification, and detects chapter information by selecting and executing the execution data corresponding to the program classification.
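One way to picture per-classification execution data is a lookup table keyed by classification, where each entry names the feature to use and the parameters of the computation. The sketch below is only an illustration under that assumption: the table `EXECUTION_DATA`, the feature names, the thresholds, and the simple thresholding rule are hypothetical and do not come from the source.

```python
# Hypothetical execution data: which feature to use and how to score it,
# held separately for each program classification.
EXECUTION_DATA = {
    "soccer program": {"feature": "volume", "threshold": 0.8},
    "news program": {"feature": "frame_difference", "threshold": 0.5},
}


def detect_chapter_info(classification, features):
    """Select the feature series for the classification and return
    (frame_index, score) pairs at or above that classification's threshold."""
    config = EXECUTION_DATA[classification]
    values = features[config["feature"]]
    return [(i, v) for i, v in enumerate(values) if v >= config["threshold"]]


features = {
    "volume": [0.1, 0.9, 0.3],
    "frame_difference": [0.6, 0.2, 0.7],
}
print(detect_chapter_info("soccer program", features))  # [(1, 0.9)]
print(detect_chapter_info("news program", features))    # [(0, 0.6), (2, 0.7)]
```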

The playback unit 21 reads the AV data held in the holding unit 20 and performs normal playback or digest playback. For digest playback, based on the chapter information held in the holding unit 19, the playback unit 21 reads from the holding unit 20 and plays back the video sections whose chapter score is at or above a certain value. In other words, the video is thinned out based on the chapter information and then played back.
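The thinning step can be sketched as a filter over scored chapters. The chapter representation as `(start_sec, end_sec, score)` tuples and the function name `select_digest_sections` are assumptions made for this illustration.

```python
def select_digest_sections(chapters, min_score):
    """Keep only the sections whose chapter score is at or above min_score."""
    return [(start, end) for start, end, score in chapters if score >= min_score]


# Hypothetical chapter information: (start_sec, end_sec, score).
chapters = [(0, 10, 0.2), (10, 25, 0.9), (25, 40, 0.5)]
print(select_digest_sections(chapters, 0.5))  # [(10, 25), (25, 40)]
```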

In a program that broadcasts a soccer match, scenes containing video of the pitch are usually broadcast frequently, as shown in FIG. 2. Many color histograms with a high frequency of green are therefore obtained from such a program (for example, in succession).

Accordingly, in the learning performed by the learning device described in detail later, a program broadcasting a soccer match and composed of scenes containing video of the pitch, as shown in FIG. 2, is used as teacher data with the classification "soccer program", and a feature vector of the per-frame color histograms obtained from that teacher data is extracted.

The learning device then generates identification parameters that allow the classification to be identified as "soccer program". That is, it generates identification parameters for generating, for example, a straight line that divides the feature space so that the distribution in the feature space of the green frequency obtained from the color histograms falls within the "soccer program" region. The identification parameters generated in this way are set in the identification unit 17.

Similarly, in a program that reports incidents and events, scenes containing video of a person and a studio are broadcast frequently, as shown in FIG. 3. Many color histograms with a high frequency of the colors characteristic of the person and the studio are therefore obtained from such a program.

Accordingly, in the learning performed by the learning device described in detail later, a program reporting incidents and events and composed of scenes containing video of a person and a studio, as shown in FIG. 3, is used as teacher data with the classification "news program", and a feature vector of the per-frame color histograms obtained from that teacher data is extracted.

The learning device then generates identification parameters that allow the classification to be identified as "news program". That is, it generates identification parameters for generating, for example, a straight line that divides the feature space so that the distribution in the feature space of the frequency of specific colors falls within the "news program" region. The identification parameters generated in this way are set in the identification unit 17.

Returning to the description of FIG. 1, the control unit 41 controls the recording/playback apparatus 1 as a whole and causes it to execute processing such as program recording, normal playback, and digest playback.

[Configuration of the learning device]
FIG. 4 shows the configuration of an embodiment of a learning device to which the present invention is applied. The learning device 100 shown in FIG. 4 includes an input control unit 111, a decoder 112, a frame extraction unit 113, a video feature extraction unit 114, a reference data storage unit 115, a distance calculation unit 116, a minimum distance holding unit 117, a learning algorithm processing unit 118, an identification parameter holding unit 119, a drive 120, and a communication unit 121.

The input control unit 111 controls the input of video data from outside. The description here uses an example in which video data is input and identification parameters are generated from that video data, so it is assumed below that video data is input to the learning device 100. The configuration of the learning device 100 shown in FIG. 4 is likewise one that processes a video stream. However, audio data, for example, may instead be input to the learning device 100 and the identification parameters generated from that audio data. In that case, the learning device 100 is configured to control the input of audio data and generate the identification parameters from the audio data.

The decoder 112 decodes the video data whose input is controlled by the input control unit 111. If the video data has been encoded in some way, the decoder 112 performs the decoding that corresponds to that encoding. The decoded video data is supplied to the frame extraction unit 113.

The frame extraction unit 113 extracts frames from the decoded video data based on a predetermined condition. All frames of the decoded video data could be processed, but that would increase the processing load and processing time. To reduce both, the description here assumes that a predetermined number of frames are extracted for processing based on a predetermined condition. The predetermined condition and related details are made clear in the description given with reference to the flowchart of FIG. 5.

The frames extracted by the frame extraction unit 113 are supplied to the video feature extraction unit 114 as the frames to be processed. The video feature extraction unit 114 extracts feature quantities from the supplied frames.

The decoder 112 performs the same processing as the decoder 13 (FIG. 1) of the recording/playback apparatus 1, and the video feature extraction unit 114 performs the same processing as the video feature extraction unit 15 (FIG. 1) of the recording/playback apparatus 1. Thus, when the video feature extraction unit 15 extracts color histograms, color moments, difference images, reduced images, and the like from frames as video feature quantities as described above, the video feature extraction unit 114 likewise extracts color histograms, difference images, reduced images, and the like from frames as video feature quantities.

The video feature quantities from the video feature extraction unit 114 are supplied to either the reference data storage unit 115 or the distance calculation unit 116. When the video feature quantities are to be used as reference data, they are supplied to and stored in the reference data storage unit 115. When the video feature quantities have been extracted from video data serving as teacher data and are to be compared against the reference data, they are supplied to the distance calculation unit 116.

The reference data storage unit 115 stores reference data. The reference data is data stored in advance as the basis for comparison when generating identification parameters.

The reference data stored in the reference data storage unit 115 may be data created from video data whose input is controlled by the input control unit 111, or data created in advance by another apparatus. Data created by another apparatus may, for example, be stored on the removable disk 141 and distributed. In that case, the removable disk 141 is set in the drive 120, and the reference data is read from it, supplied to the reference data storage unit 115, and stored.

The reference data may also be distributed via a network. In that case, the communication unit 121 receives the distributed reference data and supplies it to the reference data storage unit 115, where it is stored.

In this way, the reference data may be generated and stored by the learning device 100, supplied via a recording medium such as the removable disk 141 and stored, or supplied via a network and stored.

The learning device 100 can also be provided in the recording/playback apparatus 1. When the learning device 100 is provided in the recording/playback apparatus 1 and is configured to generate the reference data itself, the reference data can be generated from the video data input to the recording/playback apparatus 1. When the reference data is instead distributed via a network or on a recording medium, the reference data can be updated easily.

Returning to the description of the learning device 100 shown in FIG. 4, the distance calculation unit 116 calculates the distance between the reference data stored in the reference data storage unit 115 and the frame being processed that is supplied from the video feature extraction unit 114. The distance is calculated using the feature quantities that make up the reference data and the feature quantity extracted from the frame.

The distance (distance data) calculated by the distance calculation unit 116 is supplied to the minimum distance holding unit 117. For each item of reference data, the minimum distance holding unit 117 holds the distance to whichever of the processed frames was closest to that item. For example, when the reference data consists of feature quantities for 30 frames, a minimum distance is held for each of those 30 reference frames, so 30 minimum distances are held. The minimum distances held in this way, 30 in this example, are supplied to the learning algorithm processing unit 118 as a feature vector.
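The minimum-distance feature vector can be sketched as follows: for every reference feature, compute its distance to each frame feature of the content being processed and keep only the smallest. The source does not state which distance measure is used, so the Euclidean distance here is an assumption, as are the function names.

```python
import math


def euclidean(a, b):
    """Euclidean distance between two equal-length feature quantities."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def min_distance_vector(reference, frames):
    """One element per reference feature: the smallest distance from that
    reference feature to any of the processed frame features."""
    return [min(euclidean(ref, f) for f in frames) for ref in reference]


# Hypothetical 2-D features: two reference frames, three content frames.
reference = [[0.0, 0.0], [1.0, 1.0]]
frames = [[0.0, 1.0], [1.0, 1.0], [3.0, 4.0]]
print(min_distance_vector(reference, frames))  # [1.0, 0.0]
```

The length of the resulting vector equals the number of reference features (30 in the example above), regardless of how many frames were extracted from the content.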

学習アルゴリズム処理部１１８は、所定のアルゴリズムに基づき、供給された特徴量ベクトルを用いて、識別パラメータを生成する。この生成された識別パラメータは、識別パラメータ保持部１１９に供給され、保持される。 The learning algorithm processing unit 118 generates an identification parameter using the supplied feature vector based on a predetermined algorithm. The generated identification parameter is supplied to the identification parameter holding unit 119 and held.

識別パラメータ保持部１１９に保持された識別パラメータは、記録再生装置１の識別部１７（図１）に供給され、保持される。例えば、ドライブ１２０に、リムーバブルディスク１４１がセットされ、そのセットされたリムーバブルディスク１４１に、識別パラメータ保持部１１９に保持されている識別パラメータが書き込まれる。そして、その識別パラメータが書き込まれたリムーバブルディスク１４１が、記録再生装置１にセットされることにより、識別パラメータが、識別部１７に供給される。 The identification parameter held in the identification parameter holding unit 119 is supplied to and held in the identification unit 17 (FIG. 1) of the recording / reproducing apparatus 1. For example, the removable disk 141 is set in the drive 120, and the identification parameter held in the identification parameter holding unit 119 is written in the set removable disk 141. Then, the removable disk 141 in which the identification parameter is written is set in the recording / reproducing apparatus 1, whereby the identification parameter is supplied to the identification unit 17.

また、ネットワークを介して識別パラメータが配信されるようにしても良い。この場合、識別パラメータ保持部１１９に保持されている識別パラメータが、通信部１２１に読み出され、通信部１２１の制御の基、記録再生装置１の識別部１７に供給される。 Further, the identification parameter may be distributed via a network. In this case, the identification parameter held in the identification parameter holding unit 119 is read by the communication unit 121 and supplied to the identification unit 17 of the recording / reproducing apparatus 1 under the control of the communication unit 121.

このような構成を有する学習器１００の学習について、以下に説明する。 Learning of the learning device 100 having such a configuration will be described below.

[Acquisition of reference data]
First, the processing for acquiring the reference data stored in the reference data storage unit 115 will be described. The reference data must already be stored in the reference data storage unit 115 before the identification parameters are generated. Therefore, before describing the generation of the identification parameters, the processing by which the learning device 100 creates reference data will be described with reference to the flowchart of FIG. 5.

ステップＳ１０１において、入力制御部１１１は、ビデオストリームを取得する。この取得されるビデオストリームは、カテゴリが予めわかっている番組のビデオストリームである。カテゴリとは、例えば、“ニュース”とか、“バラエティ”といった、番組が属するジャンルなどであり、分類に関する情報である。分類とは、上記した説明において、識別部１７が、“特徴ベクトル生成部１６から供給された特徴ベクトルに基づいて番組の分類を識別する”際の“分類”である。 In step S101, the input control unit 111 acquires a video stream. This acquired video stream is a video stream of a program whose category is known in advance. A category is, for example, a genre to which a program belongs, such as “news” or “variety”, and is information related to classification. In the above description, the classification is the “classification” when the identification unit 17 “identifies the classification of the program based on the feature vector supplied from the feature vector generation unit 16”.

なお、この分類に関する情報、すなわちカテゴリは、詳細な分類の基、割り振られたカテゴリであることが望ましい。例えば、スポーツというカテゴリも、詳細に分類し、“スポーツ中継”、“スポーツニュース”、“スポーツに関するバラエティ”といったようなカテゴリであることが好ましい。 It should be noted that the information related to this classification, that is, the category, is preferably a category assigned based on a detailed classification. For example, the category of sports is also classified in detail, and is preferably a category such as “sports relay”, “sports news”, “sports variety”.

In the EPG, category information is included as part of the information about a program, but it is often coarse, such as “sports”. If chapter information is detected on the basis of such coarse information, appropriate chapter information may not be detected. In other words, for a “sports relay” program and a “sports news” program, chapter information suited to each program can be detected when different detection algorithms are used rather than the same one.

The learning device 100 performs learning so that chapter information can be detected with this taken into account. Accordingly, the category of the video data input to the input control unit 111 is also preferably a category (information) resulting from detailed classification, and the following description assumes that such information is input.

When the video stream is acquired by the input control unit 111 in step S101, the decoder 112 decodes the video data and generates frames in step S102. The frame extraction unit 113 then extracts the frames to be processed. In other words, the frames to be processed are the reference frames used as reference data.

ビデオデータから生成される全てのフレームを、リファレンスフレームとすると、後述する特徴ベクトル、そしてその特徴ベクトルから生成される識別パラメータを、それぞれ生成する時の処理などの負担が増大してしまう。このようなことを考慮し、カテゴリに含まれる全てのフレームから、所定の規則に基づき、複数のフレームが抽出されるようにする。所定の規則とは、例えば、ランダムに抽出する、所定の間隔（所定の時間間隔、所定のフレームの枚数での間隔）で抽出する、クラスタリング手法に基づき抽出するなどの規則である。 If all the frames generated from the video data are set as reference frames, the burden of processing when generating a feature vector described later and an identification parameter generated from the feature vector will be increased. Considering this, a plurality of frames are extracted from all frames included in the category based on a predetermined rule. The predetermined rule is, for example, a rule such as random extraction, extraction at a predetermined interval (predetermined time interval, predetermined number of frames), or extraction based on a clustering method.

また、クラスタリング手法でリファレンスフレームを抽出する場合、例えば、後述するフレームの特徴ベクトルを用いてクラスタリングを行い、構成要素数の多い順に所定数のクラスタを選択した後、各クラスタの重心に近いフレームを選択するなどの手法が考えられる。また、リファレンスフレームは一度選択されたら、そのフレームが用いられ、変更されないようにすることが好ましい。 In addition, when extracting a reference frame by a clustering method, for example, clustering is performed using a feature vector of a frame, which will be described later, and after selecting a predetermined number of clusters in descending order of the number of components, a frame close to the center of gravity of each cluster A method such as selection is conceivable. Moreover, it is preferable that once a reference frame is selected, that frame is used and not changed.
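The three selection rules named above (fixed interval, random, cluster-based) can be sketched as follows. This is only an illustration under stated assumptions: the function names are hypothetical, frames are referred to by index, the cluster labels are assumed to have been computed beforehand by whatever clustering method is chosen, and squared Euclidean distance to the centroid is an assumed measure of "close to the center of gravity". A fixed random seed is used so that the reference frames, once selected, stay the same.

```python
import random
from collections import defaultdict

def select_at_interval(num_frames, step):
    """Pick every `step`-th frame index, starting from frame 0."""
    return list(range(0, num_frames, step))

def select_random(num_frames, count, seed=0):
    """Pick `count` distinct frame indices at random, returned in ascending order."""
    rng = random.Random(seed)  # fixed seed keeps the selection repeatable
    return sorted(rng.sample(range(num_frames), count))

def select_by_clusters(features, labels, num_clusters):
    """From the `num_clusters` largest clusters, pick the frame nearest each centroid.

    `features[i]` is the feature vector of frame i and `labels[i]` its
    (precomputed, hypothetical) cluster label.
    """
    members = defaultdict(list)
    for idx, label in enumerate(labels):
        members[label].append(idx)
    # Largest clusters first; ties are broken by label so the result is deterministic.
    largest = sorted(members, key=lambda c: (-len(members[c]), c))[:num_clusters]
    chosen = []
    for c in largest:
        idxs = members[c]
        dim = len(features[idxs[0]])
        centroid = [sum(features[i][d] for i in idxs) / len(idxs) for d in range(dim)]
        nearest = min(
            idxs,
            key=lambda i: sum((features[i][d] - centroid[d]) ** 2 for d in range(dim)),
        )
        chosen.append(nearest)
    return sorted(chosen)
```

Which rule is appropriate depends on the processing budget; the text only requires that some predetermined rule be applied.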

ステップＳ１０２において、フレーム抽出部１１３より抽出されたフレームは、ビデオ特徴量抽出部１１４に供給される。ステップＳ１０３において、ビデオ特徴量抽出部１１４は、供給されたフレーム（画像）から特徴量を抽出する。特徴量としては、例えば、色ヒストグラム、色モーメント、差分画像、縮小画像などである。ビデオ特徴量抽出部１１４により抽出された特徴量は、リファレンスデータ記憶部１１５に供給される。 In step S102, the frame extracted by the frame extraction unit 113 is supplied to the video feature amount extraction unit 114. In step S103, the video feature amount extraction unit 114 extracts a feature amount from the supplied frame (image). Examples of the feature amount include a color histogram, a color moment, a difference image, and a reduced image. The feature amount extracted by the video feature amount extraction unit 114 is supplied to the reference data storage unit 115.
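As an illustration of two of the feature amounts listed above, the sketch below computes a color histogram and first- and second-order color moments for one frame. The representation of a frame as a list of (r, g, b) tuples with 0 to 255 values, the number of bins, and the function names are assumptions made for this example; the text does not specify them.

```python
import math

def color_histogram(pixels, bins_per_channel=4):
    """Normalized joint RGB histogram of a frame given as a list of (r, g, b) tuples."""
    hist = [0.0] * (bins_per_channel ** 3)
    for r, g, b in pixels:
        # Map each 0-255 channel value to a bin index in [0, bins_per_channel).
        rb = r * bins_per_channel // 256
        gb = g * bins_per_channel // 256
        bb = b * bins_per_channel // 256
        hist[(rb * bins_per_channel + gb) * bins_per_channel + bb] += 1.0
    total = float(len(pixels))
    return [count / total for count in hist]

def color_moments(pixels):
    """Mean and standard deviation of each channel: [mean_r, std_r, mean_g, std_g, mean_b, std_b]."""
    n = float(len(pixels))
    moments = []
    for ch in range(3):
        mean = sum(p[ch] for p in pixels) / n
        var = sum((p[ch] - mean) ** 2 for p in pixels) / n
        moments.extend([mean, math.sqrt(var)])
    return moments
```

Whichever feature is chosen, the same one has to be used for both the reference frames and the frames compared against them, since the distance is computed between the two.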

ステップＳ１０４において、リファレンスデータ記憶部１１５は、ビデオ特徴量抽出部１１４で抽出された特徴量を、リファレンスデータとして記憶する。 In step S104, the reference data storage unit 115 stores the feature amount extracted by the video feature amount extraction unit 114 as reference data.

このようにして、１つのカテゴリにつき、複数のフレームからリファレンスデータが抽出される。例えば、１つのカテゴリから、Ｎ１枚のフレームが抽出された場合、Ｎ１枚分のリファレンスデータ（特徴量）が、リファレンスデータ記憶部１１５に、そのカテゴリのリファレンスデータとして記憶される。 In this way, reference data is extracted from a plurality of frames for one category. For example, when N1 frames are extracted from one category, N1 pieces of reference data (features) are stored in the reference data storage unit 115 as reference data of the category.

Reference data is extracted in the same way from a plurality of categories. For example, when reference data is extracted from M categories, reference data for M categories is stored in the reference data storage unit 115.

なお、このリファレンスデータ記憶部１１５に記憶されるリファレンスデータでは、上記したように、他の装置で生成され、ネットワークを介して供給されたり、記録媒体に記録されて供給されたりしても良い。このような場合も、他の装置では、上記した処理と同様の処理が実行されることで、リファレンスデータが生成される。 Note that the reference data stored in the reference data storage unit 115 may be generated by another device and supplied via a network or recorded on a recording medium and supplied as described above. Also in such a case, in other apparatuses, the same processing as the processing described above is executed, so that the reference data is generated.

[Generation of identification parameters]
When the reference data has been stored in the reference data storage unit 115 in this way, the learning device 100 performs learning, which in this case means creating the identification parameters. The generation of the identification parameters will be described with reference to the flowchart of FIG. 6.

ステップＳ１５１において、ビデオストリームが取得される。このビデオストリームは、教師データとされ、所定のカテゴリに属し、そのカテゴリは、リファレンスデータの生成時と同じく、細かなカテゴリに分類されている。ビデオストリームが取得されるとき、そのビデオストリームが属するカテゴリの情報も取得される。 In step S151, a video stream is acquired. This video stream is used as teacher data and belongs to a predetermined category, and the category is classified into fine categories as in the case of generating reference data. When a video stream is acquired, information on the category to which the video stream belongs is also acquired.

次に、ステップＳ１５２において、フレームが抽出される。デコーダ１１２は、入力制御部１１１により入力が制御されたビデオストリームをデコードする。そのデコードされたフレームのうちの所定の枚数のフレームが、フレーム抽出部１１３により抽出される。 Next, in step S152, a frame is extracted. The decoder 112 decodes the video stream whose input is controlled by the input control unit 111. A predetermined number of frames among the decoded frames are extracted by the frame extraction unit 113.

所定のカテゴリに属する番組のビデオストリームの全てが処理対象とされても良い。例えば、６０分の番組であるならば、６０分ぶんのビデオストリームが処理対象とされても良い。しかしながら、このようにすると、処理対象となるフレーム数が増大し、処理負担の増大、処理時間の増大を招くことになる。 All of the video streams of programs belonging to a predetermined category may be processed. For example, if it is a 60-minute program, a 60-minute video stream may be processed. However, if this is done, the number of frames to be processed increases, leading to an increase in processing load and an increase in processing time.

そこで、所定のカテゴリに属する番組の所定の時間ぶんのビデオストリームが処理対象とされるようにする。例えば、番組の冒頭の１０分間ぶんのビデオストリームが処理対象とされる。このようにした場合、入力制御部１１１は、番組の冒頭の１０分間だけ、ビデオストリームが入力されるように制御する。 Therefore, a video stream for a predetermined time of a program belonging to a predetermined category is set as a processing target. For example, a 10-minute video stream at the beginning of a program is a processing target. In such a case, the input control unit 111 controls the video stream to be input only for the first 10 minutes of the program.

Next, all the frames included in that 10-minute video stream could be processed. However, as in the above case, this would increase the processing load and processing time, so only a predetermined number of frames are processed. When M frames are to be processed, the frame extraction unit 113 extracts M frames from the video stream supplied by the decoder 112 and outputs them to the video feature amount extraction unit 114. The M frames are extracted, for example, at predetermined time intervals, at random, or at every predetermined number of frames.
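A minimal sketch of limiting the input to the opening part of a program and then taking M evenly spaced frames from it is shown below. The frame rate, the function name, and the choice of even spacing are assumptions for illustration; the text equally allows random selection.

```python
def sample_frame_indices(fps, window_seconds, m):
    """Return up to `m` evenly spaced frame indices within the first `window_seconds`."""
    total = int(fps * window_seconds)  # number of frames in the window
    if m >= total:
        return list(range(total))      # fewer frames than requested: use them all
    return [i * total // m for i in range(m)]
```

For instance, with an assumed 30 frames per second and a 10-minute (600-second) window, the window holds 18000 frames, from which the M indices are drawn.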

フレーム抽出部１１３において抽出されたフレームは、処理対象のフレームとして、ビデオ特徴量抽出部１１４に出力される。ステップＳ１５３において、ビデオ特徴量抽出部１１４は、供給されたフレーム（画像）から、所定の特徴量を抽出する。この所定の特徴量は、色ヒストグラム、色モーメント、差分画像、縮小画像などである。また、この所定の特徴量は、リファレンスデータと同じ特徴量とされる。すなわち、例えば、リファレンスデータとしての特徴量が、色ヒストグラムでの特徴量である場合、ステップＳ１５３において、ビデオ特徴量抽出部１１４により抽出される特徴量も、色ヒストグラムでの特徴量とされる。 The frame extracted by the frame extraction unit 113 is output to the video feature amount extraction unit 114 as a processing target frame. In step S153, the video feature amount extraction unit 114 extracts a predetermined feature amount from the supplied frame (image). The predetermined feature amount is a color histogram, a color moment, a difference image, a reduced image, or the like. The predetermined feature amount is the same feature amount as the reference data. That is, for example, when the feature amount as the reference data is a feature amount in the color histogram, the feature amount extracted by the video feature amount extraction unit 114 in step S153 is also the feature amount in the color histogram.

In step S154, the index i of the reference data Ri is set to its initial value “1”. The reference data consists of feature amounts extracted from a plurality of frames for each of a plurality of categories. For example, when feature amounts have been extracted from n₁, n₂, n₃, ..., n_n frames for each of the M categories, (n₁ + n₂ + n₃ + ... + n_n) feature amounts are stored in the reference data storage unit 115 as reference data.

Assume that numbers are assigned in order to these (n₁ + n₂ + n₃ + ... + n_n) feature amounts. That is, the numbers 1 to (n₁ + n₂ + n₃ + ... + n_n) are assigned to the feature amounts. In step S154, as the initial setting, the first of the feature amounts constituting the reference data is set as the reference data to be processed.
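The serial numbering of the reference data across categories can be pictured with the following sketch. The dictionary layout and the function name are hypothetical; the only point illustrated is that the per-category feature amounts are laid out in one fixed order and numbered from 1 to the total count.

```python
def flatten_reference_data(per_category):
    """Assign serial numbers 1..(n_1 + ... + n_n) to the reference feature amounts.

    `per_category` maps a category name to the list of feature amounts extracted
    from that category's reference frames. Returns (number, category, feature)
    tuples in a fixed order (dict insertion order, Python 3.7+).
    """
    numbered = []
    number = 1
    for category, features in per_category.items():
        for feature in features:
            numbered.append((number, category, feature))
            number += 1
    return numbered
```

Keeping the order fixed matters because each position later becomes one dimension of the feature vector.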

ステップＳ１５５において、距離算出部１１６は、ビデオ特徴量抽出部１１４から供給された特徴量と、リファレンスデータＲｉを用いて距離を算出する。すなわち、ビデオ特徴量抽出部１１４で処理対象とされたフレームと、リファレンスデータＲｉが抽出されたフレームとの類似度に関する距離が算出される。ここでは、距離が短いほど類似しているとして説明を続ける。 In step S155, the distance calculation unit 116 calculates the distance using the feature amount supplied from the video feature amount extraction unit 114 and the reference data Ri. That is, the distance related to the similarity between the frame that has been processed by the video feature quantity extraction unit 114 and the frame from which the reference data Ri has been extracted is calculated. Here, the description is continued assuming that the shorter the distance, the more similar.

The distance calculated by the distance calculation unit 116 in step S155 is supplied to the minimum distance holding unit 117. In step S156, the minimum distance holding unit 117 compares the supplied distance with the held distance and determines whether the supplied distance is shorter. The minimum distance holding unit 117 holds a distance for each reference data Ri. The held distance is the shortest distance found so far in the course of processing.

For example, when i of the reference data Ri is “1” (when the reference data R1 is being processed), the distance associated with the reference data R1 is compared with the supplied distance, and the shorter of the two is held. Thus, in step S156, it is determined whether the supplied distance is shorter than the held distance, and if so, the process proceeds to step S157.

ステップＳ１５７において、その短いと判断された距離が、その時点で処理対象とされているリファレンスデータＲｉに関連付けられる。すなわちこの場合、その時点でリファレンスデータＲｉに関連付けられていた距離が、新たな距離に置き換えられる。置き換えが実行された後、処理は、ステップＳ１５８に進められる。 In step S157, the distance determined to be short is associated with the reference data Ri that is to be processed at that time. That is, in this case, the distance associated with the reference data Ri at that time is replaced with a new distance. After the replacement is executed, the process proceeds to step S158.

一方、ステップＳ１５６において、保持されている距離よりも、供給された距離の方が長いと判断された場合、ステップＳ１５７の処理はスキップされ、ステップＳ１５８に処理が進められる。すなわち、その時点で、リファレンスデータＲｉに関連付けられている距離が、そのまま関連付けられた状態が維持される。 On the other hand, if it is determined in step S156 that the supplied distance is longer than the held distance, the process in step S157 is skipped and the process proceeds to step S158. That is, at that time, the state in which the distance associated with the reference data Ri is directly associated is maintained.

ステップＳ１５８において、次のリファレンスデータＲｉがあるか否かが判断される。例えば、リファレンスデータＲ１が処理対象とされているときには、リファレンスデータＲ２があるか否かが判断される。ステップＳ１５８において、次のリファレンスデータＲｉがあると判断された場合、ステップＳ１５９に処理が進められる。 In step S158, it is determined whether there is next reference data Ri. For example, when the reference data R1 is a processing target, it is determined whether or not there is reference data R2. If it is determined in step S158 that there is the next reference data Ri, the process proceeds to step S159.

ステップＳ１５９において、次のリファレンスデータＲｉが、新たな処理対象のリファレンスデータＲｉに設定される。そして、新たに処理対象とされたリファレンスデータＲｉに対して、ステップＳ１５５以下の処理が繰り返される。 In step S159, the next reference data Ri is set as the new reference data Ri to be processed. And the process after step S155 is repeated with respect to the reference data Ri newly made into the process target.

このようにステップＳ１５５乃至Ｓ１５９の処理が繰り返されることにより、リファレンスデータ記憶部１１５に記憶されている全てのリファレンスデータＲｉと、１枚のフレームから抽出された特徴量との距離が算出される。換言すれば、リファレンスデータＲｉの基になった複数のフレームと、処理対象とされているビデオストリーム内の１つのフレームとの距離が、それぞれ算出され、最小距離のみが保持される。 As described above, by repeating the processes of steps S155 to S159, the distance between all the reference data Ri stored in the reference data storage unit 115 and the feature amount extracted from one frame is calculated. In other words, the distances between the plurality of frames that are the basis of the reference data Ri and one frame in the video stream to be processed are calculated, and only the minimum distance is held.

On the other hand, if it is determined in step S158 that there is no next reference data Ri, the process returns to step S152 and the next frame becomes the processing target. By repeating steps S152 to S159 in this way, the distances between the predetermined number of frames extracted from the video stream being processed and the frames from which the reference data Ri was extracted are calculated, and only the minimum distance information is held.
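The loop over steps S152 to S159 can be sketched as below. Two points are assumptions rather than statements from the text: Euclidean distance is used as the distance measure, and each held distance starts at infinity so that the first comparison always replaces it. The function names are hypothetical.

```python
import math

def euclidean(a, b):
    """Euclidean distance between two feature amounts of equal length."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def min_distance_vector(frame_features, reference_data, distance=euclidean):
    """For each reference data Ri, keep the shortest distance to any processed frame."""
    held = [math.inf] * len(reference_data)       # one held distance per Ri (assumed initial value)
    for feature in frame_features:                # S152/S153: next frame and its feature amount
        for i, ref in enumerate(reference_data):  # S154, S158, S159: iterate over Ri
            d = distance(feature, ref)            # S155: distance to Ri
            if d < held[i]:                       # S156: shorter than the held distance?
                held[i] = d                       # S157: replace the held distance
    return held
```

The returned list has one entry per reference data item, so its length equals the total number of reference feature amounts regardless of how many frames were processed.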

The identification parameters are generated from the feature vector composed of these minimum distances, as described later. The processing up to this point will be described again with reference to FIG. 7.

Referring to FIG. 7, category 1, category 2, ..., category N are set. These are finely classified categories. For example, category 1 is “sports news”, category 2 is “sports relay”, and category 3 is “sports variety”; even within the single genre “sports”, each is a detailed category indicating what kind of program it is.

Category 1 includes a plurality of frames, from which n₁ frames are extracted. This processing corresponds to steps S101 and S102 described with reference to the flowchart of FIG. 5. Similarly, category 2 includes a plurality of frames, from which n₂ frames are extracted. Likewise, category n includes a plurality of frames, from which n_n frames are extracted.

各カテゴリ１乃至ｎから抽出されたそれぞれのフレームは、リファレンスフレームとされる。このリファレンスフレームから、特徴量が抽出される。この処理は、上述したステップＳ１０３に相当し、リファレンスフレームから特徴量が抽出され、その特徴量がリファレンスデータとされる処理である。 Each frame extracted from each category 1 to n is a reference frame. A feature amount is extracted from this reference frame. This process corresponds to step S103 described above, and is a process in which a feature amount is extracted from a reference frame and the feature amount is used as reference data.

Thus, since n₁ reference frames are extracted from category 1, n₁ items of reference data are generated. Similarly, n₂ reference frames are extracted from category 2, so n₂ items of reference data are generated, and n_n reference frames are extracted from category n, so n_n items of reference data are generated. In the end, (n₁ + n₂ + ... + n_n) items of reference data are generated.

In this way, the reference data (labeled as reference frames in FIG. 7 and drawn as rectangles representing frames) is generated and stored in the reference data storage unit 115. A number is assigned to each item of reference data. Here, as shown in FIG. 7, the reference data extracted from category 1 is referred to as reference data R1, reference data R2, and reference data R3. Numbers are also assigned to the other reference data, but they are not shown in FIG. 7.

このような状態の時、ビデオストリームＶ１が取得される。このビデオストリームＶ１のうちの範囲が限定、例えば、上記したように、番組の先頭の１０分間だけ、サンプリングが行われる。その結果、Ｍ枚のフレームが抽出される。これらのＭ枚のフレームのそれぞれから特徴量が抽出される。この処理は、図６のフローチャートにおけるステップＳ１５１乃至Ｓ１５３に相当する。 In such a state, the video stream V1 is acquired. The range of the video stream V1 is limited. For example, as described above, sampling is performed only for the first 10 minutes of the program. As a result, M frames are extracted. A feature amount is extracted from each of these M frames. This process corresponds to steps S151 to S153 in the flowchart of FIG.

図７においては、Ｍ個のフレームからそれぞれ抽出された特徴量を、特徴量Ｍ１、特徴量Ｍ２、特徴量Ｍ３、・・・、特徴量Ｍｍと記述する。 In FIG. 7, feature amounts extracted from M frames are described as feature amount M1, feature amount M2, feature amount M3,..., Feature amount Mm.

First, the distance D1 is calculated using the feature amount M1 and the reference data R1. Similarly, the distance D2 is calculated using the feature amount M1 and the reference data R2, and the distance D3 using the feature amount M1 and the reference data R3. In this way, the distances between the one feature amount M1 and all the reference data Ri are calculated. At this point, therefore, an (n₁ + n₂ + ... + n_n)-dimensional feature vector has been generated. This processing corresponds to steps S154 to S159.

このようにして、１つの特徴量Ｍと、全てのリファレンスデータＲｉとの距離が求められると、次の特徴量Ｍと、全てのリファレンスデータＲｉとの距離が求められる。特徴量Ｍ１の後は、特徴量Ｍ２が処理対象とされ、リファレンスデータＲｉとの距離が算出される。 Thus, when the distance between one feature value M and all the reference data Ri is obtained, the distance between the next feature value M and all the reference data Ri is obtained. After the feature value M1, the feature value M2 is processed, and the distance from the reference data Ri is calculated.

In step S156, the distance D1 between the feature amount M1 and the reference data R1 is compared with the distance D1' between the feature amount M2 and the reference data R1. If the distance D1' is determined to be shorter than the distance D1, the distance D1 associated with the reference data R1 at that time is replaced with the distance D1'. If the distance D1' is determined to be longer than the distance D1, the distance D1 remains associated with the reference data R1. In this way, each reference data Ri ends up associated with the shortest of its distances to the feature amounts M1 to Mm.

Thus, in the end, one (n₁ + n₂ + ... + n_n)-dimensional feature vector is generated from one video stream. This feature vector is data representing the features of the processed video stream V1.

By generating such a feature vector for each of a plurality of video streams, the minimum distance holding unit 117 comes to hold a plurality of feature vectors generated from a plurality of video streams. In other words, one or more feature vectors are held for each of a plurality of categories. “One or more” is used here because either a single feature vector or a plurality of feature vectors may be generated for one category.

このようにして、生成された複数の特徴ベクトルが、最小距離保持部１１７（図４）に保持されている状態のとき、その特徴ベクトルを用いて、学習アルゴリズム処理部１１８は識別パラメータを生成する。 When the plurality of feature vectors generated in this way are held in the minimum distance holding unit 117 (FIG. 4), the learning algorithm processing unit 118 generates an identification parameter using the feature vectors. .

学習アルゴリズム処理部１１８は、所定のアルゴリズムに基づいて、また、最小距離保持部１１７に保持されている特徴ベクトルを用いて、識別パラメータを生成する。所定のアルゴリズムとしては、例えば、最急降下法、サポートベクターマシン、バックプロパゲーションといったアルゴリズムを用いることができる。これらのアルゴリズムに基づき算出された識別パラメータは、カテゴリを識別するパラメータとして識別パラメータ保持部１１９に保持される。 The learning algorithm processing unit 118 generates an identification parameter based on a predetermined algorithm and using the feature vector held in the minimum distance holding unit 117. As the predetermined algorithm, for example, an algorithm such as a steepest descent method, a support vector machine, or back propagation can be used. The identification parameter calculated based on these algorithms is held in the identification parameter holding unit 119 as a parameter for identifying the category.
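As one hedged illustration of the steepest descent option named above, the sketch below fits a linear boundary between two categories from labeled feature vectors. The logistic loss, the learning rate, the number of iterations, the two-category restriction, and the function names are all assumptions for this example; the text does not state which loss or settings are used.

```python
import math

def train_linear_classifier(vectors, labels, lr=0.1, epochs=500):
    """Fit weights w and bias b by steepest descent on the logistic loss.

    `vectors` are feature vectors (e.g. minimum-distance vectors) and
    `labels` are 0 or 1 for the two categories.
    """
    dim = len(vectors[0])
    w = [0.0] * dim
    b = 0.0
    n = len(vectors)
    for _ in range(epochs):
        grad_w = [0.0] * dim
        grad_b = 0.0
        for x, y in zip(vectors, labels):
            z = sum(wi * xi for wi, xi in zip(w, x)) + b
            p = 1.0 / (1.0 + math.exp(-z))  # predicted probability of category 1
            err = p - y
            for j in range(dim):
                grad_w[j] += err * x[j]
            grad_b += err
        # Step against the averaged gradient.
        w = [wi - lr * g / n for wi, g in zip(w, grad_w)]
        b -= lr * grad_b / n
    return w, b

def predict(w, b, x):
    """Return 1 if x lies on the positive side of the learned boundary, else 0."""
    return 1 if sum(wi * xi for wi, xi in zip(w, x)) + b > 0 else 0
```

Here the pair (w, b) plays the role of the identification parameters: it defines the straight line (or hyperplane) that divides the feature space.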

学習器１００は、このようなモデルの学習処理を行い、学習の結果として識別部１７において番組の分類の識別に用いられる識別パラメータ、すなわち例えば特徴空間を分割するための直線や曲線を生成するための識別パラメータを、識別部１７に供給して設定する。その設定は、上記したように、ネットワークや記録媒体を介して行われたり、直接的に行われたりする。 The learning device 100 performs such model learning processing and generates, as a result of learning, an identification parameter used for identifying a program classification in the identification unit 17, that is, for example, a straight line or a curve for dividing a feature space. The identification parameters are supplied to the identification unit 17 and set. As described above, the setting is performed via a network or a recording medium or directly.

[Generation of classification information]
Once the identification parameters have been generated and held in the identification unit 17 of the recording/reproducing apparatus 1 in this way, the recording/reproducing apparatus 1 can identify the category of a program. The identification processing by which the recording/reproducing apparatus 1 (FIG. 1) classifies a program will be described with reference to the flowchart of FIG. 8.

ステップＳ２０１において、ビデオストリームが取得される。このビデオストリームは、入力制御部１２により入力が制御されたビデオストリームであっても良いし、保存部２０に保存されているビデオストリームであっても良い。ステップＳ２０２において、取得されたビデオストリームから、フレームが抽出される。入力されたビデオストリームから生成される全てのフレームを処理対象とする場合、フレームを抽出するといった処理を省略することが可能である。図１に示した記録再生装置１は、フレームを抽出する部分（フレーム抽出部）は図示していない。 In step S201, a video stream is acquired. This video stream may be a video stream whose input is controlled by the input control unit 12 or a video stream stored in the storage unit 20. In step S202, a frame is extracted from the acquired video stream. When all the frames generated from the input video stream are to be processed, it is possible to omit the process of extracting the frames. The recording / reproducing apparatus 1 shown in FIG. 1 does not show a portion for extracting a frame (frame extraction unit).

However, when predetermined frames are extracted and processed, a frame extraction unit is provided, for example, between the decoder 13 and the video feature amount extraction unit 15, and that unit extracts the frames. Although no such unit is illustrated, the description here continues on the assumption that frames are extracted, and that the extraction is performed by the video feature amount extraction unit 15 selecting frames from those supplied by the decoder 13.

The frames are extracted in the same manner as in the processing performed by the frame extraction unit 113 (FIG. 4) of the learning device 100, that is, the processing performed in step S102 (FIG. 5) and step S152 (FIG. 6). For example, the frames in the first 10 minutes of the program are processed, or frames extracted at predetermined time intervals are processed.

ステップＳ２０２において、フレームが抽出されると、そのフレームが処理対象とされ、ステップＳ２０３に処理が進められる。ステップＳ２０３において、ビデオ特徴量抽出部１５は、処理対象とされたフレームから特徴量を抽出する。この処理は、学習器１００のビデオ特徴量抽出部１１４と同じく行われる。すなわち、フレームから、色ヒストグラム、色モーメント、差分画像、縮小画像などのビデオ特徴量が抽出される。 When a frame is extracted in step S202, the frame is set as a processing target, and the process proceeds to step S203. In step S203, the video feature amount extraction unit 15 extracts a feature amount from the frame to be processed. This process is performed in the same manner as the video feature amount extraction unit 114 of the learning device 100. That is, video feature amounts such as a color histogram, a color moment, a difference image, and a reduced image are extracted from the frame.

ステップＳ２０４において、特徴ベクトル生成部１６により特徴ベクトルが生成される。特徴ベクトル生成部１６は、ビデオ特徴量抽出部１５から供給された特徴量の中から、識別部１７において、チャプタ情報が付される番組の分類を識別するのに用いる所定の特徴量を選択し、選択した特徴量を要素とするベクトル（特徴ベクトル）を生成する。特徴ベクトル生成部１６は、生成した特徴ベクトルを、識別部１７に供給する。 In step S204, the feature vector generation unit 16 generates a feature vector. The feature vector generation unit 16 selects, from the feature amounts supplied from the video feature amount extraction unit 15, a predetermined feature amount used for identifying the classification of the program to which the chapter information is attached in the identification unit 17. Then, a vector (feature vector) having the selected feature amount as an element is generated. The feature vector generation unit 16 supplies the generated feature vector to the identification unit 17.

In step S205, the identification unit 17 identifies the category. Using the feature vector supplied from the feature vector generation unit 16 and the identification pattern it holds, the identification unit 17 identifies the category to which the program of the input video stream belongs. For example, the identification unit 17 is composed of a classifier such as a linear classifier, a nonlinear classifier, or a neural network. It places the elements of the feature vector in a predetermined feature space that is divided by straight lines, curves, or the like generated from the identification parameters set by the learning device 100, and identifies the category of the program from the region of the divided feature space to which the distribution of the placed elements belongs.
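The linear case can be sketched as follows. This is a minimal illustration that assumes one weight vector and bias per category and picks the category with the largest linear score; the two-element feature vector, the category names, and the parameter values are hypothetical and are not taken from the specification.

```python
def classify(feature_vector, params):
    """Return the category whose linear score w.x + b is largest."""
    def score(weights, bias):
        return sum(w * x for w, x in zip(weights, feature_vector)) + bias
    return max(params, key=lambda category: score(*params[category]))

# Hypothetical identification parameters, standing in for what a learning
# step would produce. Feature vector layout (an assumption):
# [fraction of green pixels, fraction of studio-colored pixels].
PARAMS = {
    "soccer program": ([2.0, -1.0], 0.0),
    "news program": ([-1.0, 2.0], 0.0),
}
```

With these parameters, a vector dominated by green falls on the "soccer program" side of the boundary, and one dominated by studio colors falls on the "news program" side.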

The category identified here is one of the finely divided categories obtained by learning in the learning device 100. Because identification is performed using identification parameters generated from those finely divided categories, the identification result can also be expressed in terms of fine categories.

In step S206, the category obtained as the identification result is output to the chapter information detection unit 18 as classification information.

Because the chapter information detection unit 18 is thus supplied with classification information on the finely divided category to which the program belongs, detailed chapter information can be detected as described below. That is, the chapter information best suited to each program can be detected program by program.

[Detection of chapter information]
The detection of chapter information will be described with reference to FIGS. 9 to 11. First, the case of detecting chapter information from a program that broadcasts a soccer match, as shown in FIG. 9, will be described. The upper part of FIG. 9 shows the video that makes up a program broadcasting a soccer match, and the lower part of FIG. 9 shows the volume of the audio output together with that video.

In this case, the audio feature amount extraction unit 14 extracts the volume, frequency spectrum, left/right channel correlation value, and the like as audio feature amounts from the audio data supplied from the decoder 13, whose volume varies as shown in FIG. 9, and supplies them to the feature vector generation unit 16 and the chapter information detection unit 18.

The video feature amount extraction unit 15 extracts a color histogram, color moments, a difference image, a reduced image, and the like as video feature amounts from the video data of the video shown in FIG. 9 supplied from the decoder 13, and supplies them to the feature vector generation unit 16 and the chapter information detection unit 18.

In this example, of the feature amounts supplied from the audio feature amount extraction unit 14 or the video feature amount extraction unit 15, the feature vector generation unit 16 uses the color histogram supplied from the video feature amount extraction unit 15 to generate a feature vector, and supplies it to the identification unit 17.

From the video shown in FIG. 9, as described above with reference to FIG. 2, a feature vector containing many color histograms in which, for example, green occurs with high frequency is generated and supplied to the identification unit 17. As described above, the identification unit 17 has been trained (identification parameters have been acquired) so that it can identify the program classification as "soccer program" from such a feature vector. When this feature vector is supplied from the feature vector generation unit 16, the identification unit 17 therefore identifies the classification as "soccer program" and supplies classification information indicating that classification to the chapter information detection unit 18.

A program that broadcasts a soccer match is characterized by the volume rising in exciting scenes, such as the kickoff, a scoring chance, or a goal (in the example of FIG. 9, the videos F42, F51, and F53 marked with upward arrows).

These exciting scenes are what interests the user, so it is desirable that they be played back in digest playback. Accordingly, when classification information indicating that the program classification is "soccer program" is supplied from the identification unit 17, the chapter information detection unit 18 selects the color histogram, difference image, and volume from the feature amounts supplied from the audio feature amount extraction unit 14 and the video feature amount extraction unit 15. Using them, it detects positions (for example, frames) where the video is not continuous (hereinafter referred to as cut points), determines chapter breakpoints from the cut point detection results and the changes in volume, and computes a score for each chapter from the degree of excitement indicated by the volume. The chapter information detection unit 18 supplies the chapter information detected by this computation to the holding unit 19, which holds it.
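The soccer case can be sketched as below. This is a simplified illustration under several assumptions: cut points are taken where the L1 distance between consecutive color histograms exceeds a threshold, chapters are split only at cut points (the volume change is not used for the breakpoints here), and each chapter is scored by its peak volume. The function names and the threshold are hypothetical.

```python
def histogram_distance(h1, h2):
    """L1 distance between two histograms of equal length."""
    return sum(abs(a - b) for a, b in zip(h1, h2))

def detect_cut_points(histograms, threshold=0.5):
    """Indices i where frame i differs sharply from frame i - 1."""
    return [i for i in range(1, len(histograms))
            if histogram_distance(histograms[i - 1], histograms[i]) > threshold]

def score_chapters(cut_points, volumes):
    """Split the frame sequence at cut points and score each chapter
    by its peak volume, returning (start, end, score) tuples."""
    bounds = [0] + list(cut_points) + [len(volumes)]
    return [(start, end, max(volumes[start:end]))
            for start, end in zip(bounds, bounds[1:]) if end > start]
```

Sorting the resulting tuples by score and keeping the top entries would give the loudest chapters as candidates for digest playback.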

When the program classification is identified as "soccer program" in this way and chapter information based on excitement is detected, the videos of exciting scenes such as the kickoff, a scoring chance, or a goal (F42, F51, F53, and so on) are played back in digest playback.

Next, the case of detecting chapter information from a program that reports on incidents and events, as shown in FIG. 10, will be described. The audio feature amount extraction unit 14 extracts the volume, frequency spectrum, left/right channel correlation value, and the like as audio feature amounts from the audio data supplied from the decoder 13, and supplies them to the feature vector generation unit 16 and the chapter information detection unit 18.

The video feature amount extraction unit 15 extracts a color histogram, color moments, a difference image, a reduced image, and the like as video feature amounts from the video data of the video shown in FIG. 10 supplied from the decoder 13, and supplies them to the feature vector generation unit 16 and the chapter information detection unit 18.

From the video shown in FIG. 10, as described above with reference to FIG. 3, a feature vector containing many color histograms in which the colors characteristic of people and the studio occur with high frequency is generated and supplied to the identification unit 17. As described above, the identification unit 17 has been trained so that it can identify the program classification as "news program" from such a feature vector. When this feature vector is supplied from the feature vector generation unit 16, the identification unit 17 therefore identifies the classification as "news program" and supplies classification information indicating that classification to the chapter information detection unit 18.

A program that reports on incidents and events is characterized by switching back and forth between video of the announcer explaining the report and video related to the incident or event (in the example of FIG. 10, the switches occur at the videos F61, F63, F71, and F72 marked with upward arrows).

Viewers can grasp the outline of the news by watching the video of the announcer explaining the report, so it is desirable that these scenes be played back in digest playback. Accordingly, when classification information indicating that the program classification is "news program" is supplied from the identification unit 17, the chapter information detection unit 18 selects the color histogram and difference image from the feature amounts supplied from the audio feature amount extraction unit 14 and the video feature amount extraction unit 15. Using them, it detects cut points, detects from the similarity of the color histograms the positions where the video switches between the announcer explaining the report and the footage related to the report, sets those positions as chapter breakpoints, and performs a computation that gives a high score to the announcer video. The chapter information detection unit 18 supplies the chapter information detected by this computation to the holding unit 19, which holds it.
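One way to picture the news case is the sketch below. It assumes that each shot between cut points is summarized by one normalized color histogram and that a reference histogram for the announcer shot is already available; how such a reference would be obtained is not stated in the specification. Histogram intersection, the threshold, and the score values are likewise assumptions chosen for illustration.

```python
def histogram_intersection(h1, h2):
    """Similarity in [0, 1] between two normalized histograms."""
    return sum(min(a, b) for a, b in zip(h1, h2))

def label_shots(shot_histograms, anchor_histogram, threshold=0.7):
    """True for shots whose histogram resembles the announcer shot."""
    return [histogram_intersection(h, anchor_histogram) >= threshold
            for h in shot_histograms]

def chapter_breaks(labels):
    """Shot indices where the video switches between announcer and report."""
    return [i for i in range(1, len(labels)) if labels[i] != labels[i - 1]]

def shot_scores(labels, anchor_score=1.0, other_score=0.1):
    """Give announcer shots a high score and other shots a low one."""
    return [anchor_score if is_anchor else other_score for is_anchor in labels]
```

Swapping the two score values gives the opposite preference, which corresponds to the soccer digest case described later, where the match footage is favored over the announcer.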

When the program is identified as a "news program" in this way and chapter information based on the cut points and the similarity of the color histograms is detected, the videos F61, F62, and F71 of the announcer explaining the report are played back in digest playback.

Next, the case of detecting chapter information from a program that presents soccer match results (a so-called digest), as shown in FIG. 11, will be described. The audio feature amount extraction unit 14 extracts the volume, frequency spectrum, left/right channel correlation value, and the like as audio feature amounts from the audio data supplied from the decoder 13, and supplies them to the feature vector generation unit 16 and the chapter information detection unit 18.

The video feature amount extraction unit 15 extracts a color histogram, color moments, a difference image, a reduced image, and the like as video feature amounts from the video data of the video shown in FIG. 11 supplied from the decoder 13, and supplies them to the feature vector generation unit 16 and the chapter information detection unit 18.

From the video shown in FIG. 11, a feature vector is generated in which color histograms with a high frequency of the colors characteristic of people and the studio are mixed with color histograms with a high frequency of green, and it is supplied to the identification unit 17. As described above, the identification unit 17 has been trained so that it can identify the program classification as "soccer digest program" from such a mixed feature vector. When this feature vector is supplied from the feature vector generation unit 16, the identification unit 17 therefore identifies the classification as "soccer digest program" and supplies classification information indicating that classification to the chapter information detection unit 18.

When classification information indicating that the program classification is "soccer digest program" is supplied from the identification unit 17, the chapter information detection unit 18 selects the color histogram and difference image from the feature amounts supplied from the audio feature amount extraction unit 14 and the video feature amount extraction unit 15, as in the example of FIG. 10. Using them, it performs a computation that determines chapter breakpoints based on the cut points and the similarity of the color histograms and gives a high score to video of the match. The chapter information detection unit 18 supplies the chapter information detected by this computation to the holding unit 19, which holds it.

A program that presents soccer match results (a so-called digest) is characterized by switching back and forth between video of the announcer explaining the results and video of the match (in the example of FIG. 11, the switches occur at the videos F81, F82, F91, and F92 marked with upward arrows).

In a program that presents soccer match results, the match video matters more to viewers than the announcer video. It is therefore desirable that the match video be played back preferentially in digest playback.

When the program is identified as a "soccer digest program" in this way and chapter information based on the cut points and the similarity of the color histograms is detected, digest playback suited to a program presenting soccer match results can be performed. In the example of FIG. 11, the match scene videos F83, F92, and so on are played back in digest playback.

By contrast, if excitement points were detected as chapter information for the program shown in FIG. 11 in accordance with its EPG classification, as in the example of FIG. 9, appropriate video could not be played back in digest playback.

As described above, the program classification suited to detecting the chapter information used in digest playback is identified, and chapter information is detected based on that classification.
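The three examples above amount to choosing the feature amounts and the scoring preference according to the identified classification. A minimal sketch of such a lookup follows; the table contents summarize the three cases described above, while the dictionary layout, key names, and function name are hypothetical.

```python
# Feature amounts and scoring preference per classification, as described
# for FIGS. 9 to 11. The data layout itself is an assumption.
CHAPTER_STRATEGIES = {
    "soccer program": {
        "features": ("color_histogram", "difference_image", "volume"),
        "prefer": "high-volume scenes",
    },
    "news program": {
        "features": ("color_histogram", "difference_image"),
        "prefer": "announcer video",
    },
    "soccer digest program": {
        "features": ("color_histogram", "difference_image"),
        "prefer": "match video",
    },
}

def select_features(category, extracted):
    """Keep only the feature amounts that the given classification uses."""
    strategy = CHAPTER_STRATEGIES[category]
    return {name: extracted[name] for name in strategy["features"]}
```

Adding a classification, such as one of the music program types mentioned later, would then mean adding one entry to the table rather than changing the detection flow.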

The above description used the detection of chapter information for digest playback of broadcast programs as an example, but the same approach can be applied to detecting chapter information for digest playback of other content, for example personal content shot with a camcorder. Learning for classifying personal content is required in that case, but if the learning device 100 is built into the recording/reproducing apparatus 1 shown in FIG. 1, that learning can be performed easily.

The above description also used the identification of a program classification suited to detecting chapter information for digest playback as an example, but a classification suited to other processing can be identified instead.

The timing at which the classification identification processing is executed was not mentioned above; it can be performed at the same time as the program is recorded. That is, the input control unit 12 supplies the AV data to the holding unit 20 for storage (that is, for recording) and also supplies it to the decoder 13.

The decoder 13 through the identification unit 17 execute the classification identification processing as described above based on the AV data supplied from the input control unit 12. At this time, the chapter information detection unit 18 does not operate. The feature amounts supplied from the audio feature amount extraction unit 14 and the video feature amount extraction unit 15 may, for example, be held in the chapter information detection unit 18, or they may be discarded.

Chapter information is detected after recording of the program is complete and the program classification has been identified. The AV data held in the holding unit 20 is read out by the decoder 13, the feature amounts are extracted by the audio feature amount extraction unit 14 and the video feature amount extraction unit 15, and the chapter information detection unit 18 selects, from the extracted feature amounts, those corresponding to the previously identified program classification and detects the chapter information.

The feature amounts of the feature vector needed for classification can be extracted over the entire program, or from, for example, a predetermined period (for example, 10 minutes) at the beginning of the program. When feature amounts are extracted over the entire program, classification is performed after recording of the program is complete, as described above. When feature amounts are extracted from only part of the program, a buffer can be provided between each of the audio feature amount extraction unit 14 and the video feature amount extraction unit 15 and the chapter information detection unit 18. If the feature amounts are buffered until the feature vector has been generated and the classification identified, chapter information detection can start immediately after the classification is identified.
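The buffering arrangement can be sketched as follows. This is an illustrative model only: the class name, method names, and the `delivered` list, which stands in for the input of the chapter information detection unit, are assumptions made for this example.

```python
from collections import deque

class FeatureBuffer:
    """Holds feature amounts until the classification is known, then
    forwards them, in arrival order, together with that classification."""

    def __init__(self):
        self._pending = deque()
        self._category = None
        # Stand-in for the chapter information detector's input queue.
        self.delivered = []

    def push(self, features):
        """Buffer features before classification; forward them afterwards."""
        if self._category is None:
            self._pending.append(features)
        else:
            self.delivered.append((self._category, features))

    def set_category(self, category):
        """Record the classification and flush everything buffered so far."""
        self._category = category
        while self._pending:
            self.delivered.append((category, self._pending.popleft()))
```

Because the buffered feature amounts are flushed as soon as the classification arrives, chapter detection does not need to re-read and re-decode the portion of the program already processed.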

The above description used the cases where the program classification is "news program" or "soccer program" as examples, but other classifications, such as "music program", can also be identified.

The classification is not limited to what corresponds to a so-called genre and may be something else. Programs that broadcast songs and performances come in types such as the following, and if these can be identified as classifications, chapter information for digest playback can be detected even more appropriately.
・A type in which the actual songs and performances take up more time than conversation between the host and the performers
・A type in which conversation between the host and the performers is long
・A type recorded in a hall or similar venue, with cheering and applause from the audience

In the above description, EPG data was not used for classification, but the EPG information held in the holding unit 22 can additionally be used to improve the accuracy of program classification.

FIG. 12 is a block diagram showing an example hardware configuration of a computer that executes the series of processes described above by means of a program.

In the computer, a CPU (Central Processing Unit) 201, a ROM (Read Only Memory) 202, and a RAM (Random Access Memory) 203 are connected to one another by a bus 204.

An input/output interface 205 is also connected to the bus 204. Connected to the input/output interface 205 are an input unit 206 consisting of a keyboard, mouse, microphone, and the like; an output unit 207 consisting of a display, speaker, and the like; a storage unit 208 consisting of a hard disk, nonvolatile memory, and the like; a communication unit 209 consisting of a network interface and the like; and a drive 210 that drives a removable medium 211 such as a magnetic disk, optical disk, magneto-optical disk, or semiconductor memory.

In the computer configured as described above, the series of processes described above is performed when the CPU 201, for example, loads the program stored in the storage unit 208 into the RAM 203 via the input/output interface 205 and the bus 204 and executes it.

The program executed by the computer (CPU 201) is provided either recorded on the removable medium 211, which is a package medium such as a magnetic disk (including a flexible disk), an optical disk (such as a CD-ROM (Compact Disc-Read Only Memory) or DVD (Digital Versatile Disc)), a magneto-optical disk, or a semiconductor memory, or via a wired or wireless transmission medium such as a local area network, the Internet, or digital satellite broadcasting.

The program can be installed in the storage unit 208 via the input/output interface 205 by mounting the removable medium 211 in the drive 210. The program can also be received by the communication unit 209 via a wired or wireless transmission medium and installed in the storage unit 208. Alternatively, the program can be installed in advance in the ROM 202 or the storage unit 208.

The program executed by the computer may be a program whose processes are performed in time series in the order described in this specification, or a program whose processes are performed in parallel or at the necessary timing, such as when a call is made.

1 recording/reproducing apparatus, 11 data separation unit, 12 input control unit, 13 decoder, 14 audio feature amount extraction unit, 15 video feature amount extraction unit, 16 feature vector generation unit, 17 identification unit, 18 chapter information detection unit, 19 holding unit, 20 holding unit, 21 playback unit, 22 holding unit, 41 control unit, 100 learning device, 111 input control unit, 112 decoder, 113 frame extraction unit, 114 video feature amount extraction unit, 115 reference data storage unit, 116 distance calculation unit, 117 minimum distance holding unit, 118 learning algorithm processing unit, 119 identification parameter holding unit, 120 drive, 121 communication unit

Claims

An information processing apparatus comprising:
storage means for storing a multi-dimensional vector composed of first feature amounts, the first feature amounts being extracted from each of a predetermined number of frames extracted from a plurality of first contents;
extracting means for extracting a predetermined number of frames from second content and extracting a second feature amount from each frame;
calculating means for calculating the distance between each of the plurality of first feature amounts constituting the multi-dimensional vector and the second feature amount extracted from the frame to be processed among the predetermined number of frames extracted from the second content;
vector generation means for holding only the minimum distance among the distances calculated by the calculating means for each second feature amount, and generating a feature vector composed of the minimum distances; and
parameter generation means for generating parameters for classifying content by performing processing based on a predetermined algorithm using the feature vector generated by the vector generation means.

The information processing apparatus according to claim 1, wherein the extracting means extracts the second feature amount from a predetermined part of the second content.

The information processing apparatus according to claim 1, wherein the algorithm is one of a steepest descent method, a support vector machine, and backpropagation.

An information processing method including the steps of:
storing a multi-dimensional vector composed of first feature amounts, the first feature amounts being extracted from each of a predetermined number of frames extracted from a plurality of first contents;
extracting a predetermined number of frames from second content and extracting a second feature amount from each frame;
calculating the distance between each of the plurality of first feature amounts constituting the multi-dimensional vector and the second feature amount extracted from the frame to be processed among the predetermined number of frames extracted from the second content;
holding only the minimum distance among the distances calculated for each second feature amount, and generating a feature vector composed of the minimum distances; and
generating parameters for classifying content by performing processing based on a predetermined algorithm using the generated feature vector.

A computer-readable program for causing a computer to execute processing including the steps of:
storing a multi-dimensional vector composed of first feature amounts, the first feature amounts being extracted from each of a predetermined number of frames extracted from a plurality of first contents;
extracting a predetermined number of frames from second content and extracting a second feature amount from each frame;
calculating the distance between each of the plurality of first feature amounts constituting the multi-dimensional vector and the second feature amount extracted from the frame to be processed among the predetermined number of frames extracted from the second content;
holding only the minimum distance among the distances calculated for each second feature amount, and generating a feature vector composed of the minimum distances; and
generating parameters for classifying content by performing processing based on a predetermined algorithm using the generated feature vector.

A recording medium on which the program according to claim 5 is recorded.