JP4066454B2 - Signal processing apparatus with authoring function and signal processing method including authoring

Info

Publication number: JP4066454B2
Application number: JP2003058996A
Authority: JP
Inventors: 哲矢鰺坂; 考司沼田
Original assignee: NEC Corp
Current assignee: NEC Corp
Priority date: 2003-03-05
Filing date: 2003-03-05
Publication date: 2008-03-26
Anticipated expiration: 2023-03-05
Also published as: JP2004274171A

Description

【０００１】
【発明の属する技術分野】
本発明は、オーサリング機能付き信号処理装置及びオーサリングを含む信号処理方法に関し、特にオーディオビデオデータを記憶媒体に記録する際に用いるオーサリング機能付き信号処理装置及びオーサリングを含む信号処理方法に関する。
【０００２】
【従来の技術】
デジタルビデオ（ＤＶ）テープやアナログビデオテープレコーダ（ＶＴＲ）からのオーディオビデオデータ（音声データ、画像（動画を含む）データ及び記録日時を示す日時データを含む、本明細書中で同じ）を、ＤＶＤ（ＤｉｇｉｔａｌＶｅｒｓａｔｉｌｅＤｉｓｃ）のような容量の大きい記憶媒体に記録する場合、複数のビデオ映像を含むオーディオビデオデータを一枚のＤＶＤにまとめて記録する場合がある。その場合、一枚のＤＶＤの中にどのような内容のオーディオビデオデータが記録されているかは、別途記録していないときには、内容を一通り見なければならなくなる。
【０００３】
そのような面倒を避けるための技術が知られている。例えば、以下のような技術である。
まず、例えば、特徴のあるシーンごと、又は、まとめて見たいシーンごとなどで頭出しを行えるように、一枚のＤＶＤに格納されたオーディオビデオデータを複数のチャプタに区切る。次に、各チャプタごとの先頭画面のサムネイル（静止画）を、そのチャプタの代表画像として抽出する。そして、ディスプレイの表示画面中に、抽出した全てのサムネイルを同時に表示する（又は一部のサムネイルを表示し、残りはスクロールで表示可能とする）。このようにすると、一つの表示画面で、複数のビデオ映像の代表画像一覧を見られるので、ＤＶＤ中のオーディオビデオデータの内容を短時間で把握することができる。そして、各サムネイルごとの頭出しを容易に行うことが出来る。
【０００４】
ここで、オーディオビデオデータを複数のチャプタに自動的に区切る方法としては、所定の条件を満たすオーディオビデオデータ（画像データ及び音声データ）の変化を検出し、その場所で区切る方法や、オーディオビデオデータ上に記録されたマーカを検出してその場所で区切る方法などが知られている。自動的に代表画像を抽出してサムネイル（静止画）とする方法としては、区切られたチャプタの先頭画面を代表画像とする方法などが知られている。
【０００５】
ただし、オーディオビデオデータの変化やオーディオビデオデータ上のマーカだけでは、適切な位置でオーディオビデオデータを区切れず、所望のチャプタを構成できない場合がある。また、サムネイルが静止画の場合、代表画像を適切に選択しないと、そのチャプタの内容を的確に把握することが困難となる場合がある。
複数のビデオ映像を有するオーディオビデオデータの区切りを自動的に、より適切に見出し、所望のチャプタを構成可能な技術が望まれている。的確にチャプタの内容を把握可能なサムネイルを生成することが可能な技術が望まれている。
【０００６】
関連する技術として、特開２００２−１５２６３６号公報（特許文献１）に自動チャプタ作成機能付き記録再生装置の技術が開示されている（関連：特開２００２−１５２６６５（特許文献２）、特開２００２−１５２６６６（特許文献３））。
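The recording/reproducing apparatus with an automatic chapter creation function according to this technique comprises a recording/reproducing medium, recording/reproduction processing means, display signal deriving means, system control means, and pause means. The recording/reproducing medium has at least a video information recording area in which video information including programs is recorded, a video management information recording area in which management information for recording and reproducing the video information is recorded, and a recording area for chapter management information used to manage each chapter of a program. The recording/reproduction processing means records information on the recording/reproducing medium and reproduces the recorded information. The display signal deriving means supplies the reproduced signal from the reproduction processing means to a display. The system control means controls the recording/reproduction processing means and the display signal deriving means. The pause means causes the said recording processing means, via the system control means, to pause the recording process. The apparatus is characterized by having means for registering in the chapter management information, as a chapter boundary, the break in the recorded information between the time the pause means executes a pause and the time recording is resumed.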
この技術の目的は、多数のプログラム（オーディオビデオデータ）が連続して記録されるような記憶媒体に対してチャプタ及びサムネイルを自動的に作成する自動チャプタ作成機能付き記録再生装置を提供することにある。
【０００７】
この技術では、オーディオビデオデータを記録中にオーディオビデオデータが一時停止した場合、それをチャプタの区切りとして複数のチャプタを決定する。そして、各チャプタの先頭画面をサムネイル（静止画）として取り出し、代表画面一覧を生成する。チャプタの編集は、手動で行うことも可能である。
【０００８】
【特許文献１】
特開２００２−１５２６３６号公報
【特許文献２】
特開２００２−１５２６６５号公報
【特許文献３】
特開２００２−１５２６６６号公報
【０００９】
【発明が解決しようとする課題】
従って、本発明の目的は、複数の映像を有するオーディオビデオデータを一つの記憶媒体に格納する場合に、格納されたオーディオビデオデータの内容を迅速且つ的確に把握できるように格納可能なオーサリング機能付き信号処理装置及びオーサリングを含む信号処理方法を提供することにある。
【００１０】
また、本発明の他の目的は、複数の映像を有するオーディオビデオデータを一つの記憶媒体に格納する場合に、オーディオビデオデータの区切りを自動的に、より適切に見出し、的確にチャプタを構成可能なオーサリング機能付き信号処理装置及びオーサリングを含む信号処理方法を提供することにある。
【００１１】
本発明の更に他の目的は、複数の映像を有するオーディオビデオデータを一つの記憶媒体に格納する場合に、区切られたチャプタごとの内容を的確に把握できるサムネイルを自動的に生成することが可能なオーサリング機能付き信号処理装置及びオーサリングを含む信号処理方法を提供することにある。
【００１２】
本発明の別の目的は、複数の映像を有するオーディオビデオデータを一つの記憶媒体に格納する場合に、その内容を容易に把握できるメニュー画面を自動的に作成可能なオーサリング機能付き信号処理装置及びオーサリングを含む信号処理方法を提供することにある。
【００１３】
【課題を解決するための手段】
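The means for solving the problems are described below using the numbers and symbols used in [Embodiments of the Invention]. These numbers and symbols are added in parentheses to clarify the correspondence between the description in [Claims] and [Embodiments of the Invention]. They must not, however, be used to interpret the technical scope of the invention described in [Claims].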
【００１４】
従って、上記課題を解決するために、本発明のオーサリング機能付き信号処理装置は、データ処理部（２−１）と、書き込み制御部（１６）とを具備する。
データ処理部（２−１）は、オーディオビデオデータ（Ａ）をチャプタごとに分割して生成される複数のチャプタの各々ごとに動画サムネイル（Ｆ１、Ｊ）を作成し、前記動画サムネイル（Ｆ１、Ｊ）を含むメニュー画面データ（Ｈ１、Ｈ２）を作成する。書き込み制御部（１６）は、メニュー画面データ（Ｈ１、Ｈ２）を記憶媒体に記録する制御を行う。
ここで、オーディオビデオデータ（Ａ）は、複数の画像データと、その画像データの記録日時を示す日時データ（Ｔ０）と、オーディオビデオデータ（Ａ）におけるその画像データの位置を示す位置データ（ｔ０）とを含む。メニュー画面データ（Ｈ１、Ｈ２）は、その複数のチャプタのうちの一部又は全部の動画サムネイル（Ｆ１、Ｊ）を同時に表示するメニュー画面（５０）を示す。
本発明により、記憶媒体に格納されたオーディオビデオデータの内容は、チャプタごとの動画サムネイルが纏まって表示されるメニュー画面で把握される。すなわち、記憶媒体に格納されたオーディオビデオデータの内容を迅速且つ的確に把握できるように格納可能となる。
ここで、記録媒体としては、ＤＶＤやＲＯＭ、ＲＡＭ、ＨＤ、ＣＤ、ＦＤが例示される。データ処理部（２−１）は、メニュー画面を作成するための条件（Ｂ、Ｃ）を参照して、上記処理を行っても良い。その場合、ユーザの考えを反映できる。
【００１５】
上記のオーサリング機能付き信号処理装置において、データ処理部（２−１）は、データ作成部（２−２）と、メニュー画面作成部（１５）とを備える。
データ作成部（２−２）は、オーディオビデオデータ（Ａ）を分割してその複数のチャプタを生成し、その複数のチャプタの各々ごとに動画サムネイル（Ｆ１）を作成し、動画サムネイル（Ｆ１）に関するデータを示す動画サムネイルデータ（Ｌ）と、その複数のチャプタに関する制御情報を示す制御情報データ（Ｇ１）とを作成する。メニュー画面作成部（１５）は、動画サムネイルデータ（Ｌ）と制御情報データ（Ｇ１）とに基づいて、メニュー画面データ（Ｈ１）を作成する。
ここで、動画サムネイルデータ（Ｌ）としては、複数の動画サムネイル（Ｆ１）と、位置データ（ｔ０）とを関連付けた動画サムネイルテーブル（Ｌ）が例示される。また、オーディオビデオデータ（Ａ）をチャプタに分割する方法は、オーディオビデオデータ（Ａ）に含まれる日時データ（Ｔ０）や画像データ、音声データを用いて行う方法に例示される。
【００１６】
上記のオーサリング機能付き信号処理装置において、データ処理部（２−１）は、日時データ（Ｔ０）に基づいて、オーディオビデオデータ（Ａ）を分割してその複数のチャプタを生成する。
日時データ（Ｔ０：画像データの記録日時を示す）を用いてチャプタの分割を行うので、オーディオビデオデータ（Ａ）における内容の関連するシーンを集めることが出来、自動でチャプタを適切に区切ることができる。すなわち、オーディオビデオデータの区切りを自動的に、より適切に見出し、的確にチャプタを構成可能となる。
【００１７】
上記のオーサリング機能付き信号処理装置において、データ作成部（２−２）は、データ前処理部（２−３）と、動画サムネイル作成部（１３）と、制御情報データ作成部（１４）とを備える。
データ前処理部（２−３）は、オーディオビデオデータ（Ａ）を分割してその複数のチャプタを生成し、その複数のチャプタに関するデータを示すチャプタデータ（Ｅ）と、オーディオビデオデータ（Ａ）を符号化した映像符号化データ（Ｄ）とを作成する。動画サムネイル作成部（１３）は、映像符号化データ（Ｄ）とチャプタデータ（Ｅ）とに基づいて、動画サムネイルデータ（Ｌ）を作成する。制御情報データ作成部（１４）は、映像符号化データ（Ｄ）とチャプタデータ（Ｅ）とに基づいて、制御情報データ（Ｇ１）を作成する。
ここで、チャプタデータ（Ｅ）としては、その複数のチャプタの各々に対応したチャプタ日時データ（３３）とチャプタ位置データ（３２）とを関連付けているチャプタテーブル（Ｅ）が例示される。ただし、チャプタ日時データ（３３）は、その複数のチャプタの各々における日時データ（Ｔ０）に基づくデータである。また、チャプタ位置データ（３２）は、その複数のチャプタの各々における位置データ（ｔ０）に基づくデータである。
【００１８】
上記のオーサリング機能付き信号処理装置において、データ前処理部（２−３）は、ＭＰＥＧ（ＭｏｖｉｎｇＰｉｃｔｕｒｅＥｘｐｅｒｔｓＧｒｏｕｐ）規格に基づいて、映像符号化データ（Ｄ）を作成する。
【００１９】
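In the above signal processing apparatus with an authoring function, the moving-image thumbnail creation unit (13) comprises a highlight scene detection unit (26), a creation method selection unit (27), a creation method execution unit (28), and a table creation unit (29).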
ハイライトシーン検出部（２６）は、映像符号化データ（Ｄ）とチャプタデータ（Ｅ）とに基づいて、そのチャプタごとに、ハイライトシーンの有無を判定する。ここで、そのハイライトシーンは、画素差分値（Δ）が基準値以上となる映像符号化データ（Ｄ）である。その基準値は、可変である。作成手法選択部（２７）は、そのハイライトシーンの有無に基づいて、そのチャプタごとに、動画サムネイル（Ｆ１）の作成方法を、予め設定された作成方法から選択する。作成手法実行部（２８）は、その選択された作成方法に基づいて、そのチャプタごとに、動画サムネイル（Ｆ１）を作成する。テーブル作成部（２９）は、作成された動画サムネイル（Ｆ１）に基づいて、動画サムネイルデータ（Ｌ）を生成する。
ただし、動画サムネイルデータ（Ｌ）は、複数の動画サムネイル（Ｆ１）と、位置データ（ｔ０）とを関連付けた動画サムネイルテーブル（Ｌ）を含む。
ここで、その作成方法としては、そのハイライトシーンが無い場合、映像符号化データ（Ｄ）からフレームを間引いて動画サムネイル（Ｆ１）とし、そのハイライトシーンが有る場合、そのハイライトシーンを動画サムネイル（Ｆ１）とする方法が例示される。
本発明により、ハイライトシーンを用いることで、区切られたチャプタごとの内容を的確に把握できるサムネイルを自動的に作成することが可能となる。
【００２０】
上記のオーサリング機能付き信号処理装置において、動画サムネイル作成部（１３ａ）は、データ検出部（５６）と、データ解析部（５７）と、データ抽出部（５８）と、テーブル作成部（５９）とを備える。
データ検出部（５６）は、映像符号化データ（Ｄ）とチャプタデータ（Ｅ）とに基づいて、そのチャプタごとに、ＧＯＰ（ＧｒｏｕｐＯｆＰｉｃｔｕｒｅ）の位置を検出する。データ解析部（５７）は、検出されたそのＧＯＰに基づいて、そのチャプタごとに、そのＧＯＰ単位の符号量（Ｒ）と位置データ（ｔ０）とを関連付けた符号量テーブルを作成する。データ抽出部（５８）は、その符号量テーブルに基づいて、そのチャプタごとに、符号量（Ｒ）が最大となるそのＧＯＰを含む連続した所定の時間（２×Δｔ１）の映像符号化データ（Ｄ）を動画サムネイル（Ｆ１）として抽出する。テーブル作成部（５９）は、抽出された動画サムネイル（Ｆ１）に基づいて、動画サムネイルデータ（Ｌ）を生成する。
ただし、動画サムネイルデータ（Ｌ）は、複数の動画サムネイル（Ｆ１）と、位置データ（ｔ０）とを関連付けた動画サムネイルテーブル（Ｌ）を含む。
本発明により、符号量（Ｒ）が最大となるそのＧＯＰを含むシーンを用いることで、区切られたチャプタごとの内容を的確に把握できるサムネイルを自動的に作成することが可能となる。
【００２１】
上記のオーサリング機能付き信号処理装置において、動画サムネイル作成部（１３ｂ）は、データ検出部（７６）と、データ解析部（７７）と、データ抽出部（７８）と、テーブル作成部（７９）とを備える。
データ検出部（７６）は、映像符号化データ（Ｄ）とチャプタデータ（Ｅ）とに基づいて、そのチャプタごとに、ＧＯＰの位置を検出し、検出されたそのＧＯＰごとに、所定の色を示す画素データに対して、所定のポイントを付加する。データ解析部（７７）は、そのポイントに基づいて、そのチャプタごとに、そのＧＯＰごとのポイントの合計（Ｓ）と位置データ（ｔ０）とを関連付けたポイントテーブルを作成する。データ抽出部（７８）は、そのポイントテーブルに基づいて、そのチャプタごとに、ポイントの合計（Ｓ）が最大となるそのＧＯＰを含む連続した所定の時間（２×Δｔ２）の映像符号化データ（Ｄ）を動画サムネイル（Ｆ１）として抽出する。テーブル作成部（７９）は、抽出された動画サムネイル（Ｆ１）に基づいて、動画サムネイルデータ（Ｌ）を生成する。
ただし、動画サムネイルデータ（Ｌ）は、複数の動画サムネイル（Ｆ１）と、位置データ（ｔ０）とを関連付けた動画サムネイルテーブル（Ｌ）を含む。
本発明により、所定の色を示す画素データを多く含むシーンを用いるので、区切られたチャプタごとの内容を的確に把握できるサムネイルを自動的に生成することが可能となる。例えば、所定の色を人間の肌の色にすれば、人間が多く出てくる画面を取り出すことが出来る。
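The point-scoring idea described above (points for pixels of a predetermined color, a per-GOP total S, and selection of the GOP with the largest total) can be sketched in Python as follows. This is only an illustrative sketch: the input layout (a list of `(time_s, pixels)` pairs per GOP), the function names, and in particular the RGB rule in `is_skin_like` are assumptions introduced here; the source does not say how the predetermined color is tested.

```python
def is_skin_like(r, g, b):
    """Hypothetical, very rough RGB test for a skin-like color (assumption)."""
    return (r > 95 and g > 40 and b > 20
            and r > g and r > b and (r - min(g, b)) > 15)

def score_gops(gops, matches=is_skin_like, points_per_pixel=1):
    """Build the point table: one (time_s, total_points) row per GOP.

    `gops` is a list of (time_s, pixels); `pixels` is an iterable of (r, g, b).
    Every pixel accepted by `matches` adds `points_per_pixel` to the GOP total.
    """
    return [
        (time_s, sum(points_per_pixel for p in pixels if matches(*p)))
        for time_s, pixels in gops
    ]

def best_gop_time(point_table):
    """Return the position (time) of the GOP with the largest point total S."""
    return max(point_table, key=lambda row: row[1])[0]
```

The thumbnail would then be the span of length 2 × Δt2 around the returned time, which is not shown here.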
【００２２】
上記のオーサリング機能付き信号処理装置において、データ処理部（２ａ−１）は、データ前処理部（２ａ−２）と、メニュー画面作成部（１５ａ）とを備える。
データ前処理部（２ａ−２）は、オーディオビデオデータ（Ａ）を分割してその複数のチャプタを生成し、その複数のチャプタに関するデータを示すチャプタデータ（Ｅ）を作成し、オーディオビデオデータ（Ａ）を圧縮した動画データ（Ｊ）を作成して動画データ（Ｊ）に関するデータを示す動画サムネイルデータ（Ｋ）を作成する。メニュー画面作成部（１５ａ）は、動画サムネイルデータ（Ｋ）とチャプタデータ（Ｅ）とに基づいて、メニュー画面データ（Ｈ２）を作成する。
ここで、チャプタデータ（Ｅ）としては、その複数のチャプタの各々に対応したチャプタ日時データ（３３）とチャプタ位置データ（３２）とを関連付けているチャプタテーブル（Ｅ）が例示される。ただし、チャプタ日時データ（３３）は、その複数のチャプタの各々における日時データ（Ｔ０）に基づくデータである。また、チャプタ位置データ（３２）は、その複数のチャプタの各々における位置データ（ｔ０）に基づくデータである。
また、動画サムネイルデータ（Ｋ）としては、オーディオビデオデータ（Ａ）を圧縮した動画データ（Ｊ）と映像符号化データ（Ｄ）とを関連付けた動画データテーブル（Ｋ）に例示される。動画データ（Ｊ）は、オーディオビデオデータ（Ａ）を符号化する過程で算出される符号化データに基づいて生成される。
【００２３】
上記のオーサリング機能付き信号処理装置において、データ前処理部（２ａ−２）は、オーディオビデオデータ（Ａ）を符号化した映像符号化データ（Ｄ）を更に作成し、その符号化の際にＤＣＴ（ＤｉｓｃｒｅｔｅＣｏｓｉｎｅＴｒａｎｓｆｏｒｍ）演算で算出されるＤＣ係数に基づいて、動画データ（Ｊ）を作成する。
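As a rough illustration of why DC coefficients are enough to build a reduced moving image: for the 8 × 8 DCT as defined for JPEG and MPEG, the DC coefficient of a block equals the sum of its 64 samples divided by 8, that is, 8 times the block mean. Dividing each DC coefficient by 8 therefore yields a 1/8-scale picture. The Python sketch below recomputes the DC term from pixel values only to make that relation visible; a real encoder would reuse the coefficient it has already calculated. The function names and the plain list-of-rows image layout are assumptions, and the sketch treats every block as intra-coded.

```python
BLOCK = 8  # DCT block size used by MPEG

def dc_coefficient(block):
    """DC term of the 8x8 DCT-II (JPEG/MPEG scaling): sum of samples / 8."""
    return sum(sum(row) for row in block) / 8

def dc_thumbnail(image):
    """Return a 1/8-scale image built from per-block DC coefficients.

    `image` is a list of pixel rows whose width and height are multiples of 8.
    """
    height, width = len(image), len(image[0])
    small = []
    for by in range(0, height, BLOCK):
        row = []
        for bx in range(0, width, BLOCK):
            block = [image[y][bx:bx + BLOCK] for y in range(by, by + BLOCK)]
            row.append(dc_coefficient(block) / 8)  # DC / 8 is the block mean
        small.append(row)
    return small
```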
【００２４】
上記のオーサリング機能付き信号処理装置において、データ前処理部（２ａ−２）は、エンコード部（１１、１１ａ）と、記録日時解析部（１２）とを含む。
エンコード部（１１、１１ａ）は、オーディオビデオデータ（Ａ）に基づいて、映像符号化データ（Ｄ）を作成する。記録日時解析部（１２）は、日時データ（Ｔ０）に基づいて、オーディオビデオデータ（Ａ）を複数のチャプタに分割し、チャプタデータ（Ｅ）を作成する。
【００２５】
上記のオーサリング機能付き信号処理装置において、記録日時解析部（１２）は、チャプタ分割部（２１）と、テーブル生成部（２３）とを備える。
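The chapter dividing unit (21) treats each point where the date/time data (T0) is discontinuous as a break in the audio-video data (A) and divides the data into the plurality of chapters. The table generation unit (23) creates chapter data (E) including the chapter table (E).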
ここで、チャプタテーブル（Ｅ）は、その複数のチャプタの各々に対応したチャプタ日時データ（３３）とチャプタ位置データ（３２）とを関連付けている。チャプタ日時データ（３３）は、その複数のチャプタの各々における日時データ（Ｔ０）に基づくデータである。チャプタ位置データ（３２）は、その複数のチャプタの各々における位置データ（ｔ０）に基づくデータである。
【００２６】
上記のオーサリング機能付き信号処理装置において、記録日時解析部（１２）は、その複数のチャプタの数を予め設定された最大チャプタ数（Ｎ）に制限するチャプタ制限部（２２）を更に備える。
【００２７】
上記課題を解決するために、本発明のＤＶＤ装置は、オーディオビデオデータ（Ａ）の入力に基づいて、メニュー画面データ（Ｈ１、Ｈ２）を出力する上記の各項のいずれか一項に記載のオーサリング機能付き信号処理装置と、その記憶媒体にメニュー画面データ（Ｈ１、Ｈ２）を書き込む駆動部（３）とを具備する。
ここで、記録媒体としては、ＤＶＤやＲＯＭ、ＲＡＭ、ＨＤ、ＣＤ、ＦＤが例示される。
【００２８】
従って、上記課題を解決するために、本発明のオーサリングを含む信号処理方法は、（ａ）〜（ｂ）ステップを具備する。
（ａ）ステップは、オーディオビデオデータ（Ａ）をチャプタごとに分割して生成される複数のチャプタの各々ごとに動画サムネイル（Ｆ１、Ｊ）を作成し、動画サムネイル（Ｆ１、Ｊ）を含むメニュー画面データ（Ｈ１、Ｈ２）を作成する。ここで、オーディオビデオデータ（Ａ）は、複数の画像データと、その画像データの記録日時を示す日時データ（Ｔ０）と、オーディオビデオデータ（Ａ）におけるその画像データの位置を示す位置データ（ｔ０）を含む。メニュー画面データ（Ｈ１、Ｈ２）は、その複数のチャプタのうちの一部又は全部の動画サムネイル（Ｆ１、Ｊ）を同時に表示するメニュー画面（５０）を示す。（ｂ）ステップは、メニュー画面データ（Ｈ１、Ｈ２）を記憶媒体に記録する。
ここで、記録媒体としては、ＤＶＤやＲＯＭ、ＲＡＭ、ＨＤ、ＣＤ、ＦＤが例示される。
【００２９】
上記のオーサリングを含む信号処理方法において、（ａ）ステップは、（ａ１）から（ａ５）ステップを備える。
（ａ１）ステップは、オーディオビデオデータ（Ａ）に基づいて、オーディオビデオデータ（Ａ）を符号化した映像符号化データ（Ｄ）を作成する。（ａ２）ステップは、日時データ（Ｔ０）に基づいて、オーディオビデオデータ（Ａ）を分割して複数のチャプタを生成し、そのチャプタに関するデータを示すチャプタデータ（Ｅ）を作成する。（ａ３）ステップは、映像符号化データ（Ｄ）とチャプタデータ（Ｅ）とに基づいて、その複数のチャプタの各々ごとに動画サムネイル（Ｆ１）を作成し、複数の動画サムネイル（Ｆ１）に関するデータを示す動画サムネイルデータ（Ｌ）を作成する。（ａ４）ステップは、映像符号化データ（Ｄ）とチャプタテーブル（Ｅ）とに基づいて、その複数のチャプタに関する制御情報を示す制御情報データ（Ｇ１）を作成する。（ａ５）ステップは、動画サムネイルデータ（Ｌ）と制御情報データ（Ｇ１）とに基づいて、メニュー画面データ（Ｈ１）を作成する。
【００３０】
上記のオーサリングを含む信号処理方法において、（ａ３）ステップは、（ａａ１）から（ａａ５）ステップを備える。
（ａａ１）ステップは、映像符号化データ（Ｄ）とチャプタデータ（Ｅ）とに基づいて、そのチャプタごとに、映像符号化データ（Ｄ）のうちの画素差分値（Δ）が基準値以上となるハイライトシーンを検出する。ここで、チャプタデータ（Ｅ）は、その複数のチャプタの各々に対応したチャプタ日時データ（３３）とチャプタ位置データ（３２）とを関連付けているチャプタテーブル（Ｅ）を含む。チャプタ日時データ（３３）は、その複数のチャプタの各々における日時データ（Ｔ０）に基づくデータである。チャプタ位置データ（３２）は、その複数のチャプタの各々における位置データ（ｔ０）に基づくデータである。（ａａ２）ステップは、そのハイライトシーンの長さが指定の再生時間と等しくなるようにその基準値を変化させ、等しくできればそのハイライトシーン有りと判定してそのハイライトシーンと位置データ（ｔ０）とを関連付けた差分値テーブルを生成する。等しくできなければそのハイライトシーン無しと判定する。（ａａ３）ステップは、そのハイライトシーンの有無、及び、そのハイライトシーンの状況に基づいて、そのチャプタごとに、動画サムネイル（Ｆ１）の作成方法を、予め設定された作成方法から選択する。（ａａ４）ステップは、その選択された作成方法に基づいて、そのチャプタごとに、動画サムネイル（Ｆ１）を作成する。（ａａ５）ステップは、作成された動画サムネイル（Ｆ１）に基づいて、動画サムネイルデータ（Ｌ）を作成する。ここで、動画サムネイルデータ（Ｌ）は、複数の動画サムネイル（Ｆ１）と、位置データ（ｔ０）とを関連付けた動画サムネイルテーブル（Ｌ）を含む。
ただし、その作成方法は、そのハイライトシーンが無い場合、映像符号化データ（Ｄ）からフレームを間引いて動画サムネイル（Ｆ１）とする。そのハイライトシーンが複数有る場合、複数のそのハイライトシーンを連結させて動画サムネイル（Ｆ１）とする。そのハイライトシーンが一つしかない場合、そのハイライトシーンをそのまま動画サムネイル（Ｆ１）とする。
【００３１】
上記のオーサリングを含む信号処理方法において、（ａ３）ステップは、（ａａ６）から（ａａ９）ステップを備える。
（ａａ６）ステップは、映像符号化データ（Ｄ）とチャプタデータ（Ｅ）とに基づいて、そのチャプタごとに、ＧＯＰの位置を検出する。ここで、チャプタデータ（Ｅ）は、その複数のチャプタの各々に対応したチャプタ日時データ（３３）とチャプタ位置データ（３２）とを関連付けているチャプタテーブル（Ｅ）を含む。チャプタ日時データ（３３）は、その複数のチャプタの各々における日時データ（Ｔ０）に基づくデータである。チャプタ位置データ（３２）は、その複数のチャプタの各々における位置データ（ｔ０）に基づくデータである。（ａａ７）ステップは、検出されたそのＧＯＰに基づいて、そのチャプタごとに、そのＧＯＰ単位の符号量（Ｒ）と位置データ（ｔ０）とを関連付けた符号量テーブルを作成する。（ａａ８）ステップは、その符号量テーブルに基づいて、そのチャプタごとに、符号量（Ｒ）が最大となるそのＧＯＰを含む連続した所定の時間（２×Δｔ１）の映像符号化データ（Ｄ）を動画サムネイル（Ｆ１）として抽出する。（ａａ９）ステップは、抽出された動画サムネイル（Ｆ１）に基づいて、動画サムネイルデータ（Ｌ）を作成する。ここで、動画サムネイルデータ（Ｌ）は、複数の動画サムネイル（Ｆ１）と、位置データ（ｔ０）とを関連付けた動画サムネイルテーブル（Ｌ）を含む。
【００３２】
上記のオーサリングを含む信号処理方法において、（ａ３）ステップは、（ａａ１０）から（ａａ１４）ステップを備える。
（ａａ１０）ステップは、映像符号化データ（Ｄ）とチャプタデータ（Ｅ）とに基づいて、そのチャプタごとに、ＧＯＰの位置を検出する。ここで、チャプタデータ（Ｅ）は、その複数のチャプタの各々に対応したチャプタ日時データ（３３）とチャプタ位置データ（３２）とを関連付けているチャプタテーブル（Ｅ）を含む。チャプタ日時データ（３３）は、その複数のチャプタの各々における日時データ（Ｔ０）に基づくデータである。チャプタ位置データ（３２）は、その複数のチャプタの各々における位置データ（ｔ０）に基づくデータである。（ａａ１１）ステップは、検出されたそのＧＯＰごとに、所定の色を示す画素データに対して、所定のポイントを付加する。（ａａ１２）ステップは、そのポイントに基づいて、そのチャプタごとに、そのＧＯＰごとのポイントの合計（Ｓ）と位置データ（ｔ０）とを関連付けたポイントテーブルを作成する。（ａａ１３）ステップは、そのポイントテーブルに基づいて、そのチャプタごとに、ポイントの合計（Ｓ）が最大となるそのＧＯＰを含む連続した所定の時間（２×Δｔ２）の映像符号化データ（Ｄ）を動画サムネイル（Ｆ１）として抽出する。（ａａ１４）ステップは、抽出された動画サムネイル（Ｆ１）に基づいて、動画サムネイルデータ（Ｌ）を作成する。ここで、動画サムネイルデータ（Ｌ）は、複数の動画サムネイル（Ｆ１）と、位置データ（ｔ０）とを関連付けた動画サムネイルテーブル（Ｌ）を含む。
【００３３】
上記のオーサリングを含む信号処理方法において、（ａ）ステップは、（ａ６）〜（ａ８）ステップを具備する。
（ａ６）ステップは、オーディオビデオデータ（Ａ）に基づいて、オーディオビデオデータ（Ａ）を符号化した映像符号化データ（Ｄ）と、オーディオビデオデータ（Ａ）を圧縮した動画データ（Ｊ）を作成して動画データ（Ｊ）に関するデータを示す動画サムネイルデータ（Ｋ）とを作成する。ここで、動画サムネイルデータ（Ｋ）は、動画データ（Ｊ）と映像符号化データ（Ｄ）とを関連付けた動画データテーブル（Ｋ）を含む。動画データ（Ｊ）は、その符号化の過程で算出される符号化データに基づいて作成される。（ａ７）ステップは、日時データ（Ｔ０）に基づいて、オーディオビデオデータ（Ａ）を分割して複数のチャプタを生成し、チャプタに関するデータを示すチャプタデータ（Ｅ）を作成する。ここで、チャプタデータ（Ｅ）は、複数のチャプタの各々に対応したチャプタ日時データ（３３）とチャプタ位置データ（３２）とを関連付けているチャプタテーブル（Ｅ）を含む。チャプタ日時データ（３３）は、その複数のチャプタの各々における日時データ（Ｔ０）に基づくデータである。チャプタ位置データ（３２）は、その複数のチャプタの各々における位置データ（ｔ０）に基づくデータである。（ａ８）ステップは、動画サムネイルデータ（Ｋ）とチャプタデータ（Ｅ）とに基づいて、メニュー画面データ（Ｈ２）を作成する。
【００３４】
上記のオーサリングを含む信号処理方法において、（ａ６）ステップは、（ａｂ１）〜（ａｂ３）ステップを具備する。
（ａｂ１）ステップは、オーディオビデオデータ（Ａ）に対してＤＣＴ演算を行う。（ａｂ２）ステップは、そのＤＣＴ演算に伴い生成するＤＣ係数に基づいて、動画データ（Ｊ）を作成する。（ａｂ３）ステップは、動画データ（Ｊ）と映像符号化データ（Ｄ）とに基づいて、動画サムネイルデータ（Ｋ）を作成する。
【００３５】
従って、上記課題を解決するために、本発明に関するコンピュータプログラムは、（ｃ）〜（ｄ）ステップを備える方法をコンピュータに実行させる。
（ｃ）ステップは、オーディオビデオデータ（Ａ）をチャプタごとに分割して生成される複数のチャプタの各々ごとに動画サムネイル（Ｆ１、Ｊ）を作成し、動画サムネイル（Ｆ１、Ｊ）を含むメニュー画面データ（Ｈ１、Ｈ２）を作成する。ここで、オーディオビデオデータ（Ａ）は、複数の画像データと、その画像データの記録日時を示す日時データ（Ｔ０）と、オーディオビデオデータ（Ａ）におけるその画像データの位置を示す位置データ（ｔ０）を含む。メニュー画面データ（Ｈ１、Ｈ２）は、その複数のチャプタのうちの一部又は全部の動画サムネイル（Ｆ１、Ｊ）を同時に表示するメニュー画面（５０）を示す。（ｄ）ステップは、メニュー画面データ（Ｈ１、Ｈ２）を記憶媒体に記録する。
ここで、記録媒体としては、ＤＶＤやＲＯＭ、ＲＡＭ、ＨＤ、ＣＤ、ＦＤが例示される。
【００３６】
また、上記のコンピュータプログラムにおいて、（ｃ）ステップは、（ｃ１）から（ｃ５）ステップを備える。
（ｃ１）ステップは、オーディオビデオデータ（Ａ）を符号化した映像符号化データ（Ｄ）を作成する。（ｃ２）ステップは、オーディオビデオデータ（Ａ）を分割して複数のチャプタを生成し、そのチャプタに関するデータを示すチャプタデータ（Ｅ）を作成する。（ｃ３）ステップは、映像符号化データ（Ｄ）とチャプタデータ（Ｅ）とに基づいて、その複数のチャプタの各々ごとに動画サムネイル（Ｆ１）を作成し、複数の動画サムネイル（Ｆ１）に関するデータを示す動画サムネイルデータ（Ｌ）を作成する。（ｃ４）ステップは、映像符号化データ（Ｄ）とチャプタテーブル（Ｅ）とに基づいて、その複数のチャプタに関する制御情報を示す制御情報データ（Ｇ１）を作成する。（ｃ５）ステップは、動画サムネイルデータ（Ｌ）と制御情報データ（Ｇ１）とに基づいて、メニュー画面データ（Ｈ１）を作成する。
【００３７】
上記のプログラムにおいて、（ｃ２）ステップは、日時データ（Ｔ０）に基づいて、オーディオビデオデータ（Ａ）を分割して複数のチャプタを生成する。
【００３８】
また、上記のコンピュータプログラムにおいて、（ｃ３）ステップは、（ｃａ１）から（ｃａ５）ステップを備える。
（ｃａ１）ステップは、映像符号化データ（Ｄ）とチャプタデータ（Ｅ）とに基づいて、そのチャプタごとに、映像符号化データ（Ｄ）のうちの画素差分値（Δ）が基準値以上となるハイライトシーンを検出する。ここで、チャプタデータ（Ｅ）は、その複数のチャプタの各々に対応したチャプタ日時データ（３３）とチャプタ位置データ（３２）とを関連付けているチャプタテーブル（Ｅ）を含む。チャプタ日時データ（３３）は、その複数のチャプタの各々における日時データ（Ｔ０）に基づくデータである。チャプタ位置データ（３２）は、その複数のチャプタの各々における位置データ（ｔ０）に基づくデータである。（ｃａ２）ステップは、そのハイライトシーンの長さが指定の再生時間と等しくなるようにその基準値を変化させ、等しくできればそのハイライトシーン有りと判定してそのハイライトシーンと位置データ（ｔ０）とを関連付けた差分値テーブルを生成する。等しくできなければそのハイライトシーン無しと判定する。（ｃａ３）ステップは、そのハイライトシーンの有無、及び、そのハイライトシーンの状況に基づいて、そのチャプタごとに、動画サムネイル（Ｆ１）の作成方法を、予め設定された作成方法から選択する。（ｃａ４）ステップは、その選択された作成方法に基づいて、そのチャプタごとに、動画サムネイル（Ｆ１）を作成する。（ｃａ５）ステップは、作成された動画サムネイル（Ｆ１）に基づいて、動画サムネイルデータ（Ｌ）を作成する。ここで、動画サムネイルデータ（Ｌ）は、複数の動画サムネイル（Ｆ１）と、位置データ（ｔ０）とを関連付けた動画サムネイルテーブル（Ｌ）を含む。
ただし、その作成方法は、そのハイライトシーンが無い場合、映像符号化データ（Ｄ）からフレームを間引いて動画サムネイル（Ｆ１）とする。そのハイライトシーンが複数有る場合、複数のそのハイライトシーンを連結させて動画サムネイル（Ｆ１）とする。そのハイライトシーンが一つしかない場合、そのハイライトシーンをそのまま動画サムネイル（Ｆ１）とする。
【００３９】
また、上記のコンピュータプログラムにおいて、（ｃ３）ステップは、（ｃａ６）から（ｃａ９）ステップを備える。
（ｃａ６）ステップは、映像符号化データ（Ｄ）とチャプタデータ（Ｅ）とに基づいて、そのチャプタごとに、ＧＯＰの位置を検出する。ここで、チャプタデータ（Ｅ）は、その複数のチャプタの各々に対応したチャプタ日時データ（３３）とチャプタ位置データ（３２）とを関連付けているチャプタテーブル（Ｅ）を含む。チャプタ日時データ（３３）は、その複数のチャプタの各々における日時データ（Ｔ０）に基づくデータである。チャプタ位置データ（３２）は、その複数のチャプタの各々における位置データ（ｔ０）に基づくデータである。（ｃａ７）ステップは、検出されたそのＧＯＰに基づいて、そのチャプタごとに、そのＧＯＰ単位の符号量（Ｒ）と位置データ（ｔ０）とを関連付けた符号量テーブルを作成する。（ｃａ８）ステップは、その符号量テーブルに基づいて、そのチャプタごとに、符号量（Ｒ）が最大となるそのＧＯＰを含む連続した所定の時間（２×Δｔ１）の映像符号化データ（Ｄ）を動画サムネイル（Ｆ１）として抽出する。（ｃａ９）ステップは、抽出された動画サムネイル（Ｆ１）に基づいて、動画サムネイルデータ（Ｌ）を作成する。ここで、動画サムネイルデータ（Ｌ）は、複数の動画サムネイル（Ｆ１）と、位置データ（ｔ０）とを関連付けた動画サムネイルテーブル（Ｌ）を含む。
【００４０】
更に、上記のコンピュータプログラムにおいて、（ｃ３）ステップは、（ｃａ１０）から（ｃａ１４）ステップを備える。
（ｃａ１０）ステップは、映像符号化データ（Ｄ）とチャプタデータ（Ｅ）とに基づいて、そのチャプタごとに、ＧＯＰの位置を検出する。ここで、チャプタデータ（Ｅ）は、その複数のチャプタの各々に対応したチャプタ日時データ（３３）とチャプタ位置データ（３２）とを関連付けているチャプタテーブル（Ｅ）を含む。チャプタ日時データ（３３）は、その複数のチャプタの各々における日時データ（Ｔ０）に基づくデータである。チャプタ位置データ（３２）は、その複数のチャプタの各々における位置データ（ｔ０）に基づくデータである。（ｃａ１１）ステップは、検出されたそのＧＯＰごとに、所定の色を示す画素データに対して、所定のポイントを付加する。（ｃａ１２）ステップは、そのポイントに基づいて、そのチャプタごとに、そのＧＯＰごとのそのポイントの合計（Ｓ）と位置データ（ｔ０）とを関連付けたポイントテーブルを作成する。（ｃａ１３）ステップは、そのポイントテーブルに基づいて、そのチャプタごとに、そのポイントの合計（Ｓ）が最大となるそのＧＯＰを含む連続した所定の時間（２×Δｔ２）の映像符号化データ（Ｄ）を動画サムネイル（Ｆ１）として抽出する。（ｃａ１４）ステップは、抽出された動画サムネイル（Ｆ１）に基づいて、動画サムネイルデータ（Ｌ）を作成する。ここで、動画サムネイルデータ（Ｌ）は、複数の動画サムネイル（Ｆ１）と、位置データ（ｔ０）とを関連付けた動画サムネイルテーブル（Ｌ）を含む。
【００４１】
更に、上記のコンピュータプログラムにおいて、（ｃ）ステップは、（ｃ６）〜（ｃ８）ステップを具備する。
（ｃ６）ステップは、オーディオビデオデータ（Ａ）を符号化した映像符号化データ（Ｄ）と、オーディオビデオデータ（Ａ）を圧縮した動画データ（Ｊ）を作成して動画データ（Ｊ）に関するデータを示す動画サムネイルデータ（Ｋ）とを作成する。ここで、動画サムネイルデータ（Ｋ）は、動画データ（Ｊ）と映像符号化データ（Ｄ）とを関連付けた動画データテーブル（Ｋ）を含む。動画データ（Ｊ）は、その符号化の過程で算出される符号化データに基づいて作成される。（ｃ７）ステップは、オーディオビデオデータ（Ａ）を分割して複数のチャプタを生成し、そのチャプタに関するデータを示すチャプタデータ（Ｅ）を作成する。ここで、チャプタデータ（Ｅ）は、その複数のチャプタの各々に対応したチャプタ日時データ（３３）とチャプタ位置データ（３２）とを関連付けているチャプタテーブル（Ｅ）を含む。チャプタ日時データ（３３）は、その複数のチャプタの各々における日時データ（Ｔ０）に基づくデータである。チャプタ位置データ（３２）は、その複数のチャプタの各々における位置データ（ｔ０）に基づくデータである。（ｃ８）ステップは、動画サムネイルデータ（Ｋ）とチャプタデータ（Ｅ）とに基づいて、メニュー画面データ（Ｈ２）を作成する。
【００４２】
上記のプログラムにおいて、（ｃ７）ステップは、日時データ（Ｔ０）に基づいて、オーディオビデオデータ（Ａ）を分割して複数のチャプタを生成する。
【００４３】
更に、上記のコンピュータプログラムにおいて、（ｃ６）ステップは、（ｃｂ１）〜（ｃｂ３）ステップを具備する。
（ｃｂ１）ステップは、オーディオビデオデータ（Ａ）に対してＤＣＴ演算を行う。（ｃｂ２）ステップは、そのＤＣＴ演算に伴い生成するＤＣ係数に基づいて、動画データ（Ｊ）を作成する。（ｃｂ３）ステップは、動画データ（Ｊ）と映像符号化データ（Ｄ）とに基づいて、動画サムネイルデータ（Ｋ）を作成する。
【００４４】
【発明の実施の形態】
以下、本発明であるオーサリング機能付き信号処理装置及びオーサリングを含む信号処理方法の実施の形態に関して、添付図面を参照して説明する。本実施の形態では、本発明であるオーサリング機能付き信号処理装置をＤＶＤ装置（ＭＰＥＧ装置）に適用した例について説明しているが、その他の映像記録装置に対しても適用が可能である。
【００４５】
（第１の実施の形態）
本発明であるオーサリング機能付き信号処理装置及びオーサリングを含む信号処理方法の第１の実施の形態について、添付図面を参照して説明する。
【００４６】
まず、本発明であるオーサリング機能付き信号処理装置を適用したＤＶＤ装置（ＭＰＥＧ装置）の第１の実施の形態の構成について説明する。
図１は、本発明であるオーサリング機能付き信号処理装置を適用したＤＶＤ装置（ＭＰＥＧ装置）の第１の実施の形態の構成を示すブロック図である。ＤＶＤ装置１は、オーディオビデオデータＡ及び諸条件（Ｂ及びＣ、後述）の入力に基づいて、ＤＶＤに、オーディオビデオデータを符号化した映像符号化データＤ及びその他のデータを記録、格納する。ＤＶＤ装置１は、オーサリング機能付き信号処理装置２、ＤＶＤ駆動部３及びシステムマイコン４を具備する。ここでは、ＤＶＤのデータを読み出す構成を省略している。
【００４７】
オーサリング機能付き信号処理装置２は、システムマイコン４の制御により、オーディオビデオデータＡの入力に基づいて、映像符号化データＤ（後述）とメニュー画面データＨ１（後述）と制御情報データＧ１（後述）とを生成する。そして、それらを所定の記憶媒体に記録する制御を行う。記憶媒体は、ＤＶＤに例示される。
ＤＶＤ駆動部３は、システムマイコン４の制御により、オーサリング機能付き信号処理装置２から出力される映像符号化データＤとメニュー画面データＨ１と制御情報データＧ１とを、内部にセットされた記憶媒体に記録（格納）する。ここでは、記録媒体として、ＤＶＤを用いる。ただし、他の記録媒体（例示：ＲＯＭ、ＲＡＭ、ＣＤ、ＨＤ、ＦＤ）を用いることも可能である。
システムマイコン４は、オーサリング機能付き信号処理装置２及びＤＶＤ駆動部３を具備するＤＶＤ装置１を制御する。システムマイコン４は、ＭＰＵ（マイクロプロセッサユニット）に例示される。
【００４８】
オーサリング機能付き信号処理装置２は、エンコード部１１と、記録日時解析部１２と、動画サムネイル作成部１３と、制御情報データ作成部１４と、メニュー画面作成部１５と、書き込み制御部１６とを具備する。ここで、エンコード部１１と記録日時解析部１２とをデータ前処理部２−３ともいう。データ前処理部２−３と動画サムネイル作成部１３と制御情報データ作成部１４とをデータ作成部２−２ともいう。データ作成部２−２とメニュー画面作成部１５とをデータ処理部２−１ともいう。
【００４９】
エンコード部１１は、外部から入力されるオーディオビデオデータＡに基づいて、オーディオビデオデータを符号化した映像符号化データＤを生成する。
【００５０】
ここで、オーディオビデオデータＡは、デジタルビデオテープレコーダやアナログビデオテープレコーダのような機器から出力された複数のビデオ映像を有するデータであり、音声データと画像（動画を含む、本明細書中で同じ）データとを含む。画像データは、オーディオビデオデータＡにおけるその画像データを記録した日時（例示：西暦年：月：日：時：分：秒）としての日時データと、テープ（オーディオビデオデータＡ）の先頭からの位置（例示：時間時：分：秒）を示す位置データとを含む。オーディオビデオデータＡの符号化は、ＭＰＥＧ（ＭｏｖｉｎｇＰｉｃｔｕｒｅＥｘｐｅｒｔｓＧｒｏｕｐ、本明細書中で同じ）の規格に基づいて行うエンコードを含む。映像符号化データＤは、ＭＰＥＧ２データ（ＶＯＢ（ＶｉｄｅｏＯｂｊｅｃｔ）データ）に例示される。
【００５１】
図３は、日時データ及び位置データを示す図である。オーディオビデオデータＡには、図中、一つの四角の枠で示される一つのフレームごとに、日時データとしてのオーディオビデオデータＡを記録した記録日時Ｔ０（図中、各フレームの下部に記載）と、位置データとしてのオーディオビデオデータＡを記録したテープの先頭からの時間ｔ０（図中、各フレームの上部に記載）とが共に記録されている。
【００５２】
図１を参照して、記録日時解析部１２は、外部から入力されるオーディオビデオデータＡの日時データに基づいて、オーディオビデオデータＡを複数のチャプタに分割する。すなわち、日時データが不連続な箇所を検出し、そこをチャプタの区切りとしてオーディオビデオデータＡを分割する。そして、複数のチャプタの各々に対応したチャプタ日時データとチャプタ位置データとを関連付けるチャプタテーブルＥ（後述）を生成する。ただし、オーディオビデオデータＡを分割するチャプタの最大数を示す最大チャプタ数データＢ（Ｎ）を外部から入力された場合、その最大数を越えないように、チャプタを調整する。
【００５３】
ここで、チャプタは、オーディオビデオデータＡを分割して得られるオーディオビデオデータＡの一部分のデータである。一つのチャプタは、オーディオビデオデータＡの内の連続した部分（連続したシーンを含む）でも良いし、連続していない部分を併せたもの（連続したシーンを複数含む）でも良い。
【００５４】
ここで、チャプタ日時データは、複数のチャプタの各々における日時データに基づくデータであり、チャプタの最初及び最後を示す日時データや、前のチャプタの最後を示す日時データと次のチャプタの最初を示す日時データの差に例示される。チャプタ位置データは、複数のチャプタの各々における位置データに基づくデータであり、チャプタの最初又は最後を示す位置データに例示される。
【００５５】
図２は、記録日時解析部１２の構成を示すブロック図である。記録日時解析部１２は、チャプタ分割部２１と、チャプタ制限部２２と、テーブル生成部２３とを備える。
【００５６】
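The chapter dividing unit 21 judges each point where the date/time data is discontinuous to be a break in the audio-video data A. It then divides the audio-video data A into a plurality of chapters on the basis of those breaks.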
チャプタ制限部２２は、複数のチャプタの数が最大チャプタ数データＢで示される最大数Ｎを越える場合、複数のチャプタのうち、予め設定された条件を満たす隣り合う２つのチャプタを結合することにより、チャプタの数を最大数Ｎ以下に抑える。ただし、予め設定された条件は、前のチャプタの最後を示す日時データと次のチャプタの最初を示す日時データの差が最小となる２つのチャプタに例示される。
テーブル生成部２３は、複数のチャプタの各々に対応したチャプタ日時データとチャプタ位置データとの関係を示すチャプタテーブルＥを生成する。
【００５７】
図４は、チャプタテーブルＥを示す表である。チャプタテーブルＥは、複数のチャプタの各々に対応したチャプタ日時データとチャプタ位置データとを関連付けている。
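Here, the chapter number 31 is the serial number of the chapter. The chapter end position 32, which serves as the chapter position data, is the position data indicating the end of the chapter and is expressed as hours:minutes:seconds. The chapter recording date/time interval 33, which serves as the chapter date/time data, is the time interval between the date/time data at the end of the preceding chapter and the date/time data at the start of the following chapter, and is expressed as days:hours:minutes:seconds. The chapters are listed in ascending order of chapter end position 32.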
【００５８】
図１を参照して、動画サムネイル作成部１３は、映像符号化データＤとチャプタテーブルＥとに基づいて、複数のチャプタの各々に対応する複数の動画サムネイルＦ１を作成する。このとき、動画サムネイルＦ１の作成条件を示す動画条件データＣ（外部から入力、デフォルト値を有していても良い）を参照し、その条件に適合するように各動画サムネイルＦ１を生成する。ここで、動画条件データＣは、動画サムネイルの画像サイズや動画サムネイルの再生時間に例示される。
ここで、動画サムネイルは、動画形式のサムネイル（プレビュー）である。基となるデータを圧縮（符号化などの画像処理）して生成される。
【００５９】
動画サムネイル作成部１３は、更に、動画サムネイルＦ１の各フレームと、映像符号化データＤとを関連付けた動画サムネイルテーブルＬを作成する。すなわち、動画サムネイルＦ１の各フレームを示す動画用画像データと、映像符号化データＤの位置データとを関連付けた動画サムネイルテーブルＬを作成する。
【００６０】
図５は、動画サムネイルテーブルＬを示す表である。フレームの通し番号であるフレーム番号４１、映像符号化データＤの位置データ（オーディオビデオデータの先頭からの時間）としての時刻４２及び動画用画像データとしてのフレーム画像データ４４がフレームごとに関連付けられている。動画サムネイルテーブルＬは、動画サムネイルＦ１ごとに設けても良いし、一つの動画サムネイルテーブルＬを適当に区切り、複数の動画サムネイルＦ１を含ませても良い。
【００６１】
図６は、動画サムネイル作成部１３の構成を示すブロック図である。動画サムネイル作成部１３は、ハイライトシーン検出部２６、作成手法選択部２７、作成手法実行部２８及びテーブル作成部２９を備える。
【００６２】
ハイライトシーン検出部２６は、映像符号化データＤとチャプタテーブルＥとに基づいて、各チャプタごとに、フレーム間の画素の差分としての画素差分値Δを検出する。フレーム単位の画素差分値Δは、チャプタごとに、映像符号化データＤの位置データ（オーディオビデオデータの先頭からの時間）ｔ０と関連付けられて、差分値テーブルとして記憶部（図示されず）に格納される。
【００６３】
ここで、差分値テーブルについて説明する。
図７は、差分値テーブルをグラフの形で表現した図である。縦軸は画素差分値Δ、横軸は位置データｔ０（ここでは「時間」）である。グラフ中の曲線Ｗは、画素差分値Δを示す。半直線α０及びα２は、それぞれ画素差分値Δ０及びΔ２を示す。時間ｔ１から時間ｔ２までが１チャプタとする。
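The highlight scene detection unit 26 extracts highlight scenes from the difference value table (Fig. 7). Here, a highlight scene is a portion in which the pixel difference value Δ stays above a predetermined threshold for at least a preset time. In Fig. 7, if the threshold is Δ0, the highlight scene corresponds to portion P1 of curve W. The highlight scene can be lengthened by lowering the threshold from the predetermined maximum value Δ0. For example, in Fig. 7, lowering the threshold from Δ0 to Δ1 (shown by half line α1) lengthens the highlight scene from P1 to P2 + P3. By this operation, the duration of the highlight scene (the total duration when there are several highlight scenes) can be matched to the specified playback time. If the highlight scene still does not reach the specified playback time when the threshold has been lowered to the predetermined minimum value Δ2, the chapter is regarded as having no highlight scene.
However, a portion Q in which the pixel difference value Δ exceeds the threshold only momentarily (for example, a scene in which the camera pans) is not detected as a highlight, because its duration Px is shorter than the preset time.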
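The threshold-lowering search over the difference value table can be sketched in Python as below. The sketch is an illustration under assumptions, not the patented implementation: the table is taken to be a plain list of per-frame difference values, durations are counted in frames, the candidate thresholds are passed in as a list running from Δ0 down to Δ2, and the search stops as soon as the total highlight length reaches the target rather than trimming it to an exact match. All function and parameter names are hypothetical.

```python
def find_runs(diffs, threshold, min_len):
    """Return (start, end) frame ranges, end exclusive, in which the
    difference value stays above `threshold` for at least `min_len` frames."""
    runs, start = [], None
    # A trailing sentinel below any threshold closes a run that reaches the end.
    for i, d in enumerate(list(diffs) + [float("-inf")]):
        if d > threshold and start is None:
            start = i
        elif d <= threshold and start is not None:
            if i - start >= min_len:      # momentary spikes are discarded
                runs.append((start, i))
            start = None
    return runs

def detect_highlights(diffs, target_len, min_len, thresholds):
    """Lower the threshold step by step until the highlights total at least
    `target_len` frames. Return the runs, or [] if no threshold suffices."""
    for threshold in sorted(thresholds, reverse=True):
        runs = find_runs(diffs, threshold, min_len)
        if sum(end - start for start, end in runs) >= target_len:
            return runs
    return []
```

With `diffs = [0, 9, 9, 9, 0, 5, 5, 5, 5, 0, 9, 0]`, `min_len = 2` and thresholds `[8, 4, 2]`, a target of 6 frames is not met at threshold 8 but is met at threshold 4, giving two runs; the single-frame spike near the end is ignored in the same way as portion Q.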
【００６４】
図６を参照して、作成手法選択部２７は、検出されたハイライトシーンの状況に応じて、下記の３種類の方法から動画サムネイルの作成方法を選択する。
（Ａ）ハイライトシーンが複数箇所あるチャプタは、ハイライトシーンを連結させ、動画サムネイルとする。
（Ｂ）ハイライトシーンが一箇所しかないチャプタは、その部分をそのまま動画サムネイルとする。
（Ｃ）ハイライトシーンが無いチャプタは、チャプタからフレームを間引いて圧縮し、動画サムネイルとする。例えば、１０分間のチャプタを短縮して再生時間１分間の動画サムネイルにするには、１フレーム表示−９フレームスキップ、又は、１秒表示−９秒スキップを繰り返すことで実現できる。
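The choice among methods (A), (B), and (C) can be sketched in Python as follows. This is a minimal sketch under assumptions: a chapter is a list of frames, highlight scenes are `(start, end)` index ranges with the end exclusive, the target length is given in frames, and the final trim to `target_len` is an addition made here for safety. The names are hypothetical.

```python
def make_thumbnail(chapter_frames, highlight_runs, target_len):
    """Build a moving-image thumbnail for one chapter.

    (A) several highlight scenes: concatenate them
    (B) exactly one highlight scene: use it as is
    (C) no highlight scene: thin out the chapter at a fixed interval
    """
    if len(highlight_runs) > 1:                       # method (A)
        frames = [f for start, end in highlight_runs
                  for f in chapter_frames[start:end]]
    elif len(highlight_runs) == 1:                    # method (B)
        start, end = highlight_runs[0]
        frames = chapter_frames[start:end]
    else:                                             # method (C)
        # Show 1 frame, skip step - 1 frames (e.g. show 1, skip 9 for 10:1).
        step = max(1, len(chapter_frames) // max(1, target_len))
        frames = chapter_frames[::step]
    return frames[:target_len]
```

For a 600-frame chapter and a 60-frame target, method (C) keeps every tenth frame, which matches the "show 1 frame, skip 9 frames" example in the text.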
【００６５】
作成手法実行部２８は、作成手法選択部２７で選択された方法を用いて、動画サムネイルを作成する。
テーブル作成部２９は、作成された動画サムネイルを動画サムネイルテーブルＬに格納する。
【００６６】
図１を参照して、制御情報データ作成部１４は、映像符号化データＤとチャプタテーブルＥとに基づいて、複数のチャプタに関するデータを含む制御情報データＧ１を作成する。すなわち、映像符号化データＤとチャプタテーブルＥとに基づいて、ＤＶＤの制御情報データＧ１（ビデオタイトルセット６７のＶＴＳＩ、後述）を作成し、その制御情報データＧ１のＰＴＴ（ＰａｒｔｏｆＴｉｔｌｅ、後述）に、各プログラム（ＰＧ、後述）がどのチャプタに含まれるかを示すチャプタデータ（例示：チャプタ番号３１）を格納する。
なお、ＤＶＤ以外の記憶媒体を用いる場合には、その記憶媒体に対応した制御情報データＧ１を作成する。
【００６７】
図８は、ＤＶＤに格納されるデータの構造を示す図である。ＤＶＤに格納されるデータ６１は、ビデオマネージャ（ＶＭＧ）６３と、ビデオタイトルセット（ＶＴＳ）６７とを備える。
ビデオマネージャ（ＶＭＧ）６３は、制御情報としてのＶＭＧＩと、メニュー画面データＨ１（後述）としてのＶＭＧＭ＿ＶＯＢＳと、ＶＭＧＩのバックアップとしてのＶＭＧＩ（ＢＵＰ）とを備える。
ビデオタイトルセット６７は、ビデオタイトルセット（ムービー（ビデオ映像）の集合）の制御情報としてのＶＴＳＩと、動画ファイルとしてのＶＴＳＭ＿ＶＯＢＳ〜ＶＴＳＴＴ＿ＶＯＢＳと、ＶＴＳＩのバックアップとしてのＶＴＳＩ（ＢＵＰ）とを備える。
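The VTSI describes the internal structure of the video title set. That structure is a hierarchy of title (an individual movie), program chain (PGC: a set of programs), PTT (chapter: an access point set on a cell boundary in the video stream), program (PG: a set of cells), cell (a set of video object units), and video object unit (VOBU: corresponds to a GOP (Group Of Pictures)). The VTSI also describes which part of VTSM_VOBS to VTSTT_VOBS each level of the hierarchy corresponds to.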
【００６８】
図１を参照して、メニュー画面作成部１５は、動画サムネイルテーブルＬと制御情報データＧ１とに基づいて、動画のメニュー画面を示すメニュー画面データＨ１を作成する。ただし、メニュー画面データＨ１は、制御情報データＧ１で示される各チャプタに対応させて、動画サムネイルテーブルＬから動画画像データ（フレーム画像データ４４）の該当部分を取り出して生成される。
ここで、動画のメニュー画面は、複数のチャプタに対応した複数の動画サムネイルＦ１を、一つの画面で一度に表示したものである。例えば、４つのチャプタが有る場合のメニュー画面では、一つの画面で４つの動画サムネイルＦ１を観ることが出来る。
そして、ポインティングデバイス（例示：マウス）により、画面上で動画サムネイルＦ１を選択できる。その場合、メニュー画面データＨ１において、各動画サムネイルＦ１は、制御情報データＧ１のＰＴＴと関連付けられているので、画面上で選択された動画サムネイルＦ１に対応するチャプタを再生することが出来る。
【００６９】
書き込み制御部１６は、映像符号化データＤとメニュー画面データＨ１と制御情報データＧ１とを受信し、それぞれのデータをＤＶＤの所定の領域に記録するように、ＤＶＤ駆動部３へのデータの出力の制御を行う。
このとき、メニュー画面データＨ１は、ＶＭＧ６３のＶＭＧＭ＿ＶＯＢＳに、制御情報データＧ１は、ＶＴＳ６７のＶＴＳＩに、映像符号化データＤは、ＶＴＳＭ＿ＶＯＢＳ〜ＶＴＳＴＴ＿ＶＯＢＳにそれぞれ格納される。
なお、ＤＶＤ以外の記憶媒体を用いる場合には、書き込み制御部１６は、その記憶媒体に対応したフォーマットに基づいて記録媒体に対する書き込みを制御する。
【００７０】
次に、本発明であるオーサリング機能付き信号処理装置を適用したＤＶＤ装置（ＭＰＥＧ装置）の第１の実施の形態の動作（オーサリングを含む信号処理方法）について説明する。
図９は、本発明であるオーサリング機能付き信号処理装置を適用したＤＶＤ装置（ＭＰＥＧ装置）の第１の実施の形態の動作（オーサリングを含む信号処理方法）を示すフロー図である。
【００７１】
（１）ステップＳ０１
エンコード部１１は、外部から入力されたオーディオビデオデータＡに基づいて、オーディオビデオデータＡを符号化した映像符号化データＤを生成する。
（２）ステップＳ０２
記録日時解析部１２は、外部から入力されたオーディオビデオデータＡの日時データに基づいて、オーディオビデオデータＡを複数のチャプタに分割する。ただし、オーディオビデオデータＡを分割するチャプタの数を、外部から入力される最大チャプタ数データＢで示される最大チャプタ数Ｎを越えないようにチャプタを調整する。そして、チャプタテーブルＥを生成する。
（３）ステップＳ０３
動画サムネイル作成部１３は、映像符号化データＤとチャプタテーブルＥとに基づいて、複数のチャプタの各々に対応する複数の動画サムネイルＦ１を作成する。このとき、各動画サムネイルＦ１の画像サイズ及び再生時間を、外部から入力される動画条件データＣで示される画像サイズ及び再生時間となるように各動画サムネイルＦ１を生成する。そして、動画サムネイル作成部１３は、動画サムネイルＦ１の各フレームと、映像符号化データＤとを関連付けた動画サムネイルテーブルＬを作成する。
（４）ステップＳ０４
制御情報データ作成部１４は、映像符号化データＤとチャプタテーブルＥとに基づいて、複数のチャプタに関するデータを含む制御情報データＧ１を作成する。このとき、その制御情報データＧ１に、各プログラム３８がどのチャプタに含まれるかを示すチャプタデータが格納される。
（５）ステップＳ０５
メニュー画面作成部１５は、動画サムネイルテーブルＬと制御情報データＧ１とに基づいて、メニュー画面データＨ１を作成する。
（６）ステップＳ０６
書き込み制御部１６は、メニュー画面データＨ１と映像符号化データＤと制御情報データＧ１とをＤＶＤの所定の領域に記録するように、ＤＶＤ駆動部３へのデータの出力の制御を行う。ＤＶＤ駆動部３は、それらのデータをＤＶＤに書き込む。
【００７２】
ここで、ステップＳ０２のチャプタテーブルを作成する動作について更に説明する。
図１０は、ステップＳ０２のチャプタテーブルを作成する動作を示すフロー図である。
【００７３】
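(1) Step S21
The chapter dividing unit 21 of the recording date/time analysis unit 12 reads the date/time data of the audio-video data A input from outside.
(2) Step S22
The chapter dividing unit 21 judges whether the date/time data is continuous. If it is continuous, the process returns to step S21. If it is not, the process proceeds to step S23. The granularity at which continuity is judged (seconds, minutes, hours, and so on) is set in advance according to the video recorded in the audio-video data A. Here it is set to seconds.
(3) Step S23
The chapter dividing unit 21 judges the point where the date/time data is discontinuous to be a break in the audio-video data A. It then acquires the position data (time from the beginning) of that break. It also acquires the last date/time data of the chapter before the break and the first date/time data of the chapter after it. This break is only a chapter candidate position and is not yet final.
(4) Step S24
The chapter limiting unit 22 judges whether, with one more chapter added, the number of chapters in the chapter table E (the total number of chapters) would exceed the maximum number of chapters N indicated by the maximum chapter number data B. If it would, the process proceeds to step S26. If not, the process proceeds to step S25.
(5) Step S25
The table generation unit 23 generates (updates) the chapter table E. That is, the position data at the chapter candidate position becomes the chapter end position 32 of the chapter table E. The difference between the last date/time data of the chapter before the candidate position and the first date/time data of the chapter after it is calculated and becomes the chapter recording date/time interval 33 of the chapter table E.
(6) Step S26
Because the total number of chapters in the chapter table E exceeds the maximum number of chapters N, the chapter limiting unit 22 reduces the chapters in the chapter table E by one. It does so by selecting, among the chapters in the chapter table E, the two adjacent chapters for which the chapter recording date/time interval 33 (the difference between the date/time data at the end of the preceding chapter and the date/time data at the start of the following chapter) is smallest, and merging them.
At the same time, the chapter table E is generated (updated). That is, the position data at the chapter candidate position becomes the chapter end position 32, and the difference between the last date/time data of the chapter before the candidate position and the first date/time data of the chapter after it becomes the chapter recording date/time interval 33.
(7) Step S27
If the audio-video data A continues, the process returns to step S21 and steps S21 to S26 are repeated.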
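The detection of breaks in steps S21 to S23 and the table update of step S25 can be sketched in Python as below. The sketch rests on several assumptions that are not in the source: frames arrive as `(position_s, recorded_at)` pairs, continuity is judged with a one-second tolerance, and each table row stores the gap to the preceding chapter under the hypothetical key `gap_from_prev`. The chapter limit of steps S24 and S26 is left out.

```python
from datetime import datetime, timedelta

def split_by_recording_time(frames, tolerance=timedelta(seconds=1)):
    """Return chapter-table rows for frames given as (position_s, recorded_at).

    A break is declared wherever the recording time jumps by more than
    `tolerance` (or runs backwards) between consecutive frames. Each row holds
    the chapter end position and the recording-time gap to the previous chapter.
    """
    rows = []
    gap_from_prev = None                      # the first chapter has no gap
    for (prev_pos, prev_time), (_, time) in zip(frames, frames[1:]):
        gap = time - prev_time
        if gap > tolerance or gap < timedelta(0):      # discontinuity (S22)
            rows.append({"end_position": prev_pos,
                         "gap_from_prev": gap_from_prev})
            gap_from_prev = gap               # belongs to the next chapter
    if frames:                                # close the final chapter
        rows.append({"end_position": frames[-1][0],
                     "gap_from_prev": gap_from_prev})
    return rows
```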
【００７４】
図１１は、ステップＳ０２における図４のチャプタテーブルＥを生成する過程を示す図である。ここでは、最大チャプタ数データＢの値が“５”（最大チャプタ数Ｎ＝５）の場合を例として説明する。
【００７５】
図１１（ａ）は、ステップＳ０２において生成されつつある図４のチャプタテーブルＥを示している。ここでは、５つのチャプタが見出された状態を示している。
この状態において、ステップＳ２３で、図１１（ｂ）に示すように、新たなチャプタが検出された場合を考える。この場合、ステップＳ２４で、最大チャプタ数Ｎ（＝５）＜総チャプタ数（＝６）と判断される。ここで、図１１（ａ）のチャプタ番号３１＝３のチャプタ（「チャプタ３」とする、他のチャプタも同様）が、チャプタテーブルＥの中でチャプタ記録日時間隔３３が最小（５分）となっている。従って、ステップＳ２６で、チャプタ３をチャプタ２へ結合することにより、チャプタ３を削除する。それと共に、チャプタ４及びチャプタ５は、繰り上がってチャプタ３及びチャプタ４となる。しかる後、新たに見出されたチャプタをチャプタ６として、チャプタテーブルＥを更新する。図１１（ｃ）が更新されたチャプタテーブルＥである。
【００７６】
このようにすることで、複数のビデオ映像を有するオーディオビデオデータを、自動的に複数のチャプタに区切り、且つ、チャプタの数を最大チャプタ数Ｎ以下に抑えることが出来る。
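The merge rule used to keep the chapter count within N can be sketched in Python as follows. It is a sketch under assumptions: each row of the chapter table is a dictionary with the hypothetical keys `end_position` and `gap_from_prev` (the recording-time gap to the preceding chapter, `None` for the first chapter), and the chapter with the smallest gap is folded into its predecessor, as in the Fig. 11 example where Chapter 3 is merged into Chapter 2.

```python
def limit_chapters(rows, max_chapters):
    """Merge adjacent chapters until at most `max_chapters` remain.

    `rows` must be ordered by end position. The chapter whose gap to its
    predecessor is smallest is merged into that predecessor each round.
    """
    rows = [dict(r) for r in rows]            # work on a copy
    while len(rows) > max(max_chapters, 1):
        # Index of the chapter with the smallest gap to its predecessor.
        i = min(range(1, len(rows)), key=lambda k: rows[k]["gap_from_prev"])
        # The predecessor absorbs the chapter and inherits its end position.
        rows[i - 1]["end_position"] = rows[i]["end_position"]
        del rows[i]
    return rows
```

In the usage below the numbers are invented for illustration (gaps in minutes, with a 5-minute gap before the third chapter); only the 5-minute minimum comes from the text.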
【００７７】
ここで、ステップＳ０３の動画サムネイルを作成する動作について更に説明する。
図１２は、ステップＳ０３の動画サムネイルを作成する動作を示すフロー図である。
【００７８】
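(1) Step S31
The highlight scene detection unit 26 of the moving-image thumbnail creation unit 13 selects one chapter on the basis of the encoded video data D and the chapter table E, and acquires the encoded video data D of that chapter.
(2) Step S32
The highlight scene detection unit 26 detects the inter-frame pixel difference value Δ from the encoded video data D of the whole chapter. It stores the values in the difference value table (Fig. 7), which associates the pixel difference value Δ with the position data t0.
(3) Step S33
On the basis of the difference value table (Fig. 7) and the playback time in the moving-image condition data C, the highlight scene detection unit 26 varies the threshold from Δ0 so that the duration of the highlight scenes becomes equal to the specified playback time. If the duration becomes equal to the specified playback time before the threshold reaches Δ2, it judges that a highlight scene exists (Yes). If the duration is still shorter than the specified playback time when the threshold reaches Δ2, it judges that no highlight scene exists (No). In that case, the process proceeds to step S37.
(4) Step S34
The creation method selection unit 27 counts the highlight scenes. If there are several (No), the process proceeds to step S35. If there is only one (Yes), the process proceeds to step S36.
(5) Step S35
The creation method execution unit 28 creates the moving-image thumbnail by method (A) described above. That is, because there are several highlight scenes, they are concatenated to form the moving-image thumbnail F1. The process proceeds to step S38.
(6) Step S36
The creation method execution unit 28 creates the moving-image thumbnail by method (B) described above. That is, because there is only one highlight scene, that portion is used as is for the moving-image thumbnail F1. The process proceeds to step S38.
(7) Step S37
The creation method execution unit 28 creates the moving-image thumbnail by method (C) described above. That is, because there is no highlight scene, frames are thinned out of the chapter to shorten it, and the result is used as the moving-image thumbnail F1. The process proceeds to step S38.
(8) Step S38
The table creation unit 29 creates the moving-image thumbnail table L, which associates each frame of the moving-image thumbnail F1 with the encoded video data D.
(9) Step S39
If moving-image thumbnails have been created for all chapters (Yes), the table creation unit 29 ends the process. If any chapter does not yet have a moving-image thumbnail (No), the process returns to step S31.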
【００７９】
このようなプロセスにより、全てのチャプタについて、再生時間の揃った動画サムネイルを自動的に最適な方法で作成することが可能となる。
【００８０】
ここで、ステップＳ０５のメニュー画面データを作成する動作について更に説明する。
図１３は、ステップＳ０５のメニュー画面データを作成する動作を示すフロー図である。ここでは、チャプタの数が４個の場合について説明する。
【００８１】
（１）ステップＳ４１
メニュー画面作成部１５は、動画のメニュー画面（メニュー画面データＨ１）の第１フレームを作成するために、フレーム番号ｍ＝１を設定する。
（２）ステップＳ４２
メニュー画面作成部１５は、第１フレームのチャプタ１について処理を行うために、チャプタ番号ｋ＝１を設定する。
（３）ステップＳ４３
メニュー画面作成部１５は、動画サムネイルテーブルＬと制御情報データＧ１とに基づいて、チャプタ番号ｋ＝１のチャプタにおけるｍ＝１番目のフレームに相当するフレーム画像データ４４を動画サムネイルテーブルＬから取得する。そして、メニュー画面のｍ＝１番目のフレームの右上に貼り付ける。
（４）ステップＳ４４、ステップＳ４５
メニュー画面作成部１５は、チャプタ番号ｋが、最大チャプタ数Ｎ（ここでは、Ｎ＝４）以上となるまで、ステップＳ４３〜ステップＳ４５を繰り返す。
これにより、メニュー画面データＨ１のｍ＝１番目の１フレーム分が完成する。ただし、ｋ＝２の場合、フレームの左上、ｋ＝３の場合、フレームの右下、ｋ＝４の場合、フレームの左下に、フレーム画像データ４４をそれぞれ貼り付ける。
（５）ステップＳ４６、ステップＳ４７
メニュー画面作成部１５は、フレーム番号ｍが、指定フレーム数Ｍ（指定された再生時間に相当）以上となるまで、ステップＳ４２〜ステップＳ４７を繰り返す。
これにより、指定フレーム数Ｍ（指定された再生時間分）のメニュー画面（静止画）が生成される。
（６）ステップＳ４８
メニュー画面作成部１５は、得られた複数のメニュー画面（静止画）をＭＰＥＧ２規格により圧縮して、ＶＯＢファイルとし、メニュー画面データＨ１を完成させる。それには、上記プロセスにより４つのチャプタの動画サムネイルが含まれている。
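The double loop of steps S41 to S47 (outer loop over menu frames m, inner loop over chapters k) can be sketched in Python as below. The sketch makes several assumptions: frames are plain lists of pixel rows of identical size, exactly the four-chapter layout of this example is used, and the MPEG-2 compression of step S48 is omitted. The quadrant mapping follows the placement given in the text (k = 1 upper right, k = 2 upper left, k = 3 lower right, k = 4 lower left); the names are hypothetical.

```python
# Quadrant of each chapter as (row, column) in a 2 x 2 grid.
QUADRANT = {1: (0, 1), 2: (0, 0), 3: (1, 1), 4: (1, 0)}

def compose_menu_frames(thumbnails, num_frames):
    """Paste the m-th frame of every chapter thumbnail into one menu frame.

    `thumbnails` maps chapter number -> list of frames; each frame is a list
    of pixel rows of size h x w. Returns `num_frames` frames of size 2h x 2w.
    """
    h = len(thumbnails[1][0])
    w = len(thumbnails[1][0][0])
    menu = []
    for m in range(num_frames):                      # S41, S46, S47
        canvas = [[0] * (2 * w) for _ in range(2 * h)]
        for k, (qr, qc) in QUADRANT.items():         # S42, S44, S45
            if k not in thumbnails:
                continue                             # leave the quadrant blank
            frame = thumbnails[k][m]                 # S43: fetch frame m
            for y in range(h):
                canvas[qr * h + y][qc * w:(qc + 1) * w] = frame[y]
        menu.append(canvas)
    return menu
```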
【００８２】
図１４は、上記（１）ステップＳ４１〜（６）ステップＳ４８で作成されたメニュー画面データＨ１を用いた動画のメニュー画面を示す図である。メニュー画面５０は、チャプタ１の動画サムネイル５１−１、チャプタ２の動画サムネイル５１−２、チャプタ３の動画サムネイル５１−３、チャプタ４の動画サムネイル５１−４、メニューボタン５２を備える。
動画のメニュー画面５０の再生時、このメニュー画面全体が一つの動画として表示される。そして、ユーザーがチャプタ１〜チャプタ４の部分を選択すると、それぞれのチャプタのところへジャンプし、通常の映像が再生される。チャプタ数が多く、チャプタ５〜が存在する場合、メニューボタン５２を選択すると、チャプタ５〜チャプタ８のメニュー画面に切り替わる。動画メニューの選択や画面切り替えについては、従来知られた方法を使用することが出来る。
【００８３】
このようにして、各チャプタの動画サムネイルを含むメニュー画面データを、自動的に作成することが出来る。
【００８４】
本発明は、複数のビデオ映像を有するオーディオビデオデータをＤＶＤのような一つの記憶媒体に格納する場合に、日時データに基づいて、オーディオビデオデータの区切りを自動的に、より適切に見出し、所望のチャプタを構成することが出来る。
【００８５】
また、本発明は、区切られたチャプタごとに動画サムネイルを生成することができるので、各チャプタの内容を的確に把握することが可能となる。そして、それらの動画サムネイルを全て含んだメニュー画面を自動的に作成できるので、ＤＶＤに含まれる全てのオーディオビデオデータの内容を容易に把握することができる。
【００８６】
（第２の実施の形態）
本発明であるオーサリング機能付き信号処理装置及びオーサリングを含む信号処理方法の第２の実施の形態について、添付図面を参照して説明する。
【００８７】
まず、本発明であるオーサリング機能付き信号処理装置を適用したＤＶＤ装置（ＭＰＥＧ装置）の第２の実施の形態の構成について説明する。
図１は、本発明であるオーサリング機能付き信号処理装置を適用したＤＶＤ装置（ＭＰＥＧ装置）の第２の実施の形態の構成を示すブロック図である。ＤＶＤ装置１は、オーディオビデオデータＡ及び諸条件（Ｂ及びＣ）の入力に基づいて、ＤＶＤに、オーディオビデオデータを符号化した映像符号化データＤ及びその他のデータを記録、格納する。ＤＶＤ装置１は、オーサリング機能付き信号処理装置２、ＤＶＤ駆動部３及びシステムマイコン４を具備する。ここでは、ＤＶＤのデータを読み出す構成を省略している。
【００８８】
第２の実施の形態では、動画サムネイル作成部１３ａによる動画サムネイルの作成方法が異なる。
通常、ＭＰＥＧで圧縮されたビデオデータ（ここでは、映像符号化データＤに対応）において、色の変化の激しい部分や動きの速い部分のような複雑な映像では、多くの符号が発生する。そのため、符号量が多くなる。逆に、映像の変化の少ない平坦な部分や動きの遅い部分では、符号があまり発生しない。そのため、符号量が少なくなる。第２の実施の形態では、この符号量に基づいて、ハイライトシーンを検出する。
【００８９】
図１を参照して、オーサリング機能付き信号処理装置２は、システムマイコン４の制御により、オーディオビデオデータＡの入力に基づいて、映像符号化データＤとメニュー画面データＨ１と制御情報データＧ１とを生成する。そして、それらを所定の記憶媒体に記録する制御を行う。記憶媒体は、ＤＶＤに例示される。
ＤＶＤ駆動部３及びシステムマイコン４は、第１の実施の形態と同様であるのでその説明を省略する。
【００９０】
オーサリング機能付き信号処理装置２は、エンコード部１１と、記録日時解析部１２と、動画サムネイル作成部１３ａと、制御情報データ作成部１４と、メニュー画面作成部１５と、書き込み制御部１６とを具備する。
【００９１】
動画サムネイル作成部１３ａは、映像符号化データＤとチャプタテーブルＥとに基づいて、複数のチャプタの各々に対応する複数の動画サムネイルＦ１を作成する。このとき、動画サムネイルＦ１の作成条件を示す動画条件データＣを参照し、その条件に適合するように各動画サムネイルＦ１を生成する。動画条件データＣは、動画サムネイルの画像サイズや動画サムネイルの再生時間に例示される。
【００９２】
動画サムネイル作成部１３ａは、更に、動画サムネイルＦ１の各フレームと、映像符号化データＤとを関連付けた動画サムネイルテーブルＬを作成する。すなわち、動画サムネイルＦ１の各フレームを示す動画用画像データと、映像符号化データＤの位置データとを関連付けた動画サムネイルテーブルＬを作成する。図５に示す動画サムネイルテーブルＬについては、第１の実施の形態での説明の通りである。
【００９３】
図１５は、動画サムネイル作成部１３ａの構成を示すブロック図である。動画サムネイル作成部１３ａは、データ検出部５６、データ解析部５７、データ抽出部５８及びテーブル作成部５９を備える。
【００９４】
データ検出部５６は、映像符号化データＤとチャプタテーブルＥとに基づいて、各チャプタごとに、映像符号化データＤの符号を解析し、ＧＯＰ（ＧｒｏｕｐＯｆＰｉｃｔｕｒｅ）の位置を検出する。
【００９５】
データ解析部５７は、検出されたＧＯＰごとに、その符号量（Ｂｙｔｅ数）を検出する。そして、ＧＯＰ単位の符号量は、チャプタごとに、映像符号化データＤの位置データ（オーディオビデオデータの先頭からの時間）ｔ０と関連付けられて、符号量テーブルとして記憶部（図示されず）に格納される。
【００９６】
ここで、符号量テーブルについて説明する。
図１６は、符号量テーブルをグラフの形で表現した図である。縦軸はＧＯＰごとの符号量Ｒ、横軸は位置データｔ０（ここでは「時間」）である。グラフ中の曲線Ｖは、符号量を示す。時間ｔ１から時間ｔ２までが１チャプタとする。点Ａ１は、チャプタにおける符号量が最大の点である。そのときの時間はｔＡ１である。
【００９７】
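The data extraction unit 58 analyzes the code amount table (Fig. 16) and detects point A1, where the code amount R is largest. It then extracts the encoded video data D before and after time tA1, centered on tA1, as the highlight scene so that its length equals the specified playback time. In Fig. 16, the highlight scene is therefore the encoded video data D between time tB1 and time tC1, where tC1 − tB1 = playback time and tC1 − tA1 = Δt1 = tA1 − tB1. This highlight scene is used as the moving-image thumbnail.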
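The window selection around the largest GOP can be sketched in Python as follows. It is an illustration under assumptions: the code amount table is a list of `(time_s, byte_count)` rows, times are in seconds, and the clamping of the window to the chapter boundaries is an addition made here, since the source does not say what happens when the peak lies near the start or end of a chapter. The function name is hypothetical.

```python
def extract_peak_window(code_amounts, playback_time, chapter_start, chapter_end):
    """Return (start, end) of a window of length `playback_time` centred on
    the GOP with the largest code amount.

    With t_peak = tA1 and half = Δt1, the unclamped window is
    (tA1 - Δt1, tA1 + Δt1), so its length is 2 x Δt1.
    """
    t_peak, _ = max(code_amounts, key=lambda row: row[1])
    half = playback_time / 2
    start, end = t_peak - half, t_peak + half
    # Assumption: shift the window so that it stays inside the chapter.
    if start < chapter_start:
        start = chapter_start
        end = min(chapter_end, chapter_start + playback_time)
    elif end > chapter_end:
        end = chapter_end
        start = max(chapter_start, chapter_end - playback_time)
    return start, end
```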
【００９８】
テーブル作成部５９は、作成された動画サムネイルを動画サムネイルテーブルＬに格納する。
【００９９】
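The encoding unit 11, the recording date/time analysis unit 12, the control information data creation unit 14, the menu screen creation unit 15, and the write control unit 16 (including the description of Figs. 2 to 4 and Fig. 8 as they relate to this embodiment) are the same as in the first embodiment, so their description is omitted.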
【０１００】
次に、本発明であるオーサリング機能付き信号処理装置を適用したＤＶＤ装置（ＭＰＥＧ装置）の第２の実施の形態の動作（オーサリングを含む信号処理方法）について説明する。
本実施の形態に関わる図９〜図１１、図１３、図１４については、第１の実施の形態と同様であるのでその説明を省略する。
【０１０１】
次に、ステップＳ０３の動画サムネイルを作成する動作について更に説明する。
図１７は、ステップＳ０３の動画サムネイルを作成する動作を示すフロー図である。
【０１０２】
（１）ステップＳ５１
動画サムネイル作成部１３ａのデータ検出部５６は、映像符号化データＤとチャプタテーブルＥとに基づいて、一つのチャプタを選択し、そのチャプタの映像符号化データＤを取得する。
（２）ステップＳ５２
データ検出部５６は、チャプタ全体の映像符号化データＤの符号を解析し、ＧＯＰの位置を検出する。
（３）ステップＳ５３
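The data analysis unit 57 detects the code amount of each detected GOP. It then associates the per-GOP code amount with the position data t0 of the encoded video data D and stores the result in a storage unit (not shown) as the code amount table (Fig. 16).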
（４）ステップＳ５４
データ抽出部５８は、符号量テーブル（図１６）を解析して、符号量最大の点Ａ１を検出する。そして、時間ｔＡ１を中心にして、前後の映像符号化データＤを指定された再生時間になるように、ハイライトシーンとして抽出する。そして、それを動画サムネイルＦ１とする。
（５）ステップＳ５５
テーブル作成部５９は、動画サムネイルＦ１の各フレームと、映像符号化データＤとを関連付けた動画サムネイルテーブルＬを作成する。
（６）ステップＳ５６
テーブル作成部５９は、全てのチャプタについて動画サムネイルを作成している場合（Ｙｅｓ）、プロセスを終了する。動画サムネイルを作成していないチャプタが有る場合（Ｎｏ）、ステップＳ５１へもどる。
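[0103]
Through this process, moving image thumbnails of uniform playback time can be created automatically, by a suitable method, for all chapters.
This method uses the compressed video encoded data D as it is and detects the highlight scene without image analysis. It therefore reduces the resources needed to create moving image thumbnails, which lowers cost, and it can be executed in a short time.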
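The extraction in steps S53 and S54 can be sketched as follows. This is a minimal illustration, not the patented implementation: the function name `highlight_window`, the list-of-tuples layout of the code amount table, and the clamping of the window to the chapter boundaries are all assumptions introduced here. The source only states that the window is centered on tA1 and has the specified playback time (2 × Δt1).

```python
from typing import List, Tuple


def highlight_window(gop_table: List[Tuple[float, int]],
                     chapter_start: float,
                     chapter_end: float,
                     play_time: float) -> Tuple[float, float]:
    """Return (start, end) positions of the highlight scene of one chapter.

    gop_table holds one (position t0 in seconds, code amount in bytes)
    entry per GOP, i.e. a hypothetical form of the code amount table.
    """
    if not gop_table:
        raise ValueError("chapter contains no GOP")
    # Point A1: the GOP with the largest code amount R.
    t_peak, _ = max(gop_table, key=lambda row: row[1])
    half = play_time / 2.0  # corresponds to delta-t1
    start, end = t_peak - half, t_peak + half
    # Assumption (not in the source): keep the window inside the chapter
    # by shifting it when the peak lies close to a chapter boundary.
    if start < chapter_start:
        start = chapter_start
        end = min(chapter_end, chapter_start + play_time)
    elif end > chapter_end:
        end = chapter_end
        start = max(chapter_start, chapter_end - play_time)
    return start, end
```

For a table whose largest GOP sits at t0 = 1.0 s and a playback time of 1.0 s, the window is 0.5 s to 1.5 s. Only GOP sizes are read, which is why no picture decoding is needed.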
[0104]
This embodiment also provides the same effects as the first embodiment.
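[0105]
(Third embodiment)
A third embodiment of the signal processing device with an authoring function and the signal processing method including authoring according to the present invention will be described with reference to the accompanying drawings.
[0106]
First, the configuration of the third embodiment of the DVD device (MPEG device) to which the signal processing device with an authoring function of the present invention is applied will be described.
FIG. 1 is a block diagram showing the configuration of the third embodiment of the DVD device (MPEG device) to which the signal processing device with an authoring function of the present invention is applied. Based on the input audio video data A and conditions (B and C), the DVD device 1 records and stores on a DVD the video encoded data D, obtained by encoding the audio video data, and other data. The DVD device 1 includes the signal processing device 2 with an authoring function, the DVD drive unit 3, and the system microcomputer 4. The configuration for reading data from the DVD is omitted here.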
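[0107]
The third embodiment differs in the method by which the moving image thumbnail creation unit 13b creates moving image thumbnails.
In the third embodiment, human faces in particular are detected in the video data and extracted as the highlight scene.
[0108]
Referring to FIG. 1, the signal processing device 2 with an authoring function, under the control of the system microcomputer 4, generates video encoded data D, menu screen data H1, and control information data G1 from the input audio video data A. It then controls the recording of these data on a predetermined storage medium. A DVD is an example of the storage medium.
The DVD drive unit 3 and the system microcomputer 4 are the same as in the first embodiment, so their description is omitted.
[0109]
The signal processing device 2 with an authoring function includes an encoding unit 11, a recording date and time analysis unit 12, a moving image thumbnail creation unit 13b, a control information data creation unit 14, a menu screen creation unit 15, and a write control unit 16.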
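[0110]
The moving image thumbnail creation unit 13b creates a plurality of moving image thumbnails F1, one for each of the plurality of chapters, based on the video encoded data D and the chapter table E. In doing so, it refers to moving image condition data C, which indicates the conditions for creating the moving image thumbnails F1, and generates each moving image thumbnail F1 so as to satisfy those conditions. Examples of the moving image condition data C are the image size and the playback time of the moving image thumbnail.
[0111]
The moving image thumbnail creation unit 13b further creates a moving image thumbnail table L that associates each frame of the moving image thumbnails F1 with the video encoded data D. That is, it creates a moving image thumbnail table L in which the image data for the moving image, representing each frame of a moving image thumbnail F1, is associated with the position data of the video encoded data D. The moving image thumbnail table L shown in FIG. 5 is as described in the first embodiment.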
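[0112]
FIG. 18 is a block diagram showing the configuration of the moving image thumbnail creation unit 13b. The moving image thumbnail creation unit 13b includes a data detection unit 76, a data analysis unit 77, a data extraction unit 78, and a table creation unit 79.
[0113]
The data detection unit 76 analyzes the code of the video encoded data D for each chapter, based on the video encoded data D and the chapter table E, and detects the positions of the GOPs (Group of Pictures). Next, for each GOP, it analyzes the code of the video encoded data D one frame at a time. It then detects human faces and converts the result into points.
[0114]
Human faces are detected by point scoring as follows.
FIG. 19 illustrates the method of detecting a human face. FIG. 19(a) shows the image to be analyzed (one frame). FIG. 19(b) shows a mask image. To detect a human face, pixels showing skin color (a predetermined color range) are first detected in the image to be analyzed (a). Next, the mask image (b) is compared with the image to be analyzed (a): a skin-color pixel detected in a white part of the mask image (b) scores +1 point, and a skin-color pixel detected in a black part scores −1 point.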
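[0115]
The data analysis unit 77 adds up the points obtained by the image analysis for each GOP. The point total of each GOP is then associated, chapter by chapter, with the position data t0 of the video encoded data D (the time from the beginning of the audio video data) and stored as a point table in a storage unit (not shown).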
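The mask-based scoring and the per-GOP totals can be sketched as below. This is only an illustration under stated assumptions: frames are taken to be already decoded into rows of (R, G, B) tuples, the mask is a grid of booleans (True for the white part), and the RGB thresholds in `is_skin` are hypothetical placeholders for the "predetermined color range", which the source does not specify.

```python
from typing import List, Sequence, Tuple

Pixel = Tuple[int, int, int]          # (R, G, B), 0..255
Frame = Sequence[Sequence[Pixel]]     # rows of pixels
Mask = Sequence[Sequence[bool]]       # True = white part of the mask image


def is_skin(p: Pixel) -> bool:
    """Hypothetical skin-color range; the actual range is a setting."""
    r, g, b = p
    return r > 95 and g > 40 and b > 20 and r > g and r > b and r - min(g, b) > 15


def frame_points(frame: Frame, mask: Mask) -> int:
    """+1 for each skin pixel under the white mask area, -1 under the black area."""
    points = 0
    for pixel_row, mask_row in zip(frame, mask):
        for pixel, white in zip(pixel_row, mask_row):
            if is_skin(pixel):
                points += 1 if white else -1
    return points


def gop_point_totals(gops: Sequence[Sequence[Frame]], mask: Mask) -> List[int]:
    """Point total S per GOP, i.e. the values stored in the point table."""
    return [sum(frame_points(frame, mask) for frame in gop) for gop in gops]
```

Swapping `is_skin` for another color predicate gives the variations the text mentions later, such as scoring plant green or sky blue instead of skin color.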
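[0116]
The point table is described here.
FIG. 20 shows the point table in the form of a graph. The vertical axis is the point total S of each GOP, and the horizontal axis is the position data t0 (here, time). The curve U in the graph indicates the point total of each GOP. One chapter extends from time t1 to time t2. Point A2 is the point at which the point total within the chapter is largest; the time at that point is tA2.
[0117]
The data extraction unit 78 analyzes the point table (FIG. 20) and detects the point A2 at which the point total S of a GOP is largest. It then extracts, as a highlight scene, the video encoded data D before and after time tA2, centered on tA2, so that the extracted data has the specified playback time. That is, in FIG. 20, the highlight scene is the video encoded data D between time tC2 and time tB2. Here, tC2 − tB2 = playback time, where tC2 − tA2 = Δt2 = tA2 − tB2. This highlight scene is used as the moving image thumbnail.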
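[0118]
The table creation unit 79 stores the created moving image thumbnail in the moving image thumbnail table L.
[0119]
The encoding unit 11, the recording date and time analysis unit 12, the control information data creation unit 14, the menu screen creation unit 15, and the write control unit 16 (including the description of FIGS. 2 to 4 and FIG. 8 as they relate to this embodiment) are the same as in the first embodiment, so their description is omitted.
[0120]
Next, the operation (a signal processing method including authoring) of the third embodiment of the DVD device (MPEG device) to which the signal processing device with an authoring function of the present invention is applied will be described.
FIGS. 9 to 11, 13, and 14 as they relate to this embodiment are the same as in the first embodiment, so their description is omitted.
[0121]
Next, the operation of creating moving image thumbnails in step S03 is described in more detail.
FIG. 21 is a flowchart showing the operation of creating moving image thumbnails in step S03.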
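[0122]
(1) Step S61
The data detection unit 76 of the moving image thumbnail creation unit 13b selects one chapter based on the video encoded data D and the chapter table E, and acquires the video encoded data D of that chapter.
(2) Step S62
The data detection unit 76 analyzes the code of the video encoded data D of the whole chapter one frame at a time. It detects as a face the pixels showing skin color (a predetermined color range) in a predetermined area (specified by the mask image) and converts them into points.
(3) Step S63
The data analysis unit 77 adds up the points obtained by the image analysis for each GOP. It then associates the points of each GOP with the position data t0 of the video encoded data D and stores the result as a point table (FIG. 20) in the storage unit (not shown).
(4) Step S64
The data extraction unit 78 analyzes the point table (FIG. 20) and detects the point A2 at which the points are largest. It then extracts, as a highlight scene, the video encoded data D before and after time tA2, centered on tA2, so that the extracted data has the specified playback time. The highlight scene is used as the moving image thumbnail F1.
(5) Step S65
The table creation unit 79 creates the moving image thumbnail table L, which associates each frame of the moving image thumbnail F1 with the video encoded data D.
(6) Step S66
If moving image thumbnails have been created for all chapters (Yes), the table creation unit 79 ends the process. If there is a chapter for which no moving image thumbnail has been created (No), the process returns to step S61.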
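[0123]
Through this process, moving image thumbnails of uniform playback time can be created automatically, by a suitable method, for all chapters.
Compared with general face detection, this method only compares colors, so the processing is fast.
[0124]
This embodiment detects human faces, but if, for example, the color of a particular animal is set, that animal can be detected and a moving image thumbnail generated from it. Likewise, setting colors such as the green of plants or the sky blue of the sky makes it possible to detect natural scenery and create a moving image thumbnail. The color is set, for example, through the moving image condition data C.
[0125]
This embodiment also provides the same effects as the first embodiment.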
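[0126]
(Fourth embodiment)
A fourth embodiment of the signal processing device with an authoring function and the signal processing method including authoring according to the present invention will be described with reference to the accompanying drawings.
[0127]
First, the configuration of the fourth embodiment of the DVD device (MPEG device) to which the signal processing device with an authoring function of the present invention is applied will be described.
FIG. 22 is a block diagram showing the configuration of the fourth embodiment of the DVD device (MPEG device) to which the signal processing device with an authoring function of the present invention is applied. Based on the input audio video data A and conditions (B and C, described later), the DVD device 1a records and stores on a DVD the video encoded data, obtained by encoding the audio video data, and other data. The DVD device 1a includes a signal processing device 2a with an authoring function, the DVD drive unit 3, and the system microcomputer 4. The configuration for reading data from the DVD is omitted here.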
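[0128]
The signal processing device 2a with an authoring function, under the control of the system microcomputer 4, generates video encoded data D (described later), menu screen data H2 (described later), and control information data G2 (described later) from the input audio video data A. It then controls the recording of these data on a predetermined storage medium. A DVD is an example of the storage medium.
The DVD drive unit 3, under the control of the system microcomputer 4, records (stores) the video encoded data D, the menu screen data H2, and the control information data G2 output from the signal processing device 2a with an authoring function on the storage medium loaded in it.
The system microcomputer 4 controls the DVD device 1a, which includes the signal processing device 2a with an authoring function and the DVD drive unit 3. An MPU (microprocessor unit) is an example of the system microcomputer 4.
[0129]
The signal processing device 2a with an authoring function includes an encoding unit 11a, the recording date and time analysis unit 12, a control information data creation unit 14a, a menu screen creation unit 15a, and a write control unit 16a. The encoding unit 11a and the recording date and time analysis unit 12 are together also called the data preprocessing unit 2a-2. The data preprocessing unit 2a-2, the control information data creation unit 14a, and the menu screen creation unit 15a are together also called the data processing unit 2a-1.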
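[0130]
The encoding unit 11a generates video encoded data D by encoding the audio video data A input from outside. At the same time, it generates moving image data J based on encoded data calculated in the course of encoding the audio video data A.
[0131]
Here, the encoded data is data obtained by extracting only the DC coefficients (direct-current components) that result from the discrete cosine transform (also written "DCT" in this specification) used when encoding the audio video data. The moving image data J is generated by arranging in sequence images composed of the DC coefficients of the audio video data A (size = 1/8 the height × 1/8 the width of the original frame). The audio video data A, the image data, the encoding of the audio video data A, and the video encoded data D are the same as in the first embodiment.
[0132]
The moving image data J is a compressed version of the audio video data A. Dividing it chapter by chapter yields the moving image thumbnail F2 of each chapter. That is, the moving image data J is the set of moving image thumbnails F2.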
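[0133]
The encoding unit 11a further creates a moving image data table K that associates each frame of the moving image data J with the video encoded data D. That is, it creates a moving image data table K in which the image data for the moving image, representing each frame of the moving image data J, is associated with the position data of the video encoded data D and the address in the VOB file. The VOB address is the data position (address from the beginning) of each frame after (MPEG) compression.
[0134]
The moving image data J for the moving image thumbnails F2 (described later) is obtained at the same time as the video encoded data D is generated, with only a small amount of additional processing. Because the moving image data J represents each block on which the DCT is performed (8 pixels × 8 pixels) by a single DC coefficient, the data is reduced to 1/8 vertically × 1/8 horizontally = 1/64. Although much of the data is discarded, the resolution is sufficient for moving image thumbnails on a menu screen.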
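The 1/8 × 1/8 reduction can be illustrated with a small sketch. With the usual 8 × 8 DCT-II normalization used in MPEG and JPEG, the DC coefficient of a block equals the sum of its 64 samples divided by 8, so dividing the DC coefficient by 8 again gives the block average. The code below assumes a single luminance plane given as a list of rows and ignores quantization; the function names are hypothetical, and a real encoder would simply reuse the DC values it has already computed rather than recompute them.

```python
from typing import List, Sequence


def dc_coefficient(block: Sequence[Sequence[float]]) -> float:
    """DC term of the 8x8 2-D DCT-II: F(0,0) = (1/8) * sum of all samples."""
    return sum(sum(row) for row in block) / 8.0


def dc_image(frame: Sequence[Sequence[float]]) -> List[List[int]]:
    """Build the 1/8 x 1/8 image with one pixel per 8x8 block.

    Assumption: frame height and width are multiples of 8.
    """
    height, width = len(frame), len(frame[0])
    if height % 8 or width % 8:
        raise ValueError("frame size must be a multiple of 8")
    image = []
    for top in range(0, height, 8):
        out_row = []
        for left in range(0, width, 8):
            block = [row[left:left + 8] for row in frame[top:top + 8]]
            # DC / 8 is the mean sample value of the block.
            out_row.append(round(dc_coefficient(block) / 8.0))
        image.append(out_row)
    return image
```

A 16 × 16 frame therefore becomes a 2 × 2 image, which matches the 1/64 data reduction stated above.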
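[0135]
FIG. 23 is a table showing the moving image data table K. For each frame, it associates the frame number 41 (the serial number of the frame), the time 42 (the position data of the video encoded data D, that is, the time from the beginning of the audio video data), the VOB file address 43 (the address of the frame within the video encoded data D), and the frame image data 44 (the image data for the moving image).
[0136]
The recording date and time analysis unit 12 and the chapter table E are the same as in the first embodiment, so their description is omitted.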
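[0137]
Referring to FIG. 22, the menu screen creation unit 15a creates menu screen data H2 representing a moving-image menu screen, based on the moving image data table K and the chapter table E. The moving image thumbnails F2 are generated by taking from the moving image data table K the portions of the moving image data J that correspond to the chapters defined in the chapter table E.
Here, the moving-image menu screen displays a plurality of moving image thumbnails F2 (not shown) corresponding to the plurality of chapters together on one screen. For example, with four chapters, four moving image thumbnails F2 can be viewed on one menu screen. Each moving image thumbnail F2 is generated so as to satisfy the conditions given by the moving image condition data C (for example, picture size and playback time).
[0138]
A moving image thumbnail F2 can be selected on the screen with a pointing device (for example, a mouse). Because each moving image thumbnail F2 in the menu screen data H2 is associated with the VOB file address 43 of the moving image data table K, the chapter corresponding to the moving image thumbnail F2 selected on the screen can be played back.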
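[0139]
The control information data creation unit 14a creates control information data G2 (described later), which includes data on the plurality of chapters, based on the moving image data table K and the chapter table E. That is, it creates the DVD control information data G2 based on the moving image data table K (its moving image data J) and stores in the PTT 37 of the control information data G2 chapter data (for example, the chapter number 31) indicating which chapter each program belongs to. Note that the control information data G2 is the same as in the first embodiment.
[0140]
The write control unit 16a receives the video encoded data D, the menu screen data H2, and the control information data G2, and controls the output of the data to the DVD drive unit 3 so that each is recorded in its predetermined area of the DVD.
The menu screen data H2 is stored in VMGM_VOBS of the VMG 63, the control information data G2 in VTSI of the VTS 67, and the video encoded data D in VTSM_VOBS to VTSTT_VOBS.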
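[0141]
Next, the operation (a signal processing method including authoring) of the fourth embodiment of the DVD device (MPEG device) to which the signal processing device with an authoring function of the present invention is applied will be described.
FIG. 24 is a flowchart showing the operation (a signal processing method including authoring) of the fourth embodiment of the DVD device (MPEG device) to which the signal processing device with an authoring function of the present invention is applied.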
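[0142]
(1) Step S11
The encoding unit 11a generates video encoded data D by encoding the audio video data A input from outside. At the same time, it generates moving image data J based on the encoded data calculated in the course of encoding the audio video data A, and creates the moving image data table K, which associates each frame of the moving image data J with the video encoded data D.
[0143]
(2) Step S12
The recording date and time analysis unit 12 divides the audio video data A into a plurality of chapters based on the date/time data of the audio video data A input from outside. It adjusts the chapters so that the number of chapters does not exceed the maximum number of chapters N indicated by the maximum chapter number data B input from outside. It then generates the chapter table E.
[0144]
(3) Step S13
The menu screen creation unit 15a creates menu screen data H2 representing a moving-image menu screen, based on the moving image data table K and the chapter table E.
[0145]
(4) Step S14
The control information data creation unit 14a creates control information data G2, which includes data on the plurality of chapters, based on the moving image data table K and the chapter table E. Chapter data (for example, the chapter number 31) indicating which chapter each program belongs to is stored in the PTT of the control information data G2.
[0146]
(5) Step S15
The write control unit 16a controls the output of data to the DVD drive unit 3 so that the menu screen data H2, the video encoded data D, and the control information data G2 are recorded in predetermined areas of the DVD. The DVD drive unit 3 writes these data to the DVD.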
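Step S12 can be sketched as follows. The text here only says that chapters are derived from the date/time data and adjusted so as not to exceed the maximum number N. The sketch therefore makes two loudly labeled assumptions: a chapter break is placed wherever the recording time jumps relative to the playback position by more than a hypothetical `tolerance`, and when there are too many breaks only the N − 1 largest jumps are kept. The actual adjustment rule is described for the first embodiment and may differ.

```python
from typing import List, Tuple


def split_chapters(samples: List[Tuple[float, float]],
                   max_chapters: int,
                   tolerance: float = 1.0) -> List[float]:
    """Return the end position t0 of each chapter.

    samples: (position t0, recording time T0) pairs in seconds, ordered by t0.
    """
    if max_chapters < 1:
        raise ValueError("max_chapters must be at least 1")
    if not samples:
        return []
    breaks = []  # (size of the jump, index of the last sample before it)
    for i in range(len(samples) - 1):
        d_pos = samples[i + 1][0] - samples[i][0]
        d_rec = samples[i + 1][1] - samples[i][1]
        jump = abs(d_rec - d_pos)  # zero while recording is continuous
        if jump > tolerance:
            breaks.append((jump, i))
    # Assumption: when limited by N, keep only the largest discontinuities.
    breaks.sort(reverse=True)
    kept = sorted(i for _, i in breaks[:max_chapters - 1])
    return [samples[i][0] for i in kept] + [samples[-1][0]]
```

With N = 2 and two discontinuities, only the larger jump survives as a chapter boundary, so footage shot on the same day tends to stay together while footage shot weeks apart is separated.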
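[0147]
Within the operation of step S11, the operation of creating the moving image data table K is now described in more detail.
FIG. 25 is a flowchart showing the operation of creating the moving image data table K in step S11. These steps are performed only on I pictures.
[0148]
(1) Step S71
The DC coefficients obtained by the DCT performed during MPEG encoding are extracted.
(2) Step S72
The DC coefficients for one frame of the audio video data A are rearranged to generate one frame of the image for the moving image (size = 1/8 the height × 1/8 the width of the original frame).
(3) Step S73
The generated images for the moving image are arranged in sequence to generate the moving image data J.
(4) Step S74
For each frame of the images for the moving image, the moving image data table K associating the moving image data J with the video encoded data D is created. That is, a moving image data table K is created in which the image data for the moving image, representing each frame, is associated with the position data of the video encoded data D and the address in the VOB file.
[0149]
The moving image data J represents one GOP (one VOBU, normally 0.5 seconds) of the audio video data A by a single image. In other words, greatly shortened moving image data is obtained.
[0150]
The operation of creating the chapter table in step S12 is the same as step S02 shown in FIG. 10, so its description is omitted.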
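[0151]
The operation of creating the menu screen data in step S13 is now described in more detail.
FIG. 26 is a flowchart showing the operation of creating the menu screen data in step S13. The case of four chapters is described here.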
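[0152]
(1) Step S81
The menu screen creation unit 15a sets the chapter number k = 1 in order to determine the start time of each chapter (position data: the time within the audio video data).
(2) Step S82
The menu screen creation unit 15a obtains the start time of chapter number k from the chapter end position 32 (the end time of the preceding chapter) in the row immediately before the row with chapter number 31 = k in the chapter table E, as (start time) = (chapter end position 32 in the preceding row) + (time of one frame).
For example, for chapter number k = 2, the row immediately before the row with chapter number 31 = 2 is the row with chapter number 31 = 1. Therefore, (start time of chapter 2) = (chapter end position 32 in the row for chapter 1, that is, the end time of chapter 1) + (time of one frame). Chapter number k = 1 is the first chapter and has no preceding row, so in that case the start time is 0.
(3) Steps S83 and S84
The menu screen creation unit 15a repeats steps S82 to S84 until the chapter number k reaches or exceeds the maximum number of chapters N (here, N = 4).
The start time of each chapter is thereby determined.
(4) Step S85
The menu screen creation unit 15a sets the frame number m = 1 in order to create the first frame of the moving-image menu screen (menu screen data H2).
(5) Step S86
The menu screen creation unit 15a sets the chapter number k = 1 in order to process chapter 1 of the first frame.
(6) Step S87
The menu screen creation unit 15a acquires from the moving image data table K the frame image data 44 corresponding to the m = 1st frame of the chapter with chapter number k = 1. It then pastes the data into the upper right of the m = 1st frame of the menu screen.
(7) Steps S88 and S89
The menu screen creation unit 15a repeats steps S87 to S89 until the chapter number k reaches or exceeds the maximum number of chapters N (here, N = 4).
This completes the m = 1st frame of the menu screen data H2. The frame image data 44 is pasted into the upper left of the frame for k = 2, the lower right for k = 3, and the lower left for k = 4.
(8) Steps S90 and S91
The menu screen creation unit 15a repeats steps S86 to S91 until the frame number m reaches or exceeds the specified number of frames M (corresponding to the specified playback time).
Menu screens (still images) for the specified number of frames M (the specified playback time) are thereby generated.
(9) Step S92
The menu screen creation unit 15a compresses the resulting menu screens (still images) according to the MPEG2 standard into a VOB file, completing the menu screen data H2. Through the above process, it contains the moving image thumbnails of the four chapters.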
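Steps S81 to S91 amount to two loops, which the following sketch illustrates for the four-chapter case. The data layout is an assumption made for illustration: chapter end positions are given as a list of times, each chapter's thumbnail frames are given as a list of opaque frame objects, and a menu frame is modeled as a 2 × 2 grid rather than a real bitmap. Holding the last thumbnail frame when a chapter has fewer than M frames is also an assumption; the source does not say what happens in that case. MPEG2 compression (step S92) is outside the sketch.

```python
from typing import Dict, List, Sequence

# Quadrant of each chapter as (row, column):
# chapter 1 upper right, 2 upper left, 3 lower right, 4 lower left.
QUADRANT = {1: (0, 1), 2: (0, 0), 3: (1, 1), 4: (1, 0)}


def chapter_start_times(chapter_ends: Sequence[float],
                        frame_time: float) -> List[float]:
    """Steps S81-S84: start = previous chapter end + one frame; first chapter starts at 0."""
    starts = []
    for k in range(len(chapter_ends)):
        starts.append(0.0 if k == 0 else chapter_ends[k - 1] + frame_time)
    return starts


def compose_menu_frames(thumb_frames: Dict[int, Sequence[object]],
                        num_frames: int) -> List[List[List[object]]]:
    """Steps S85-S91: build M menu frames, each a 2x2 grid of thumbnail frames."""
    menu = []
    for m in range(num_frames):
        grid = [[None, None], [None, None]]
        for k, (row, col) in QUADRANT.items():
            frames = thumb_frames[k]  # assumed to hold at least one frame
            # Assumption: repeat the last frame if the chapter runs short.
            grid[row][col] = frames[m] if m < len(frames) else frames[-1]
        menu.append(grid)
    return menu
```

For chapter end times of 10.0, 25.0, 40.0, and 60.0 seconds and a frame time of 0.5 seconds, the start times come out as 0.0, 10.5, 25.5, and 40.5 seconds.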
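[0153]
The menu screen data H2 created in steps S81 to S92 above is the same as in FIG. 14, so its description is omitted.
[0154]
In this way, menu screen data containing the moving image thumbnail of each chapter can be created automatically.
[0155]
As described above, the present invention provides the same effects as the first embodiment.
[0156]
[Effects of the invention]
According to the present invention, when a plurality of pieces of audio video data are stored in one storage medium, the audio video data can be divided automatically, chapters can be generated automatically, and a menu screen using moving image thumbnails can be generated automatically. The contents of the storage medium can then be grasped quickly and accurately.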
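[Brief description of the drawings]
[FIG. 1] FIG. 1 is a block diagram showing the configuration of the first to third embodiments of the DVD device to which the signal processing device with an authoring function of the present invention is applied.
[FIG. 2] FIG. 2 is a diagram showing the configuration of the recording date and time analysis unit.
[FIG. 3] FIG. 3 is a diagram showing the date/time data and the position data.
[FIG. 4] FIG. 4 is a table showing the chapter table.
[FIG. 5] FIG. 5 is a table showing the moving image thumbnail table.
[FIG. 6] FIG. 6 is a block diagram showing the configuration of the moving image thumbnail creation unit 13.
[FIG. 7] FIG. 7 is a diagram showing the difference value table in the form of a graph.
[FIG. 8] FIG. 8 is a diagram showing the structure of the data stored on the DVD.
[FIG. 9] FIG. 9 is a flowchart showing the operation of the first to third embodiments of the DVD device to which the signal processing device with an authoring function of the present invention is applied.
[FIG. 10] FIG. 10 is a flowchart showing the operation of creating the chapter table in step S02.
[FIG. 11] FIGS. 11(a) to 11(c) are diagrams showing the process of generating the chapter table of FIG. 4.
[FIG. 12] FIG. 12 is a flowchart showing the operation of creating moving image thumbnails in step S03.
[FIG. 13] FIG. 13 is a flowchart showing the operation of creating the menu screen data in step S05.
[FIG. 14] FIG. 14 is a diagram showing the moving-image menu screen.
[FIG. 15] FIG. 15 is a block diagram showing the configuration of the moving image thumbnail creation unit 13a.
[FIG. 16] FIG. 16 is a diagram showing the code amount table in the form of a graph.
[FIG. 17] FIG. 17 is a flowchart showing the operation of creating moving image thumbnails in step S03.
[FIG. 18] FIG. 18 is a block diagram showing the configuration of the moving image thumbnail creation unit 13b.
[FIG. 19] FIGS. 19(a) and 19(b) are diagrams illustrating the method of detecting a human face.
[FIG. 20] FIG. 20 is a diagram showing the point table in the form of a graph.
[FIG. 21] FIG. 21 is a flowchart showing the operation of creating moving image thumbnails in step S03.
[FIG. 22] FIG. 22 is a block diagram showing the configuration of the fourth embodiment of the DVD device to which the signal processing device with an authoring function of the present invention is applied.
[FIG. 23] FIG. 23 is a table showing the moving image data table.
[FIG. 24] FIG. 24 is a flowchart showing the operation of the fourth embodiment of the DVD device to which the signal processing device with an authoring function of the present invention is applied.
[FIG. 25] FIG. 25 is a flowchart showing the operation of creating the moving image data table in step S11.
[FIG. 26] FIG. 26 is a flowchart showing the operation of creating the menu screen data in step S13.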
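[Explanation of symbols]
1 (a): DVD device
2 (a): signal processing device with an authoring function
2-1, 2a-1: data processing unit
2-2: data creation unit
2a-2: data preprocessing unit
2-3: data preprocessing unit
3: DVD drive unit
4: system microcomputer
11 (a): encoding unit
12: recording date and time analysis unit
13 (a, b): moving image thumbnail creation unit
14 (a): control information data creation unit
15 (a): menu screen creation unit
16 (a): write control unit
21: chapter division unit
22: chapter limiting unit
23: table generation unit
25, 55, 75: moving image thumbnail creation execution unit
26: highlight scene detection unit
27: creation method selection unit
28: creation method execution unit
29: table creation unit
31: chapter number
32: chapter end position
33: chapter recording date/time interval
34: video title set
35: title
36: program chain (PGC)
37: PTT (chapter)
38: program (PG)
38-1: cell
38-2: video object unit (VOBU)
38-3: pack
50: menu screen
51-1: moving image thumbnail of chapter 1
51-2: moving image thumbnail of chapter 2
51-3: moving image thumbnail of chapter 3
51-4: moving image thumbnail of chapter 4
52: menu button
56, 76: data detection unit
57, 77: data analysis unit
58, 78: data extraction unit
59, 79: table creation unit
61: data stored on the DVD
63: video manager (VMG)
67: video title set (VTS)
A: audio video data
B: maximum chapter number data
C: moving image condition data
D: video encoded data
E: chapter table
F (1, 2): moving image thumbnail
G (1, 2): control information data
H (1, 2): menu screen data
I (1, 2): moving image thumbnail + control information data + menu screen data
J: moving image data
K: moving image data table
L: moving image thumbnail table
[0001]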
[Technical field to which the invention belongs]
The present invention relates to a signal processing device with an authoring function and a signal processing method including authoring, and more particularly, to a signal processing device with an authoring function and a signal processing method including authoring that are used when audio-video data is recorded on a storage medium.
[0002]
[Prior art]
When audio video data from a digital video (DV) tape or an analog video tape recorder (VTR) is recorded on a large-capacity storage medium such as a DVD (Digital Versatile Disc), audio video data containing a plurality of video images may be recorded together on a single DVD. (In this specification, audio video data includes sound data, image data (including moving images), and date/time data indicating the recording date and time.) In that case, unless the contents have been noted separately, the only way to find out what audio video data is recorded on the DVD is to view it all the way through.
[0003]
Techniques for avoiding such trouble are known. For example, there are the following techniques.
First, for example, audio video data stored in one DVD is divided into a plurality of chapters so that the cueing can be performed for each characteristic scene or each scene to be viewed together. Next, a thumbnail (still image) of the top screen for each chapter is extracted as a representative image of the chapter. Then, all the extracted thumbnails are simultaneously displayed on the display screen of the display (or a part of the thumbnails is displayed, and the rest can be displayed by scrolling). In this way, since a list of representative images of a plurality of video images can be viewed on one display screen, the contents of the audio video data in the DVD can be grasped in a short time. Then, it is possible to easily cue each thumbnail.
[0004]
Known methods of automatically dividing the audio video data into a plurality of chapters include detecting a change in the audio video data (image data and audio data) that satisfies a predetermined condition and dividing the data at that point, and detecting a marker recorded on the audio video data and dividing the data at that point. A known method of automatically extracting a representative image for use as a thumbnail (still image) is to use the first picture of each divided chapter as the representative image.
[0005]
However, there are cases where a desired chapter cannot be constructed because the audio video data is not separated at an appropriate position only by the change of the audio video data or the marker on the audio video data. If the thumbnail is a still image, it may be difficult to accurately grasp the contents of the chapter unless the representative image is appropriately selected.
There is a demand for a technique that can automatically find a break of audio video data having a plurality of video images more appropriately and configure a desired chapter. There is a demand for a technique that can generate thumbnails that can accurately grasp the contents of chapters.
[0006]
As a related technique, Japanese Patent Application Laid-Open No. 2002-152636 (Patent Document 1) discloses a recording/reproducing apparatus with an automatic chapter creation function (related: Japanese Patent Application Laid-Open No. 2002-152665 (Patent Document 2) and Japanese Patent Application Laid-Open No. 2002-152666 (Patent Document 3)).
The recording/reproducing apparatus with an automatic chapter creation function of this technique has a recording/reproducing medium, recording processing and reproduction processing means, display signal deriving means, system control means, and pause means. The recording/reproducing medium has at least a video information recording area in which video information including programs is recorded, a video management information recording area in which management information for recording and reproducing the video information is recorded, and a recording area for chapter management information for managing each chapter of a program. The recording processing and reproduction processing means records information on the recording/reproducing medium and reproduces the recorded information. The display signal deriving means supplies the reproduction signal from the reproduction processing means to a display. The system control means controls the recording processing and reproduction processing means and the display signal deriving means. The pause means causes the recording processing means, via the system control means, to pause the recording process. The apparatus is characterized by having means for registering in the chapter management information, as a chapter boundary, the break in the recorded information between the time the pause means executes a pause and the time recording is resumed.
The object of this technique is to provide a recording/reproducing apparatus with an automatic chapter creation function that automatically creates chapters and thumbnails on a storage medium on which a large number of programs (audio video data) are continuously recorded.
[0007]
In this technique, when recording is paused while audio video data is being recorded, the pause point is used as a chapter break, and a plurality of chapters are determined in this way. The first picture of each chapter is then extracted as a thumbnail (still image), and a list of representative pictures is generated. Chapters can also be edited manually.
[0008]
[Patent Document 1]
JP 2002-152636 A
[Patent Document 2]
JP 2002-152665 A
[Patent Document 3]
JP 2002-152666 A
[0009]
[Problems to be solved by the invention]
Accordingly, an object of the present invention is to provide a signal processing device with an authoring function and a signal processing method including authoring that, when audio video data having a plurality of videos is stored in one storage medium, can store the data so that its contents can be grasped quickly and accurately.
[0010]
Another object of the present invention is to provide a signal processing device with an authoring function and a signal processing method including authoring that, when audio video data having a plurality of videos is stored in one storage medium, can automatically find more appropriate break points in the audio video data and configure chapters accurately.
[0011]
Still another object of the present invention is to provide a signal processing device with an authoring function and a signal processing method including authoring that, when audio video data having a plurality of videos is stored in one storage medium, can automatically generate thumbnails from which the contents of each divided chapter can be accurately grasped.
[0012]
Yet another object of the present invention is to provide a signal processing device with an authoring function and a signal processing method including authoring that, when audio video data having a plurality of videos is stored in one storage medium, can automatically create a menu screen from which the contents can be easily grasped.
[0013]
[Means for Solving the Problems]
Hereinafter, means for solving the problem will be described using the numbers and symbols used in the embodiments of the present invention. These numbers and symbols are added in parentheses in order to clarify the correspondence between the description of [Claims] and [Embodiments of the Invention]. However, these numbers and symbols should not be used for the interpretation of the technical scope of the invention described in [Claims].
[0014]
Therefore, in order to solve the above problems, the signal processing apparatus with an authoring function of the present invention includes a data processing unit (2-1) and a write control unit (16).
The data processing unit (2-1) creates a moving image thumbnail (F1, J) for each of a plurality of chapters generated by dividing the audio video data (A), and creates menu screen data (H1, H2) including the moving image thumbnails (F1, J). The write control unit (16) performs control to record the menu screen data (H1, H2) on the storage medium.
Here, the audio video data (A) includes a plurality of image data, date/time data (T0) indicating the recording date and time of the image data, and position data (t0) indicating the position of the image data in the audio video data (A). The menu screen data (H1, H2) represents a menu screen (50) that simultaneously displays some or all of the moving image thumbnails (F1, J) of the plurality of chapters.
According to the present invention, the contents of the audio video data stored in the storage medium can be grasped from a menu screen on which the moving image thumbnails of the chapters are displayed together. In other words, the audio video data can be stored so that its contents can be grasped quickly and accurately.
Here, examples of the storage medium include DVD, ROM, RAM, HD, CD, and FD. The data processing unit (2-1) may perform the above processing with reference to conditions (B, C) for creating the menu screen. In that case, the user's intentions can be reflected.
[0015]
In the signal processing apparatus with an authoring function, the data processing unit (2-1) includes a data creation unit (2-2) and a menu screen creation unit (15).
The data creation unit (2-2) divides the audio video data (A) to generate a plurality of chapters, creates a moving image thumbnail (F1) for each of the plurality of chapters, and creates moving image thumbnail data (L) indicating data related to the moving image thumbnails (F1) and control information data (G1) indicating control information on the plurality of chapters. The menu screen creation unit (15) creates menu screen data (H1) based on the moving image thumbnail data (L) and the control information data (G1).
Here, as the moving image thumbnail data (L), a moving image thumbnail table (L) in which a plurality of moving image thumbnails (F1) and position data (t0) are associated is exemplified. A method of dividing the audio video data (A) into chapters is exemplified by a method of using date / time data (T0), image data, and audio data included in the audio video data (A).
[0016]
In the above-described signal processing device with an authoring function, the data processing unit (2-1) divides the audio video data (A) based on the date / time data (T0) to generate a plurality of chapters.
Since the chapters are divided using the date / time data (T0: indicates the recording date / time of the image data), the scenes related to the contents in the audio video data (A) can be collected, and the chapters can be automatically divided appropriately. it can. In other words, it is possible to automatically find a break of the audio video data more appropriately and configure a chapter accurately.
[0017]
In the signal processing device with an authoring function, the data creation unit (2-2) includes a data preprocessing unit (2-3), a moving image thumbnail creation unit (13), and a control information data creation unit (14).
The data preprocessing unit (2-3) divides the audio video data (A) to generate a plurality of chapters, and generates chapter data (E) indicating data related to the plurality of chapters and video encoded data (D) obtained by encoding the audio video data (A). The moving image thumbnail creation unit (13) creates moving image thumbnail data (L) based on the video encoded data (D) and the chapter data (E). The control information data creation unit (14) creates control information data (G1) based on the video encoded data (D) and the chapter data (E).
Here, as the chapter data (E), a chapter table (E) in which the chapter date / time data (33) corresponding to each of the plurality of chapters and the chapter position data (32) are associated is exemplified. However, the chapter date / time data (33) is data based on the date / time data (T0) in each of the plurality of chapters. The chapter position data (32) is data based on the position data (t0) in each of the plurality of chapters.
[0018]
In the above-described signal processing device with an authoring function, the data preprocessing unit (2-3) creates video encoded data (D) based on the Moving Picture Experts Group (MPEG) standard.
[0019]
In the above-described signal processing device with an authoring function, the moving image thumbnail creation unit (13) includes a highlight scene detection unit (26), a creation method selection unit (27), a creation method execution unit (28), and a table creation unit (29).
The highlight scene detection unit (26) determines the presence or absence of a highlight scene for each chapter based on the video encoded data (D) and the chapter data (E). Here, a highlight scene is video encoded data (D) in which the pixel difference value (Δ) is equal to or greater than a reference value. The reference value is variable. The creation method selection unit (27) selects, for each chapter, a creation method for the moving image thumbnail (F1) from preset creation methods according to the presence or absence of a highlight scene. The creation method execution unit (28) creates a moving image thumbnail (F1) for each chapter using the selected creation method. The table creation unit (29) generates moving image thumbnail data (L) based on the created moving image thumbnails (F1).
The moving image thumbnail data (L) includes a moving image thumbnail table (L) in which the plurality of moving image thumbnails (F1) and the position data (t0) are associated with each other.
Examples of the creation methods are as follows: when there is no highlight scene, frames are thinned out from the video encoded data (D) to form the moving image thumbnail (F1); when there is a highlight scene, the highlight scene is used as the moving image thumbnail (F1).
According to the present invention, by using a highlight scene, it is possible to automatically create a thumbnail that can accurately grasp the contents of each divided chapter.
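The selection between the two creation methods can be sketched as below. This is a hedged illustration only: the function name `make_thumbnail`, the per-frame list of pixel difference values, and the `max_frames` limit are assumptions introduced here, and the sketch treats every frame whose difference value Δ reaches the reference value as part of the highlight scene, which is a simplification of whatever scene detection the first embodiment actually uses.

```python
from typing import List, Sequence


def make_thumbnail(frames: Sequence[int],
                   diffs: Sequence[float],
                   reference: float,
                   max_frames: int) -> List[int]:
    """Pick the frames of one chapter's moving image thumbnail.

    frames: frame identifiers of the chapter, in order.
    diffs:  pixel difference value for each frame (same length as frames).
    """
    if max_frames < 1:
        raise ValueError("max_frames must be at least 1")
    # Highlight scene present: frames whose difference value >= reference.
    highlight = [f for f, d in zip(frames, diffs) if d >= reference]
    if highlight:
        return highlight[:max_frames]
    # No highlight scene: thin out frames evenly across the chapter.
    step = max(1, -(-len(frames) // max_frames))  # ceiling division
    return list(frames[::step][:max_frames])
```

Because the reference value is variable, raising it makes more chapters fall back to frame thinning, while lowering it makes more chapters use a highlight scene.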
[0020]
In the above-described signal processing device with an authoring function, the moving image thumbnail creation unit (13a) includes a data detection unit (56), a data analysis unit (57), a data extraction unit (58), and a table creation unit (59).
The data detection unit (56) detects the position of each GOP (Group of Pictures) for each chapter based on the video encoded data (D) and the chapter data (E). Based on the detected GOPs, the data analysis unit (57) creates, for each chapter, a code amount table that associates the code amount (R) of each GOP with the position data (t0). Based on the code amount table, the data extraction unit (58) extracts, for each chapter, the video encoded data (D) of a continuous predetermined time (2 × Δt1) that includes the GOP with the largest code amount (R) as a moving image thumbnail (F1). The table creation unit (59) generates moving image thumbnail data (L) based on the extracted moving image thumbnails (F1).
The moving image thumbnail data (L) includes a moving image thumbnail table (L) in which the plurality of moving image thumbnails (F1) and the position data (t0) are associated with each other.
According to the present invention, by using a scene that includes the GOP with the largest code amount (R), it is possible to automatically create a thumbnail from which the contents of each divided chapter can be accurately grasped.
[0021]
In the above-described signal processing device with an authoring function, the moving image thumbnail creation unit (13b) includes a data detection unit (76), a data analysis unit (77), a data extraction unit (78), and a table creation unit (79).
The data detection unit (76) detects the position of each GOP for each chapter based on the video encoded data (D) and the chapter data (E), and, for each detected GOP, assigns predetermined points to pixel data showing a predetermined color. Based on the points, the data analysis unit (77) creates, for each chapter, a point table that associates the point total (S) of each GOP with the position data (t0). Based on the point table, the data extraction unit (78) extracts, for each chapter, the video encoded data (D) of a continuous predetermined time (2 × Δt2) that includes the GOP with the largest point total (S) as a moving image thumbnail (F1). The table creation unit (79) generates moving image thumbnail data (L) based on the extracted moving image thumbnails (F1).
However, the moving image thumbnail data (L) includes a moving image thumbnail table (L) in which a plurality of moving image thumbnails (F1) and position data (t0) are associated with each other.
According to the present invention, since a scene including a large amount of pixel data indicating a predetermined color is used, it is possible to automatically generate a thumbnail that can accurately grasp the contents of each divided chapter. For example, if the predetermined color is human skin color, a screen on which many humans appear can be taken out.
[0022]
In the signal processing apparatus with an authoring function, the data processing unit (2a-1) includes a data preprocessing unit (2a-2) and a menu screen creation unit (15a).
The data preprocessing unit (2a-2) divides the audio video data (A) to generate a plurality of chapters, creates chapter data (E) indicating data related to the plurality of chapters, creates moving image data (J) by compressing the audio video data (A), and creates moving image thumbnail data (K) indicating data related to the moving image data (J). The menu screen creation unit (15a) creates menu screen data (H2) based on the moving image thumbnail data (K) and the chapter data (E).
Here, as the chapter data (E), a chapter table (E) in which the chapter date / time data (33) corresponding to each of the plurality of chapters and the chapter position data (32) are associated is exemplified. However, the chapter date / time data (33) is data based on the date / time data (T0) in each of the plurality of chapters. The chapter position data (32) is data based on the position data (t0) in each of the plurality of chapters.
The moving image thumbnail data (K) is exemplified by a moving image data table (K) in which moving image data (J) obtained by compressing the audio video data (A) and video encoded data (D) are associated with each other. The moving image data (J) is generated based on the encoded data calculated in the process of encoding the audio video data (A).
[0023]
In the above-described signal processing device with an authoring function, the data preprocessing unit (2a-2) further creates video encoded data (D) by encoding the audio video data (A), and creates the moving image data (J) based on the DC coefficients calculated by the DCT (Discrete Cosine Transform) performed during the encoding.
[0024]
In the signal processing apparatus with an authoring function, the data preprocessing unit (2a-2) includes an encoding unit (11, 11a) and a recording date and time analysis unit (12).
The encoding unit (11, 11a) creates video encoded data (D) based on the audio video data (A). The recording date and time analysis unit (12) divides the audio video data (A) into a plurality of chapters based on the date and time data (T0) and creates chapter data (E).
[0025]
In the signal processing apparatus with an authoring function, the recording date and time analysis unit (12) includes a chapter division unit (21) and a table generation unit (23).
The chapter division unit (21) treats each portion where the date / time data (T0) is not continuous as a delimiter of the audio video data (A) and divides the audio video data (A) into a plurality of chapters. The table generation unit (23) creates chapter data (E) including the chapter table (E).
Here, the chapter table (E) associates the chapter date / time data (33) and the chapter position data (32) corresponding to each of the plurality of chapters. The chapter date / time data (33) is data based on the date / time data (T0) in each of the plurality of chapters. The chapter position data (32) is data based on the position data (t0) in each of the plurality of chapters.
[0026]
In the signal processing apparatus with an authoring function, the recording date and time analysis unit (12) further includes a chapter limiter (22) that limits the number of chapters to a preset maximum number of chapters (N).
[0027]
In order to solve the above-described problem, the DVD device according to the present invention includes the above-described signal processing device with an authoring function, which outputs the menu screen data (H1, H2) based on the input of the audio video data (A), and a drive unit (3) that writes the menu screen data (H1, H2) to the storage medium.
Here, examples of the recording medium include DVD, ROM, RAM, HD, CD, and FD.
[0028]
In order to solve the above problem, the signal processing method including authoring according to the present invention includes steps (a) and (b).
In step (a), a moving image thumbnail (F1, J) is created for each of a plurality of chapters generated by dividing the audio video data (A), and menu screen data (H1, H2) including the moving image thumbnails (F1, J) is created. Here, the audio video data (A) includes a plurality of image data, date / time data (T0) indicating the recording date and time of the image data, and position data (t0) indicating the position of the image data in the audio video data (A). The menu screen data (H1, H2) indicates a menu screen (50) that simultaneously displays some or all of the moving image thumbnails (F1, J) of the plurality of chapters. In step (b), the menu screen data (H1, H2) is recorded on the storage medium.
Here, examples of the recording medium include DVD, ROM, RAM, HD, CD, and FD.
[0029]
In the signal processing method including authoring described above, step (a) includes steps (a1) to (a5).
In the step (a1), video encoded data (D) obtained by encoding the audio video data (A) is created based on the audio video data (A). In the step (a2), the audio video data (A) is divided based on the date / time data (T0) to generate a plurality of chapters, and chapter data (E) indicating data related to the chapters is generated. In the step (a3), a moving image thumbnail (F1) is created for each of the plurality of chapters based on the video encoded data (D) and the chapter data (E), and moving image thumbnail data (L) indicating data related to the plurality of moving image thumbnails (F1) is generated. In the step (a4), control information data (G1) indicating control information related to the plurality of chapters is created based on the video encoded data (D) and the chapter table (E). In the step (a5), menu screen data (H1) is created based on the moving image thumbnail data (L) and the control information data (G1).
[0030]
In the signal processing method including authoring described above, the step (a3) includes steps (aa1) to (aa5).
In the (aa1) step, based on the video encoded data (D) and the chapter data (E), a highlight scene in which the pixel difference value (Δ) in the video encoded data (D) is greater than or equal to a reference value is detected for each chapter. Here, the chapter data (E) includes a chapter table (E) in which the chapter date / time data (33) corresponding to each of the plurality of chapters and the chapter position data (32) are associated with each other. The chapter date / time data (33) is data based on the date / time data (T0) in each of the plurality of chapters. The chapter position data (32) is data based on the position data (t0) in each of the plurality of chapters. In the (aa2) step, the reference value is changed so that the length of the highlight scene becomes equal to the designated reproduction time. If the length can be made equal, it is determined that the highlight scene exists, and data associating the highlight scene with the position data (t0) is generated. If it cannot, it is determined that no highlight scene exists. In the (aa3) step, a creation method for the moving image thumbnail (F1) is selected for each chapter from preset creation methods, based on the presence or absence of the highlight scene and the situation of the highlight scene. In the (aa4) step, a moving image thumbnail (F1) is created for each chapter using the selected creation method. In the (aa5) step, the moving image thumbnail data (L) is generated based on the created moving image thumbnail (F1). Here, the moving image thumbnail data (L) includes a moving image thumbnail table (L) in which a plurality of moving image thumbnails (F1) and position data (t0) are associated with each other.
The creation methods are as follows. When no highlight scene exists, frames are thinned out from the video encoded data (D) to obtain the moving image thumbnail (F1). When there are a plurality of highlight scenes, the highlight scenes are connected to form the moving image thumbnail (F1). When there is only one highlight scene, that highlight scene is used directly as the moving image thumbnail (F1).
[0031]
In the signal processing method including authoring described above, the step (a3) includes steps (aa6) to (aa9).
In the (aa6) step, based on the video encoded data (D) and the chapter data (E), the position of each GOP is detected for each chapter. Here, the chapter data (E) includes a chapter table (E) in which the chapter date / time data (33) corresponding to each of the plurality of chapters and the chapter position data (32) are associated with each other. The chapter date / time data (33) is data based on the date / time data (T0) in each of the plurality of chapters. The chapter position data (32) is data based on the position data (t0) in each of the plurality of chapters. In the (aa7) step, a code amount table associating the code amount (R) of each detected GOP with the position data (t0) is created for each chapter. In the (aa8) step, based on the code amount table, the video encoded data (D) for a continuous predetermined time (2 × Δt1) including the GOP with the maximum code amount (R) is extracted for each chapter as a moving image thumbnail (F1). In the (aa9) step, moving image thumbnail data (L) is created based on the extracted moving image thumbnail (F1). Here, the moving image thumbnail data (L) includes a moving image thumbnail table (L) in which a plurality of moving image thumbnails (F1) and position data (t0) are associated with each other.
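The selection of a 2 × Δt1 window around the GOP with the largest code amount R can be sketched as follows. This is only an illustrative sketch: the function name, the table layout as `(t0, R)` pairs in seconds, and the choice to center the window on the peak GOP and clamp it to the chapter bounds are assumptions, since the text only requires that the window include that GOP.

```python
def thumbnail_window(code_amount_table, chapter_start, chapter_end, delta_t1):
    """Return (start, end) in seconds of the thumbnail window for one chapter.

    code_amount_table: list of (t0, R) rows, one per GOP (assumed layout).
    The window is 2 * delta_t1 long, centered on the GOP with the largest R
    (an assumption), and shifted if needed to stay inside the chapter.
    """
    t_peak, _ = max(code_amount_table, key=lambda row: row[1])
    length = min(2 * delta_t1, chapter_end - chapter_start)
    start = min(max(t_peak - delta_t1, chapter_start), chapter_end - length)
    return start, start + length


# Hypothetical chapter spanning 0 s to 30 s with three GOPs.
table = [(0, 100), (10, 400), (20, 250)]
window = thumbnail_window(table, 0, 30, 5)
```

Clamping keeps the extracted span at the full 2 × Δt1 length even when the peak GOP sits at the very start or end of a chapter.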
[0032]
In the signal processing method including authoring described above, the step (a3) includes steps (aa10) to (aa14).
In the (aa10) step, the position of each GOP is detected for each chapter based on the video encoded data (D) and the chapter data (E). Here, the chapter data (E) includes a chapter table (E) in which the chapter date / time data (33) corresponding to each of the plurality of chapters and the chapter position data (32) are associated with each other. The chapter date / time data (33) is data based on the date / time data (T0) in each of the plurality of chapters. The chapter position data (32) is data based on the position data (t0) in each of the plurality of chapters. In the (aa11) step, for each detected GOP, predetermined points are added for pixel data indicating a predetermined color. In the (aa12) step, a point table in which the total (S) of the points for each GOP and the position data (t0) are associated is created for each chapter based on the points. In the (aa13) step, based on the point table, the video encoded data (D) for a continuous predetermined time (2 × Δt2) including the GOP with the maximum point total (S) is extracted for each chapter as a moving image thumbnail (F1). In the (aa14) step, moving image thumbnail data (L) is created based on the extracted moving image thumbnail (F1). Here, the moving image thumbnail data (L) includes a moving image thumbnail table (L) in which a plurality of moving image thumbnails (F1) and position data (t0) are associated with each other.
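As a rough illustration of the point table, the sketch below awards points to pixels of a predetermined color and finds the GOP with the largest total S. The representation of a GOP as a list of sampled `(r, g, b)` pixels, the function names, and the `is_reddish` predicate are all hypothetical; the text does not say which color is used or how pixels are sampled.

```python
def build_point_table(gops, is_target_color, point=1):
    """Return [(t0, S)], where S is the point total of each GOP.

    gops: list of (t0, pixels); pixels is an iterable of (r, g, b) tuples
    sampled from that GOP (assumed layout). Each pixel that matches the
    predetermined color earns `point` points.
    """
    return [
        (t0, sum(point for px in pixels if is_target_color(px)))
        for t0, pixels in gops
    ]


def peak_gop(point_table):
    """Position t0 of the GOP with the largest point total S."""
    return max(point_table, key=lambda row: row[1])[0]


def is_reddish(px):
    """Hypothetical stand-in for the 'predetermined color' test."""
    r, g, b = px
    return r > 150 and g < 100 and b < 100


gops = [
    (0.0, [(200, 50, 50), (10, 10, 10)]),
    (0.5, [(200, 50, 50), (220, 30, 40)]),
    (1.0, [(10, 200, 10), (10, 10, 200)]),
]
point_table = build_point_table(gops, is_reddish)
```

The position returned by `peak_gop` would then anchor the 2 × Δt2 extraction window.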
[0033]
In the signal processing method including authoring described above, step (a) includes steps (a6) to (a8).
In the step (a6), based on the audio video data (A), video encoded data (D) obtained by encoding the audio video data (A) and moving image data (J) obtained by compressing the audio video data (A) are created, and moving image thumbnail data (K) indicating data related to the moving image data (J) is generated. Here, the moving image thumbnail data (K) includes a moving image data table (K) in which the moving image data (J) and the video encoded data (D) are associated with each other. The moving image data (J) is created based on the encoded data calculated in the encoding process. In the step (a7), the audio video data (A) is divided based on the date / time data (T0) to generate a plurality of chapters, and chapter data (E) indicating data related to the chapters is generated. Here, the chapter data (E) includes a chapter table (E) in which the chapter date / time data (33) corresponding to each of the plurality of chapters and the chapter position data (32) are associated with each other. The chapter date / time data (33) is data based on the date / time data (T0) in each of the plurality of chapters. The chapter position data (32) is data based on the position data (t0) in each of the plurality of chapters. In the step (a8), menu screen data (H2) is created based on the moving image thumbnail data (K) and the chapter data (E).
[0034]
In the signal processing method including authoring described above, the step (a6) includes steps (ab1) to (ab3).
In the (ab1) step, a DCT operation is performed on the audio video data (A). In the (ab2) step, moving image data (J) is created based on the DC coefficient generated with the DCT calculation. The (ab3) step creates moving image thumbnail data (K) based on moving image data (J) and encoded video data (D).
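To illustrate why the DC coefficient is enough to build a reduced image, the following sketch forms one thumbnail pixel per 8 × 8 block. It assumes an orthonormal 2-D DCT-II, for which the DC term of an N × N block equals the block sum divided by N; the function names and the plain list-of-rows luminance frame are assumptions made for the example, not the encoder's actual data path.

```python
def dct_dc(block):
    """DC coefficient of an orthonormal 2-D DCT-II over an N x N block.

    For this normalization the (0, 0) term equals sum(block) / N, that is,
    N times the block mean, so no cosine terms need to be evaluated.
    """
    n = len(block)
    return sum(sum(row) for row in block) / n


def dc_thumbnail(frame, n=8):
    """Build a reduced image with one pixel per n x n block of `frame`.

    frame: list of rows of luminance values whose height and width are
    multiples of n (assumed). Each output pixel is the block mean,
    recovered from the DC coefficient as DC / n.
    """
    height, width = len(frame), len(frame[0])
    thumb = []
    for by in range(0, height, n):
        row = []
        for bx in range(0, width, n):
            block = [frame[y][bx:bx + n] for y in range(by, by + n)]
            row.append(dct_dc(block) / n)
        thumb.append(row)
    return thumb


# A flat 16 x 16 frame shrinks to a 2 x 2 image with the same brightness.
flat = [[10] * 16 for _ in range(16)]
small = dc_thumbnail(flat)
```

Because the encoder computes these DC terms anyway, a 1/8-scale moving image can be obtained as a by-product of encoding without a separate resizing pass.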
[0035]
In order to solve the above problems, the computer program according to the present invention causes a computer to execute a method including steps (c) and (d).
In step (c), a moving image thumbnail (F1, J) is created for each of a plurality of chapters generated by dividing the audio video data (A), and menu screen data (H1, H2) including the moving image thumbnails (F1, J) is created. Here, the audio video data (A) includes a plurality of image data, date / time data (T0) indicating the recording date and time of the image data, and position data (t0) indicating the position of the image data in the audio video data (A). The menu screen data (H1, H2) indicates a menu screen (50) that simultaneously displays some or all of the moving image thumbnails (F1, J) of the plurality of chapters. In step (d), the menu screen data (H1, H2) is recorded on the storage medium.
Here, examples of the recording medium include DVD, ROM, RAM, HD, CD, and FD.
[0036]
In the above computer program, step (c) includes steps (c1) to (c5).
In the (c1) step, video encoded data (D) obtained by encoding the audio video data (A) is created. In the (c2) step, the audio video data (A) is divided to generate a plurality of chapters, and chapter data (E) indicating data related to the chapters is created. In the (c3) step, a moving image thumbnail (F1) is created for each of the plurality of chapters based on the video encoded data (D) and the chapter data (E), and moving image thumbnail data (L) indicating data related to the plurality of moving image thumbnails (F1) is generated. In the (c4) step, control information data (G1) indicating control information related to the plurality of chapters is created based on the video encoded data (D) and the chapter table (E). In the (c5) step, menu screen data (H1) is created based on the moving image thumbnail data (L) and the control information data (G1).
[0037]
In the above program, the step (c2) divides the audio video data (A) based on the date / time data (T0) to generate a plurality of chapters.
[0038]
In the above computer program, step (c3) includes steps (ca1) to (ca5).
In the (ca1) step, based on the video encoded data (D) and the chapter data (E), a highlight scene in which the pixel difference value (Δ) in the video encoded data (D) is greater than or equal to a reference value is detected for each chapter. Here, the chapter data (E) includes a chapter table (E) in which the chapter date / time data (33) corresponding to each of the plurality of chapters and the chapter position data (32) are associated with each other. The chapter date / time data (33) is data based on the date / time data (T0) in each of the plurality of chapters. The chapter position data (32) is data based on the position data (t0) in each of the plurality of chapters. In the (ca2) step, the reference value is changed so that the length of the highlight scene becomes equal to the designated reproduction time. If the length can be made equal, it is determined that the highlight scene exists, and data associating the highlight scene with the position data (t0) is generated. If it cannot, it is determined that no highlight scene exists. In the (ca3) step, a creation method for the moving image thumbnail (F1) is selected for each chapter from preset creation methods, based on the presence or absence of the highlight scene and the situation of the highlight scene. In the (ca4) step, a moving image thumbnail (F1) is created for each chapter using the selected creation method. In the (ca5) step, moving image thumbnail data (L) is created based on the created moving image thumbnail (F1). Here, the moving image thumbnail data (L) includes a moving image thumbnail table (L) in which a plurality of moving image thumbnails (F1) and position data (t0) are associated with each other.
The creation methods are as follows. When no highlight scene exists, frames are thinned out from the video encoded data (D) to obtain the moving image thumbnail (F1). When there are a plurality of highlight scenes, the highlight scenes are connected to form the moving image thumbnail (F1). When there is only one highlight scene, that highlight scene is used directly as the moving image thumbnail (F1).
[0039]
In the above computer program, step (c3) includes steps (ca6) to (ca9).
In the (ca6) step, the position of each GOP is detected for each chapter based on the video encoded data (D) and the chapter data (E). Here, the chapter data (E) includes a chapter table (E) in which the chapter date / time data (33) corresponding to each of the plurality of chapters and the chapter position data (32) are associated with each other. The chapter date / time data (33) is data based on the date / time data (T0) in each of the plurality of chapters. The chapter position data (32) is data based on the position data (t0) in each of the plurality of chapters. In the (ca7) step, a code amount table associating the code amount (R) of each detected GOP with the position data (t0) is created for each chapter. In the (ca8) step, based on the code amount table, the video encoded data (D) for a continuous predetermined time (2 × Δt1) including the GOP with the maximum code amount (R) is extracted for each chapter as a moving image thumbnail (F1). In the (ca9) step, moving image thumbnail data (L) is created based on the extracted moving image thumbnail (F1). Here, the moving image thumbnail data (L) includes a moving image thumbnail table (L) in which a plurality of moving image thumbnails (F1) and position data (t0) are associated with each other.
[0040]
Further, in the above computer program, step (c3) includes steps (ca10) to (ca14).
In the (ca10) step, the position of each GOP is detected for each chapter based on the video encoded data (D) and the chapter data (E). Here, the chapter data (E) includes a chapter table (E) in which the chapter date / time data (33) corresponding to each of the plurality of chapters and the chapter position data (32) are associated with each other. The chapter date / time data (33) is data based on the date / time data (T0) in each of the plurality of chapters. The chapter position data (32) is data based on the position data (t0) in each of the plurality of chapters. In the (ca11) step, for each detected GOP, predetermined points are added for pixel data indicating a predetermined color. In the (ca12) step, a point table in which the total (S) of the points for each GOP and the position data (t0) are associated is created for each chapter based on the points. In the (ca13) step, based on the point table, the video encoded data (D) for a continuous predetermined time (2 × Δt2) including the GOP with the maximum point total (S) is extracted for each chapter as a moving image thumbnail (F1). In the (ca14) step, moving image thumbnail data (L) is created based on the extracted moving image thumbnail (F1). Here, the moving image thumbnail data (L) includes a moving image thumbnail table (L) in which a plurality of moving image thumbnails (F1) and position data (t0) are associated with each other.
[0041]
Furthermore, in the above computer program, step (c) includes steps (c6) to (c8).
In the (c6) step, video encoded data (D) obtained by encoding the audio video data (A) and moving image data (J) obtained by compressing the audio video data (A) are created, and moving image thumbnail data (K) indicating data related to the moving image data (J) is generated. Here, the moving image thumbnail data (K) includes a moving image data table (K) in which the moving image data (J) and the video encoded data (D) are associated with each other. The moving image data (J) is created based on the encoded data calculated in the encoding process. In the (c7) step, the audio video data (A) is divided to generate a plurality of chapters, and chapter data (E) indicating data related to the chapters is created. Here, the chapter data (E) includes a chapter table (E) in which the chapter date / time data (33) corresponding to each of the plurality of chapters and the chapter position data (32) are associated with each other. The chapter date / time data (33) is data based on the date / time data (T0) in each of the plurality of chapters. The chapter position data (32) is data based on the position data (t0) in each of the plurality of chapters. In the (c8) step, menu screen data (H2) is created based on the moving image thumbnail data (K) and the chapter data (E).
[0042]
In the above program, the step (c7) generates a plurality of chapters by dividing the audio video data (A) based on the date / time data (T0).
[0043]
Further, in the above computer program, step (c6) includes steps (cb1) to (cb3).
In the (cb1) step, a DCT operation is performed on the audio video data (A). In the (cb2) step, the moving image data (J) is created based on the DC coefficient generated with the DCT calculation. The (cb3) step creates moving image thumbnail data (K) based on moving image data (J) and encoded video data (D).
[0044]
DETAILED DESCRIPTION OF THE INVENTION
Embodiments of a signal processing apparatus with an authoring function and a signal processing method including authoring according to the present invention will be described below with reference to the accompanying drawings. In this embodiment, an example in which the signal processing apparatus with an authoring function according to the present invention is applied to a DVD apparatus (MPEG apparatus) is described, but the present invention can also be applied to other video recording apparatuses.
[0045]
(First embodiment)
A signal processing device with an authoring function and a signal processing method including authoring according to a first embodiment of the present invention will be described with reference to the accompanying drawings.
[0046]
First, the configuration of a first embodiment of a DVD apparatus (MPEG apparatus) to which the signal processing apparatus with an authoring function according to the present invention is applied will be described.
FIG. 1 is a block diagram showing a configuration of a first embodiment of a DVD apparatus (MPEG apparatus) to which a signal processing apparatus with an authoring function according to the present invention is applied. The DVD device 1 records and stores video encoded data D obtained by encoding audio video data and other data on a DVD based on the input of audio video data A and various conditions (B and C, which will be described later). The DVD device 1 includes a signal processing device 2 with an authoring function, a DVD drive unit 3 and a system microcomputer 4. Here, a configuration for reading DVD data is omitted.
[0047]
The signal processing device 2 with the authoring function is controlled by the system microcomputer 4 and, based on the input of the audio video data A, generates video encoded data D (described later), menu screen data H1 (described later), and control information data G1 (described later). It also performs control to record them on a predetermined storage medium. The storage medium is exemplified by a DVD.
Under the control of the system microcomputer 4, the DVD drive unit 3 records (stores) the video encoded data D, the menu screen data H1, and the control information data G1 output from the signal processing device 2 with an authoring function on the storage medium set in the drive. Here, a DVD is used as the recording medium. However, other recording media (example: ROM, RAM, CD, HD, FD) can also be used.
The system microcomputer 4 controls the DVD device 1 including the signal processing device 2 with the authoring function and the DVD drive unit 3. The system microcomputer 4 is exemplified by an MPU (microprocessor unit).
[0048]
The signal processing apparatus 2 with an authoring function includes an encoding unit 11, a recording date and time analysis unit 12, a moving image thumbnail creation unit 13, a control information data creation unit 14, a menu screen creation unit 15, and a write control unit 16. Here, the encoding unit 11 and the recording date and time analysis unit 12 are also collectively referred to as a data preprocessing unit 2-3. The data preprocessing unit 2-3, the moving image thumbnail creation unit 13, and the control information data creation unit 14 are also collectively referred to as a data creation unit 2-2. The data creation unit 2-2 and the menu screen creation unit 15 are also collectively referred to as a data processing unit 2-1.
[0049]
The encoding unit 11 generates video encoded data D obtained by encoding audio video data based on audio video data A input from the outside.
[0050]
Here, the audio video data A is data containing a plurality of video images output from a device such as a digital video tape recorder or an analog video tape recorder, and includes audio data and image data (images include moving images; the same applies throughout this specification). The image data includes date and time data indicating the recording date and time of the audio video data A (example: year:month:day:hour:minute:second) and position data indicating the position from the beginning of the tape (audio video data A) (example: elapsed time in hour:minute:second). The encoding of the audio video data A includes encoding performed based on the MPEG (Moving Picture Experts Group; the same applies throughout this specification) standard. The encoded video data D is exemplified by MPEG2 data (VOB (Video Object) data).
[0051]
FIG. 3 is a diagram showing date and time data and position data. For each frame (indicated by one square in the drawing), the audio video data A records both the recording date and time T0 at which the audio video data A was recorded, as date and time data (shown in the lower part of each frame in the drawing), and the time t0 from the beginning of the tape on which the audio video data A is recorded, as position data (shown in the upper part of each frame in the drawing).
[0052]
With reference to FIG. 1, the recording date and time analysis unit 12 divides the audio video data A into a plurality of chapters based on the date and time data of the audio video data A input from the outside. That is, a portion where the date / time data is discontinuous is detected, and the audio video data A is divided by using that as a chapter delimiter. Then, a chapter table E (described later) that associates chapter date / time data and chapter position data corresponding to each of the plurality of chapters is generated. However, when the maximum chapter number data B (N) indicating the maximum number of chapters into which the audio video data A is divided is input from the outside, the chapters are adjusted so as not to exceed the maximum number.
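The division at discontinuities of the date / time data can be sketched as below. The frame representation as `(t0, T0)` pairs, the function name, and the one-second tolerance used to decide whether two consecutive frames are "continuous" are assumptions for illustration; the text does not state a tolerance.

```python
from datetime import datetime, timedelta


def split_chapters(frames, max_gap=timedelta(seconds=1)):
    """Split frames into chapters wherever the recording time T0 jumps.

    frames: list of (t0_seconds, T0) pairs in tape order, where T0 is a
    datetime. Two consecutive frames are treated as continuous when T0
    advances by no more than max_gap (an assumed tolerance); a larger
    jump, or a jump backwards, starts a new chapter.
    """
    chapters = []
    for frame in frames:
        if chapters:
            gap = frame[1] - chapters[-1][-1][1]
            if timedelta(0) <= gap <= max_gap:
                chapters[-1].append(frame)
                continue
        chapters.append([frame])
    return chapters


frames = [
    (0, datetime(2003, 3, 1, 10, 0, 0)),
    (1, datetime(2003, 3, 1, 10, 0, 1)),
    (2, datetime(2003, 3, 2, 9, 0, 0)),   # recorded the next day
    (3, datetime(2003, 3, 2, 9, 0, 1)),
]
chapters = split_chapters(frames)
```

Here the tape position t0 runs on without interruption, while the jump in T0 between the second and third frames marks the chapter boundary.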
[0053]
Here, the chapter is a part of the audio video data A obtained by dividing the audio video data A. One chapter may be a continuous part (including a continuous scene) in the audio video data A, or may be a combination of non-continuous parts (including a plurality of continuous scenes).
[0054]
Here, the chapter date / time data is data based on the date / time data in each of a plurality of chapters. It is exemplified by date / time data indicating the beginning and end of the chapter, or by the difference between the date / time data indicating the end of the previous chapter and the date / time data indicating the beginning of the next chapter. The chapter position data is data based on position data in each of a plurality of chapters, and is exemplified by position data indicating the beginning or end of the chapter.
[0055]
FIG. 2 is a block diagram illustrating a configuration of the recording date and time analysis unit 12. The recording date and time analysis unit 12 includes a chapter division unit 21, a chapter restriction unit 22, and a table generation unit 23.
[0056]
The chapter division unit 21 determines that a portion where the date / time data is not continuous is a delimiter of the audio video data A. Then, based on the delimiters, the audio video data A is divided into a plurality of chapters.
When the number of chapters exceeds the maximum number N indicated by the maximum chapter number data B, the chapter restriction unit 22 combines two adjacent chapters that satisfy a preset condition among the plurality of chapters, thereby keeping the number of chapters at or below the maximum number N. The preset condition is exemplified by selecting the two chapters for which the difference between the date / time data indicating the end of the previous chapter and the date / time data indicating the beginning of the next chapter is the smallest.
The table generation unit 23 generates a chapter table E indicating the relationship between chapter date data and chapter position data corresponding to each of a plurality of chapters.
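One way to picture the chapter restriction is a loop that repeatedly merges the adjacent pair separated by the smallest recording gap until at most N chapters remain. The sketch below assumes chapters are lists of `(t0, T0)` frames and that merging is repeated one pair at a time; both the function name and that repetition strategy are assumptions.

```python
from datetime import datetime


def limit_chapters(chapters, max_chapters):
    """Merge adjacent chapters until at most max_chapters remain.

    chapters: list of chapters, each a non-empty list of (t0, T0) frames.
    At every pass the two neighbors with the smallest gap between the end
    T0 of the earlier chapter and the start T0 of the later one are joined.
    """
    chapters = [list(c) for c in chapters]
    while len(chapters) > max_chapters and len(chapters) > 1:
        gaps = [
            chapters[i + 1][0][1] - chapters[i][-1][1]
            for i in range(len(chapters) - 1)
        ]
        i = gaps.index(min(gaps))
        chapters[i:i + 2] = [chapters[i] + chapters[i + 1]]
    return chapters


three = [
    [(0, datetime(2003, 3, 1, 10, 0, 0))],
    [(1, datetime(2003, 3, 2, 10, 0, 0))],   # one day after the first
    [(2, datetime(2003, 3, 2, 11, 0, 0))],   # one hour after the second
]
two = limit_chapters(three, 2)
```

In this example the second and third chapters are merged, because their one-hour gap is smaller than the one-day gap before them.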
[0057]
FIG. 4 is a table showing the chapter table E. The chapter table E associates chapter date data and chapter position data corresponding to each of a plurality of chapters.
Here, the chapter number 31 is a chapter serial number. The chapter end position 32, as chapter position data, is position data indicating the end of the chapter, expressed as hours:minutes:seconds. The chapter recording date / time interval 33, as chapter date / time data, is the time interval between the date / time data indicating the end of the previous chapter and the date / time data indicating the beginning of the next chapter, expressed as days:hours:minutes:seconds. The chapters are arranged in ascending order of chapter end position 32.
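A chapter table with these three columns could be assembled roughly as follows. The row layout, the helper names, and in particular the choice to store in each row the interval between that chapter's end and the following chapter's beginning (with no interval for the last chapter) are assumptions; the text does not say which row carries which interval.

```python
from datetime import datetime


def fmt_hms(seconds):
    """Format a tape position in seconds as hours:minutes:seconds."""
    s = int(seconds)
    return f"{s // 3600:02d}:{s % 3600 // 60:02d}:{s % 60:02d}"


def fmt_dhms(delta):
    """Format a timedelta as days:hours:minutes:seconds."""
    s = int(delta.total_seconds())
    return f"{s // 86400}:{s % 86400 // 3600:02d}:{s % 3600 // 60:02d}:{s % 60:02d}"


def build_chapter_table(chapters):
    """Return rows of (chapter number, end position, recording interval).

    chapters: list of chapters in tape order, each a non-empty list of
    (t0_seconds, T0 datetime) frames.
    """
    rows = []
    for i, chapter in enumerate(chapters):
        end_t0, end_T0 = chapter[-1]
        if i + 1 < len(chapters):
            interval = fmt_dhms(chapters[i + 1][0][1] - end_T0)
        else:
            interval = None  # no following chapter (assumed convention)
        rows.append((i + 1, fmt_hms(end_t0), interval))
    return rows


chapters = [
    [(0, datetime(2003, 3, 1, 10, 0, 0)), (65, datetime(2003, 3, 1, 10, 1, 5))],
    [(66, datetime(2003, 3, 2, 11, 1, 5)), (3725, datetime(2003, 3, 2, 12, 2, 4))],
]
table_e = build_chapter_table(chapters)
```

Because tape positions only increase, building the rows in tape order already yields the ascending order of chapter end position.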
[0058]
Referring to FIG. 1, moving image thumbnail creation unit 13 creates a plurality of moving image thumbnails F1 corresponding to each of a plurality of chapters based on encoded video data D and chapter table E. At this time, reference is made to moving image condition data C (input from the outside, which may have a default value) indicating the creation condition of the moving image thumbnail F1, and each moving image thumbnail F1 is generated so as to meet the condition. Here, the moving image condition data C is exemplified by the image size of the moving image thumbnail and the reproduction time of the moving image thumbnail.
Here, the moving image thumbnail is a moving image format thumbnail (preview). It is generated by compressing the base data (image processing such as encoding).
[0059]
The moving image thumbnail creating unit 13 further creates a moving image thumbnail table L in which each frame of the moving image thumbnail F1 is associated with the encoded video data D. That is, the moving image thumbnail table L is created in which moving image data indicating each frame of the moving image thumbnail F1 is associated with the position data of the encoded video data D.
[0060]
FIG. 5 is a table showing the moving image thumbnail table L. For each frame, a frame number 41, which is the serial number of the frame, a time 42 as position data of the video encoded data D (time from the beginning of the audio video data), and frame image data 44 as moving image data are associated with one another. The moving image thumbnail table L may be provided for each moving image thumbnail F1, or one moving image thumbnail table L may be divided as appropriate to hold a plurality of moving image thumbnails F1.
[0061]
FIG. 6 is a block diagram illustrating a configuration of the moving image thumbnail creation unit 13. The moving image thumbnail creation unit 13 includes a highlight scene detection unit 26, a creation method selection unit 27, a creation method execution unit 28, and a table creation unit 29.
[0062]
The highlight scene detection unit 26 detects, for each chapter, a pixel difference value Δ as the pixel difference between frames, based on the encoded video data D and the chapter table E. For each chapter, the pixel difference value Δ of each frame is associated with the position data t0 (time from the beginning of the audio video data) of the encoded video data D and stored in a storage unit (not shown) as a difference value table.
[0063]
Here, the difference value table will be described.
FIG. 7 is a diagram representing the difference value table in the form of a graph. The vertical axis represents the pixel difference value Δ, and the horizontal axis represents the position data t0 (here “time”). A curve W in the graph indicates a pixel difference value Δ. Half lines α0 and α2 indicate pixel difference values Δ0 and Δ2, respectively. One chapter is from time t1 to time t2.
The highlight scene detection unit 26 extracts a highlight scene from the difference value table (FIG. 7). Here, the highlight scene is a place where the state in which the pixel difference value Δ is larger than a predetermined threshold continues for a preset time. In FIG. 7, if the threshold value is Δ0, the highlight scene corresponds to the portion of the curve W at P1. In this case, the highlight scene can be lengthened by decreasing the threshold value from the predetermined maximum value Δ0. For example, in FIG. 7, the highlight scene can be lengthened from P1 to P2 + P3 by reducing the threshold from Δ0 to Δ1 (indicated by a half line α1). By this operation, the time of the highlight scene (the total time when there are a plurality of highlight scenes) can be adjusted to the designated reproduction time. Here, it is assumed that there is no highlight scene when the highlight scene does not reach the designated reproduction time even if the threshold value is lowered to the predetermined minimum value Δ2.
However, a portion Q where the pixel difference value Δ only momentarily exceeds the predetermined threshold (for example, a scene where the camera pans) is not detected as a highlight scene, because its duration Px is shorter than the preset time.
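The threshold-lowering search over the difference value table can be sketched as follows. The sketch assumes a uniformly sampled list of Δ values, measures durations in samples, lowers the threshold in fixed steps, and stops as soon as the total highlight time reaches the designated reproduction time; the function names, the step size, and the "reaches" stopping rule are assumptions.

```python
def find_runs(deltas, threshold, min_len):
    """Return (start, end) index pairs, end exclusive, of runs in which
    delta stays above threshold for at least min_len samples."""
    runs, start = [], None
    # The trailing -inf sentinel closes a run that lasts to the end.
    for i, d in enumerate(list(deltas) + [float("-inf")]):
        if d > threshold:
            if start is None:
                start = i
        elif start is not None:
            if i - start >= min_len:
                runs.append((start, i))
            start = None
    return runs


def detect_highlights(deltas, target_len, min_len, th_max, th_min, step):
    """Lower the threshold from th_max toward th_min until the total run
    length reaches target_len. Returns (threshold, runs), or (None, [])
    when no highlight scene is found even at th_min."""
    threshold = th_max
    while threshold >= th_min:
        runs = find_runs(deltas, threshold, min_len)
        if sum(end - start for start, end in runs) >= target_len:
            return threshold, runs
        threshold -= step
    return None, []


# Two sustained bursts and one momentary spike (index 10), like Q in FIG. 7.
deltas = [1, 1, 9, 9, 9, 1, 6, 6, 6, 1, 9, 1]
found = detect_highlights(deltas, target_len=6, min_len=2,
                          th_max=8, th_min=2, step=1)
```

In the example, the one-sample spike never qualifies because of `min_len`, while lowering the threshold brings in the second burst and so reaches the requested length.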
[0064]
Referring to FIG. 6, the creation method selection unit 27 selects a method for creating a moving image thumbnail from the following three methods according to the detected situation of the highlight scene.
(A) For a chapter having a plurality of highlight scenes, the highlight scenes are joined together to form the moving image thumbnail.
(B) For a chapter having only one highlight scene, that highlight scene is used as the moving image thumbnail as it is.
(C) For a chapter without a highlight scene, the chapter is shortened by thinning out frames to form the moving image thumbnail. For example, a 10-minute chapter can be shortened to a moving image thumbnail with a playback time of 1 minute by repeating "display 1 frame, skip 9 frames" or "display 1 second, skip 9 seconds".
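The three creation methods (A) to (C) can be sketched as follows. This is only an illustration under assumptions: frames are represented by arbitrary list items, highlight scenes by [start, end) index ranges, and the function names `thin_frames` and `build_thumbnail` are hypothetical rather than taken from the specification.

```python
from typing import List, Sequence, Tuple


def thin_frames(frames: Sequence, show: int, skip: int) -> List:
    """Method (C): keep `show` frames, drop `skip` frames, and repeat."""
    period = show + skip
    return [f for i, f in enumerate(frames) if i % period < show]


def build_thumbnail(frames: Sequence, highlights: List[Tuple[int, int]],
                    show: int = 1, skip: int = 9) -> List:
    """Pick the creation method from the number of highlight scenes."""
    if len(highlights) >= 2:
        # Method (A): join all highlight scenes in order.
        joined: List = []
        for start, end in highlights:
            joined.extend(frames[start:end])
        return joined
    if len(highlights) == 1:
        # Method (B): use the single highlight scene as it is.
        start, end = highlights[0]
        return list(frames[start:end])
    # Method (C): no highlight scene, so thin out the whole chapter.
    return thin_frames(frames, show, skip)


chapter = list(range(100))  # stand-in for 100 decoded frames
thumb_a = build_thumbnail(chapter, [(2, 5), (8, 12)])
thumb_b = build_thumbnail(chapter, [(2, 5)])
thumb_c = build_thumbnail(chapter, [])
```

With `show=1` and `skip=9`, method (C) keeps one frame in ten, which corresponds to shortening a 10-minute chapter to a 1-minute thumbnail.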
[0065]
The creation method execution unit 28 creates a moving image thumbnail using the method selected by the creation method selection unit 27.
The table creation unit 29 stores the created moving image thumbnail in the moving image thumbnail table L.
[0066]
Referring to FIG. 1, the control information data creation unit 14 creates control information data G1 including data related to a plurality of chapters, based on the encoded video data D and the chapter table E. That is, based on the encoded video data D and the chapter table E, it creates the DVD control information data G1 (the VTSI of the video title set 67, described later) and stores, in the PTT (Part of Title, described later) of the control information data G1, chapter data (example: chapter number 31) indicating which chapter each program (PG, described later) is included in.
When a storage medium other than DVD is used, control information data G1 corresponding to the storage medium is created.
[0067]
FIG. 8 shows the structure of data stored on a DVD. The data 61 stored on the DVD includes a video manager (VMG) 63 and a video title set (VTS) 67.
The video manager (VMG) 63 includes VMGI as control information, VMGM_VOBS as menu screen data H1 (described later), and VMGI (BUP) as a backup of VMGI.
The video title set 67 includes VTSI as control information of a video title set (a set of movies (video images)), VTSM_VOBS to VTSTT_VOBS as moving image files, and VTSI (BUP) as a backup of VTSI.
The VTSI describes the internal structure of the video title set. The internal structure of the video title set is a hierarchy: title (individual movie) - program chain (PGC: set of programs) - PTT (chapter: access point set on a cell boundary in the video stream) - program (PG: set of cells) - cell (set of video object units) - video object unit (VOBU: corresponding to a GOP (Group Of Pictures)). The VTSI describes which part of VTSM_VOBS to VTSTT_VOBS each layer corresponds to.
[0068]
Referring to FIG. 1, the menu screen creation unit 15 creates menu screen data H1 representing a moving image menu screen, based on the moving image thumbnail table L and the control information data G1. The menu screen data H1 is generated by extracting, for each chapter indicated by the control information data G1, the corresponding moving image data (frame image data 44) from the moving image thumbnail table L.
Here, the moving image menu screen displays a plurality of moving image thumbnails F1 corresponding to a plurality of chapters at a time on one screen. For example, on the menu screen when there are four chapters, four moving image thumbnails F1 can be viewed on one screen.
Then, the moving image thumbnail F1 can be selected on the screen by a pointing device (example: mouse). In that case, in the menu screen data H1, each moving image thumbnail F1 is associated with the PTT of the control information data G1, so that the chapter corresponding to the moving image thumbnail F1 selected on the screen can be reproduced.
[0069]
The write control unit 16 receives the video encoded data D, the menu screen data H1, and the control information data G1, and controls the output of the data to the DVD drive unit 3 so that each piece of data is recorded in a predetermined area of the DVD.
At this time, the menu screen data H1 is stored in VMGM_VOBS of the VMG 63, the control information data G1 is stored in VTSI of the VTS 67, and the video encoded data D is stored in VTSM_VOBS to VTSTT_VOBS.
Note that when a storage medium other than a DVD is used, the write control unit 16 controls writing to the recording medium based on a format corresponding to the storage medium.
[0070]
Next, the operation (signal processing method including authoring) of the first embodiment of the DVD apparatus (MPEG apparatus) to which the signal processing apparatus with an authoring function according to the present invention is applied will be described.
FIG. 9 is a flowchart showing the operation (signal processing method including authoring) of the first embodiment of the DVD apparatus (MPEG apparatus) to which the signal processing apparatus with an authoring function according to the present invention is applied.
[0071]
(1) Step S01
The encoding unit 11 generates video encoded data D obtained by encoding the audio video data A based on the audio video data A input from the outside.
(2) Step S02
The recording date and time analysis unit 12 divides the audio video data A into a plurality of chapters based on the date and time data of the audio video data A input from the outside. However, the chapters are adjusted so that the number of chapters into which the audio video data A is divided does not exceed the maximum number of chapters N indicated by the maximum number of chapters data B input from the outside. Then, a chapter table E is generated.
(3) Step S03
The moving image thumbnail creating unit 13 creates a plurality of moving image thumbnails F1 corresponding to each of the plurality of chapters based on the encoded video data D and the chapter table E. At this time, each moving image thumbnail F1 is generated so that the image size and the reproduction time of each moving image thumbnail F1 become the image size and the reproducing time indicated by the moving image condition data C input from the outside. Then, the moving image thumbnail creating unit 13 creates a moving image thumbnail table L in which each frame of the moving image thumbnail F1 is associated with the encoded video data D.
(4) Step S04
The control information data creation unit 14 creates control information data G1 including data related to a plurality of chapters based on the encoded video data D and the chapter table E. At this time, chapter data indicating which chapter each program 38 is included in is stored in the control information data G1.
(5) Step S05
The menu screen creation unit 15 creates menu screen data H1 based on the moving image thumbnail table L and the control information data G1.
(6) Step S06
The write control unit 16 controls the output of data to the DVD drive unit 3 so as to record the menu screen data H1, the video encoded data D, and the control information data G1 in a predetermined area of the DVD. The DVD drive unit 3 writes those data on the DVD.
[0072]
Here, the operation of creating the chapter table in step S02 will be further described.
FIG. 10 is a flowchart showing the operation of creating the chapter table in step S02.
[0073]
(1) Step S21
The chapter division unit 21 of the recording date analysis unit 12 detects date / time data of the audio video data A input from the outside.
(2) Step S22
The chapter division unit 21 determines whether the date / time data is continuous. If it is continuous, the process returns to step S21. If it is not continuous, the process proceeds to step S23. The unit in which continuity is judged (seconds, minutes, hours, and so on) is preset according to the video recorded in the audio video data A. Here, the unit is seconds.
(3) Step S23
The chapter division unit 21 determines that a portion where the date / time data is not continuous is a delimiter of the audio video data A, and acquires the position data (time from the head) of the delimiter position. It also acquires the last date / time data of the preceding chapter and the first date / time data of the following chapter at the delimiter position. However, this delimiter position is only a chapter candidate position and is not yet fixed.
(4) Step S24
Assuming that the number of chapters increases by one, the chapter restriction unit 22 determines whether the number of chapters (total number of chapters) in the chapter table E would exceed the maximum number of chapters N indicated by the maximum number of chapters data B. If it exceeds N, the process proceeds to step S26. If not, the process proceeds to step S25.
(5) Step S25
The table generation unit 23 generates (updates) the chapter table E. That is, the position data at the position of the chapter candidate is set as the chapter end position 32 of the chapter table E. In addition, the difference between the last date / time data of the previous chapter at the position of the chapter candidate and the first date / time data of the next chapter is calculated and set as the chapter recording date / time interval 33 of the chapter table E.
(6) Step S26
Since the total number of chapters in the chapter table E exceeds the maximum number of chapters N, the chapter restriction unit 22 reduces the number of chapters in the chapter table E by one. The reduction is performed by selecting, among the chapters in the chapter table E, the two adjacent chapters for which the chapter recording date and time interval 33 (the difference between the date and time data indicating the end of the preceding chapter and the date and time data indicating the start of the following chapter) is smallest, and joining them.
At the same time, the chapter table E is generated (updated). That is, the position data at the chapter candidate position is set as the chapter end position 32, and the difference between the last date / time data of the preceding chapter and the first date / time data of the following chapter at the chapter candidate position is set as the chapter recording date / time interval 33.
(7) Step S27
If the audio video data A continues, the process returns to step S21, and steps S21 to S26 are repeated.
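Steps S21 to S27 can be sketched as follows. This is one possible reading under assumptions, not the implementation of the recording date and time analysis unit 12: the input is modeled as (position, recording time) pairs in seconds, the `Boundary` class and function name are hypothetical, and when the limit N would be exceeded the sketch removes the existing boundary with the smallest interval before registering the new one.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class Boundary:
    end_position: float  # chapter end position 32 (seconds from the head)
    interval: float      # chapter recording date/time interval 33 (seconds)


def build_chapter_table(samples: List[Tuple[float, float]],
                        max_chapters: int,
                        continuity: float = 1.0) -> List[Boundary]:
    """samples: (position, recorded_at) pairs in playback order."""
    table: List[Boundary] = []
    for (_, rec_prev), (pos_next, rec_next) in zip(samples, samples[1:]):
        gap = rec_next - rec_prev
        if 0 <= gap <= continuity:
            continue  # S22: date/time data is continuous
        # S23: a chapter candidate at the point of discontinuity
        candidate = Boundary(end_position=pos_next, interval=abs(gap))
        # S24: number of chapters = number of boundaries + 1
        if len(table) + 2 > max_chapters:
            if not table:
                continue  # N = 1: no boundary can be kept at all
            # S26: join the two adjacent chapters with the smallest interval
            smallest = min(range(len(table)), key=lambda i: table[i].interval)
            del table[smallest]
        table.append(candidate)  # S25 / S26: update the chapter table
    return table


samples = [(0, 0), (1, 1), (2, 1000), (3, 1001), (4, 1301), (5, 1302), (6, 5000)]
limited = build_chapter_table(samples, max_chapters=3)
unlimited = build_chapter_table(samples, max_chapters=5)
```

Here three discontinuities are found (gaps of 999, 300, and 3698 seconds). With a limit of three chapters, the boundary with the 300-second gap is dropped, so the two recordings made closest together in time end up in the same chapter.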
[0074]
FIG. 11 is a diagram showing a process of generating the chapter table E of FIG. 4 in step S02. Here, a case where the value of the maximum chapter number data B is “5” (maximum chapter number N = 5) will be described as an example.
[0075]
FIG. 11A shows the chapter table E of FIG. 4 being generated in step S02. Here, a state where five chapters are found is shown.
In this state, consider a case where a new chapter is detected in step S23 as shown in FIG. In this case, it is determined in step S24 that the total number of chapters (= 6) exceeds the maximum number of chapters N (= 5). Here, the chapter with chapter number 31 = 3 in FIG. 11A (referred to as "chapter 3"; the same applies to the other chapters) has the smallest chapter recording date and time interval 33 in the chapter table E (5 minutes). Therefore, in step S26, chapter 3 is eliminated by joining it to chapter 2. At the same time, chapter 4 and chapter 5 are moved up to become chapter 3 and chapter 4. Thereafter, the chapter table E is updated with the newly found chapter as the chapter 6. FIG. 11C shows the updated chapter table E.
[0076]
In this way, audio video data having a plurality of video images can be automatically divided into a plurality of chapters, and the number of chapters can be suppressed to a maximum number of chapters N or less.
[0077]
Here, the operation of creating a moving image thumbnail in step S03 will be further described.
FIG. 12 is a flowchart showing the operation of creating a moving image thumbnail in step S03.
[0078]
(1) Step S31
The highlight scene detection unit 26 of the moving image thumbnail creation unit 13 selects one chapter based on the encoded video data D and the chapter table E, and acquires the encoded video data D of the chapter.
(2) Step S32
The highlight scene detection unit 26 detects the pixel difference value Δ between frames from the encoded video data D of the entire chapter, and stores it in the difference value table (FIG. 7), in which the pixel difference value Δ is associated with the position data t0.
(3) Step S33
Based on the difference value table (FIG. 7) and the playback time in the moving image condition data C, the highlight scene detection unit 26 lowers the threshold from Δ0 so that the time of the highlight scene becomes equal to the designated playback time. If the highlight scene time becomes equal to the designated playback time before the threshold falls to Δ2, it is determined that there is a highlight scene (Yes). If the highlight scene time is still less than the designated playback time when the threshold has reached Δ2, it is determined that there is no highlight scene (No). If there is a highlight scene, the process proceeds to step S34; if not, the process proceeds to step S37.
(4) Step S34
The creation method selection unit 27 counts the number of highlight scene portions. If there are a plurality of portions (No), the process proceeds to step S35; if there is only one portion (Yes), the process proceeds to step S36.
(5) Step S35
The creation method execution unit 28 creates a moving image thumbnail by the method (A) described above. That is, since there are a plurality of highlight scenes, the highlight scenes are joined to form a moving image thumbnail F1. The process proceeds to step S38.
(6) Step S36
The creation method execution unit 28 creates a moving image thumbnail by the method (B) described above. That is, since there is only one highlight scene, that portion is used directly as the moving image thumbnail F1. The process proceeds to step S38.
(7) Step S37
The creation method execution unit 28 creates a moving image thumbnail by the method (C) described above. That is, since there is no highlight scene, frames are thinned out from the chapter to shorten it, and the result is used as the moving image thumbnail F1. The process proceeds to step S38.
(8) Step S38
The table creation unit 29 creates a moving image thumbnail table L in which each frame of the moving image thumbnail F1 is associated with the encoded video data D.
(9) Step S39
If the table creation unit 29 has created moving image thumbnails for all chapters (Yes), the table creation unit 29 ends the process. If there is a chapter for which a moving image thumbnail has not been created (No), the process returns to step S31.
[0079]
Through such a process, it becomes possible to automatically create moving image thumbnails having the same playback time for all chapters by an optimum method.
[0080]
Here, the operation of creating the menu screen data in step S05 will be further described.
FIG. 13 is a flowchart showing the operation of creating the menu screen data in step S05. Here, a case where the number of chapters is four will be described.
[0081]
(1) Step S41
The menu screen creation unit 15 sets the frame number m = 1 in order to create the first frame of the moving picture menu screen (menu screen data H1).
(2) Step S42
The menu screen creation unit 15 sets chapter number k = 1 in order to perform processing on chapter 1 of the first frame.
(3) Step S43
The menu screen creation unit 15 acquires, from the moving image thumbnail table L, the frame image data 44 corresponding to the m = 1st frame of the chapter with chapter number k = 1, based on the moving image thumbnail table L and the control information data G1. The acquired frame image data 44 is then pasted at the upper right of the m = 1st frame of the menu screen.
(4) Step S44, Step S45
The menu screen creation unit 15 repeats Steps S43 to S45 until the chapter number k is equal to or greater than the maximum number of chapters N (N = 4 in this case).
Thus, the m = 1st frame of the menu screen data H1 is completed. Here, the frame image data 44 is pasted at the upper left of the frame when k = 2, at the lower right of the frame when k = 3, and at the lower left of the frame when k = 4.
(5) Step S46, Step S47
The menu screen creation unit 15 repeats Steps S42 to S47 until the frame number m becomes equal to or greater than the designated frame number M (corresponding to the designated reproduction time).
As a result, menu screens (still images) for the designated number of frames M (designated playback time) are generated.
(6) Step S48
The menu screen creation unit 15 compresses the obtained plurality of menu screens (still images) according to the MPEG2 standard into a VOB file, completing the menu screen data H1. Through the above process, the menu screen data H1 includes the moving image thumbnails of the four chapters.
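The frame-by-frame composition in steps S41 to S47 can be sketched as follows. This is a simplified illustration under assumptions: each thumbnail frame is a small 2D list of pixel values, all thumbnails have the same size and at least M frames, and the function names are hypothetical. The MPEG2 compression of step S48 is not shown.

```python
from typing import List

Frame = List[List[int]]  # rows of pixel values


def compose_menu_frame(tiles: List[Frame], tile_w: int, tile_h: int,
                       background: int = 0) -> Frame:
    """Paste up to four tiles into one 2x2 menu frame.

    Placement follows the text: k=1 upper right, k=2 upper left,
    k=3 lower right, k=4 lower left.
    """
    offsets = [(0, tile_w), (0, 0), (tile_h, tile_w), (tile_h, 0)]
    canvas = [[background] * (2 * tile_w) for _ in range(2 * tile_h)]
    for tile, (row0, col0) in zip(tiles, offsets):
        for r in range(tile_h):
            canvas[row0 + r][col0:col0 + tile_w] = tile[r]
    return canvas


def compose_menu(thumbnails: List[List[Frame]], num_frames: int,
                 tile_w: int, tile_h: int) -> List[Frame]:
    """thumbnails[k][m] is frame m of the thumbnail for chapter k + 1."""
    return [
        compose_menu_frame([thumb[m] for thumb in thumbnails], tile_w, tile_h)
        for m in range(num_frames)  # outer loop over frame number m (S46, S47)
    ]


# Four chapters, two frames each, 2x1-pixel tiles filled with a chapter marker.
thumbs = [[[[k, k]], [[k * 10, k * 10]]] for k in (1, 2, 3, 4)]
menu = compose_menu(thumbs, num_frames=2, tile_w=2, tile_h=1)
```

Each element of `menu` is one still image of the menu; playing them in sequence gives the impression that all four thumbnails are moving at once.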
[0082]
FIG. 14 is a diagram showing a moving image menu screen using the menu screen data H1 created in steps S41 to S48. The menu screen 50 includes a moving image thumbnail 51-1 of chapter 1, a moving image thumbnail 51-2 of chapter 2, a moving image thumbnail 51-3 of chapter 3, a moving image thumbnail 51-4 of chapter 4, and a menu button 52.
When the moving image menu screen 50 is reproduced, the entire menu screen is displayed as one moving image. When the user selects one of chapters 1 to 4, playback jumps to that chapter and the normal video is reproduced. When the number of chapters is large and chapters 5 to 8 are present, selecting the menu button 52 switches the menu screen to the menu screen for chapters 5 to 8. Conventionally known methods can be used for selecting items on the moving image menu and for switching screens.
[0083]
In this way, menu screen data including a moving image thumbnail of each chapter can be automatically created.
[0084]
According to the present invention, when audio video data having a plurality of video images is stored in one storage medium such as a DVD, the delimiters of the audio video data can be found automatically and more appropriately based on the date / time data, and the desired chapters can be configured.
[0085]
Further, according to the present invention, a moving image thumbnail can be generated for each divided chapter, so that the contents of each chapter can be accurately grasped. Since the menu screen including all the moving image thumbnails can be automatically created, the contents of all the audio video data included in the DVD can be easily grasped.
[0086]
(Second Embodiment)
A signal processing apparatus with an authoring function and a signal processing method including authoring according to a second embodiment of the present invention will be described with reference to the accompanying drawings.
[0087]
First, the configuration of a second embodiment of a DVD apparatus (MPEG apparatus) to which the signal processing apparatus with an authoring function according to the present invention is applied will be described.
FIG. 1 is a block diagram showing a configuration of a second embodiment of a DVD apparatus (MPEG apparatus) to which a signal processing apparatus with an authoring function according to the present invention is applied. The DVD device 1 records and stores video encoded data D obtained by encoding audio video data and other data on the DVD based on the input of the audio video data A and various conditions (B and C). The DVD device 1 includes a signal processing device 2 with an authoring function, a DVD drive unit 3 and a system microcomputer 4. Here, a configuration for reading DVD data is omitted.
[0088]
In the second embodiment, the moving image thumbnail creating method by the moving image thumbnail creating unit 13a is different.
Usually, in video data compressed by MPEG (corresponding to the video encoded data D here), a large amount of code is generated in complicated video, such as a part where the color changes rapidly or a part where the movement is fast. Therefore, the code amount increases. On the other hand, in a flat part where the image changes little or a part where the movement is slow, little code is generated. Therefore, the code amount decreases. In the second embodiment, a highlight scene is detected based on this code amount.
[0089]
Referring to FIG. 1, the signal processing apparatus 2 with an authoring function generates the video encoded data D, the menu screen data H1, and the control information data G1 based on the input of the audio video data A, under the control of the system microcomputer 4. It then performs control to record them on a predetermined storage medium. The storage medium is exemplified by a DVD.
Since the DVD drive unit 3 and the system microcomputer 4 are the same as those in the first embodiment, description thereof is omitted.
[0090]
The signal processing apparatus 2 with an authoring function includes an encoding unit 11, a recording date and time analysis unit 12, a moving image thumbnail creation unit 13a, a control information data creation unit 14, a menu screen creation unit 15, and a write control unit 16.
[0091]
The moving image thumbnail creation unit 13a creates a plurality of moving image thumbnails F1 corresponding to each of the plurality of chapters based on the encoded video data D and the chapter table E. At this time, the moving image condition data C indicating the generation condition of the moving image thumbnail F1 is referred to, and each moving image thumbnail F1 is generated so as to meet the condition. The moving image condition data C is exemplified by the image size of the moving image thumbnail and the reproduction time of the moving image thumbnail.
[0092]
The moving image thumbnail creating unit 13a further creates a moving image thumbnail table L in which each frame of the moving image thumbnail F1 is associated with the encoded video data D. That is, the moving image thumbnail table L is created in which moving image data indicating each frame of the moving image thumbnail F1 is associated with the position data of the encoded video data D. The moving image thumbnail table L shown in FIG. 5 is as described in the first embodiment.
[0093]
FIG. 15 is a block diagram showing the configuration of the moving image thumbnail creation unit 13a. The moving image thumbnail creation unit 13a includes a data detection unit 56, a data analysis unit 57, a data extraction unit 58, and a table creation unit 59.
[0094]
The data detection unit 56 analyzes the code of the video encoded data D for each chapter based on the video encoded data D and the chapter table E, and detects the position of the GOP (Group Of Picture).
[0095]
The data analysis unit 57 detects the code amount (number of bytes) for each detected GOP. The code amount in GOP units is then associated with the position data t0 (time from the beginning of the audio video data) of the video encoded data D for each chapter, and is stored in a storage unit (not shown) as a code amount table.
[0096]
Here, the code amount table will be described.
FIG. 16 is a diagram representing the code amount table in the form of a graph. The vertical axis represents the code amount R for each GOP, and the horizontal axis represents the position data t0 (here, "time"). The curve V in the graph indicates the code amount. One chapter is from time t1 to time t2. Point A1 is the point in the chapter where the code amount is largest. The time at that point is tA1.
[0097]
The data extraction unit 58 analyzes the code amount table (FIG. 16) and detects the point A1 at which the code amount R is maximum. Then, the video encoded data D before and after the time tA1 is extracted as a highlight scene so as to have the designated playback time. That is, in FIG. 16, the highlight scene is the video encoded data D between time tB1 and time tC1. Here, tC1 - tB1 = playback time, and tC1 - tA1 = Δt1 = tA1 - tB1. This highlight scene is used as the moving image thumbnail.
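The extraction of a window centered on the GOP with the largest code amount can be sketched as follows. This is a minimal illustration under assumptions: the per-GOP byte counts and times are taken as already parsed from the stream, the function name is hypothetical, and the shifting of the window at the chapter edges is an addition of this sketch that the text does not describe.

```python
from typing import List, Tuple


def extract_window(gop_times: List[float], gop_bytes: List[int],
                   playback_time: float,
                   chapter_start: float,
                   chapter_end: float) -> Tuple[float, float, float]:
    """Return (tB, tA, tC): a window of playback_time centered on the peak GOP."""
    # Point A1: index of the GOP with the maximum code amount R.
    peak = max(range(len(gop_bytes)), key=lambda i: gop_bytes[i])
    t_a = gop_times[peak]
    half = playback_time / 2  # Δt1 on each side of tA1
    t_b, t_c = t_a - half, t_a + half
    # Assumption of this sketch: keep the window inside the chapter [t1, t2].
    if t_b < chapter_start:
        t_b = chapter_start
        t_c = min(chapter_end, chapter_start + playback_time)
    elif t_c > chapter_end:
        t_c = chapter_end
        t_b = max(chapter_start, chapter_end - playback_time)
    return t_b, t_a, t_c


times = [0, 1, 2, 3, 4, 5, 6, 7, 8, 9]
sizes = [10, 12, 11, 50, 90, 40, 12, 10, 9, 8]  # bytes per GOP (made-up values)
window = extract_window(times, sizes, playback_time=4,
                        chapter_start=0, chapter_end=10)
```

Because only per-GOP sizes are compared, no frame needs to be decoded to locate the window.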
[0098]
The table creation unit 59 stores the created moving image thumbnail in the moving image thumbnail table L.
[0099]
The encoding unit 11, the recording date and time analysis unit 12, the control information data creation unit 14, the menu screen creation unit 15, and the write control unit 16 (including the explanations of FIGS. 2 to 4 and 8 as they relate to the present embodiment) are the same as those in the first embodiment, and thus the description thereof is omitted.
[0100]
Next, the operation (signal processing method including authoring) of the second embodiment of the DVD apparatus (MPEG apparatus) to which the signal processing apparatus with an authoring function according to the present invention is applied will be described.
The explanations of FIGS. 9 to 11, 13, and 14 as they relate to the present embodiment are the same as those of the first embodiment, and thus the description thereof is omitted.
[0101]
Next, the operation of creating a moving image thumbnail in step S03 will be further described.
FIG. 17 is a flowchart showing the operation of creating a moving image thumbnail in step S03.
[0102]
(1) Step S51
The data detection unit 56 of the moving image thumbnail creation unit 13a selects one chapter based on the encoded video data D and the chapter table E, and acquires the encoded video data D of the chapter.
(2) Step S52
The data detection unit 56 analyzes the code of the video encoded data D of the entire chapter and detects the position of the GOP.
(3) Step S53
The data analysis unit 57 detects the code amount for each detected GOP. Then, the data analysis unit 57 associates the code amount in GOP units with the position data t0 of the encoded video data D, and stores it in a storage unit (not shown) as a code amount table (FIG. 16).
(4) Step S54
The data extraction unit 58 analyzes the code amount table (FIG. 16) and detects the point A1 at which the code amount is maximum. Then, the encoded video data D before and after the time tA1 is extracted as a highlight scene so as to have the designated playback time, and is used as the moving image thumbnail F1.
(5) Step S55
The table creation unit 59 creates a moving image thumbnail table L in which each frame of the moving image thumbnail F1 is associated with the encoded video data D.
(6) Step S56
If the table creation unit 59 has created moving image thumbnails for all chapters (Yes), the process ends. If there is a chapter for which a moving image thumbnail has not been created (No), the process returns to step S51.
[0103]
Through such a process, it becomes possible to automatically create moving image thumbnails having the same playback time for all chapters by an optimum method.
In this method, the compressed video encoded data D is used as it is, so a highlight scene can be detected without image analysis. Therefore, the resources required for creating a moving image thumbnail can be reduced, which lowers cost, and the processing can be executed in a short time.
[0104]
Also in this embodiment, the same effect as that of the first embodiment can be obtained.
[0105]
(Third embodiment)
A third embodiment of a signal processing apparatus with an authoring function and a signal processing method including authoring according to the present invention will be described with reference to the accompanying drawings.
[0106]
First, the configuration of a third embodiment of a DVD apparatus (MPEG apparatus) to which the signal processing apparatus with an authoring function according to the present invention is applied will be described.
FIG. 1 is a block diagram showing the configuration of a third embodiment of a DVD apparatus (MPEG apparatus) to which a signal processing apparatus with an authoring function according to the present invention is applied. The DVD device 1 records and stores video encoded data D obtained by encoding audio video data and other data on the DVD based on the input of the audio video data A and various conditions (B and C). The DVD device 1 includes a signal processing device 2 with an authoring function, a DVD drive unit 3 and a system microcomputer 4. Here, a configuration for reading DVD data is omitted.
[0107]
In the third embodiment, the moving image thumbnail creating method by the moving image thumbnail creating unit 13b is different.
In the third embodiment, a human face is detected from among data in a video, and is extracted as a highlight scene.
[0108]
Referring to FIG. 1, the signal processing apparatus 2 with an authoring function generates the video encoded data D, the menu screen data H1, and the control information data G1 based on the input of the audio video data A, under the control of the system microcomputer 4. It then performs control to record them on a predetermined storage medium. The storage medium is exemplified by a DVD.
Since the DVD drive unit 3 and the system microcomputer 4 are the same as those in the first embodiment, description thereof is omitted.
[0109]
The signal processing apparatus 2 with an authoring function includes an encoding unit 11, a recording date and time analysis unit 12, a moving image thumbnail creation unit 13b, a control information data creation unit 14, a menu screen creation unit 15, and a write control unit 16.
[0110]
The moving image thumbnail creating unit 13b creates a plurality of moving image thumbnails F1 corresponding to each of the plurality of chapters based on the encoded video data D and the chapter table E. At this time, the moving image condition data C indicating the generation condition of the moving image thumbnail F1 is referred to, and each moving image thumbnail F1 is generated so as to meet the condition. The moving image condition data C is exemplified by the image size of the moving image thumbnail and the reproduction time of the moving image thumbnail.
[0111]
The moving image thumbnail creating unit 13b further creates a moving image thumbnail table L in which each frame of the moving image thumbnail F1 is associated with the encoded video data D. That is, the moving image thumbnail table L is created in which moving image data indicating each frame of the moving image thumbnail F1 is associated with the position data of the encoded video data D. The moving image thumbnail table L shown in FIG. 5 is as described in the first embodiment.
[0112]
FIG. 18 is a block diagram illustrating a configuration of the moving image thumbnail creation unit 13b. The moving image thumbnail creation unit 13b includes a data detection unit 76, a data analysis unit 77, a data extraction unit 78, and a table creation unit 79.
[0113]
The data detection unit 76 analyzes the code of the video encoded data D for each chapter based on the video encoded data D and the chapter table E, and detects the position of the GOP (Group Of Picture). Next, the code of the video encoded data D is analyzed frame by frame for each GOP. Then, a human face is detected and scored in points.
[0114]
A human face is detected and scored in points as follows.
FIG. 19 is a diagram illustrating a method for detecting a human face. FIG. 19A shows an image (one frame) to be analyzed. FIG. 19B shows a mask image. In order to detect a human face, first, pixels indicating a skin color (a predetermined color range) are detected in the image (a) to be analyzed. Next, the mask image (b) is compared with the image (a) to be analyzed. A skin color pixel detected in the white portion of the mask image (b) is given a point of +1, and a skin color pixel detected in the black portion is given a point of -1.
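The mask-based scoring can be sketched as follows. This is only an illustration under assumptions: pixels are (R, G, B) tuples, the mask is a 2D list of booleans where True stands for the white portion, and the skin color range in `is_skin` is a made-up placeholder, since the text only says that a predetermined color range is used.

```python
from typing import List, Tuple

Pixel = Tuple[int, int, int]  # (R, G, B)


def is_skin(pixel: Pixel) -> bool:
    """Hypothetical skin color test; the actual color range is not given."""
    r, g, b = pixel
    return r > 95 and g > 40 and b > 20 and r > g and r > b


def face_score(image: List[List[Pixel]], mask: List[List[bool]]) -> int:
    """+1 per skin pixel in the white mask area, -1 per skin pixel in the black area."""
    score = 0
    for image_row, mask_row in zip(image, mask):
        for pixel, in_white in zip(image_row, mask_row):
            if is_skin(pixel):
                score += 1 if in_white else -1
    return score


skin: Pixel = (200, 120, 100)
sky: Pixel = (0, 0, 255)
image = [[skin, skin, sky],
         [sky, skin, skin]]
mask = [[True, True, True],
        [False, True, False]]
score = face_score(image, mask)
```

A frame scores high when skin-colored pixels are concentrated where the mask expects a face and low when they are scattered elsewhere.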
[0115]
The data analysis unit 77 adds up, for each GOP, the points obtained by the image analysis. The total points in GOP units are associated with the position data t0 (time from the beginning of the audio video data) of the encoded video data D for each chapter, and are stored in a storage unit (not shown) as a point table.
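The per-GOP totals can be sketched as follows. This is a minimal illustration under assumptions: per-frame scores are already available, each GOP is identified by the index of its first frame, the position data t0 is derived from a constant frame rate, and the function name is hypothetical.

```python
from typing import List, Tuple


def build_point_table(frame_scores: List[int], gop_starts: List[int],
                      frame_rate: float) -> List[Tuple[float, int]]:
    """Return (t0 in seconds, total points S) for each GOP.

    gop_starts holds the sorted index of the first frame of every GOP.
    """
    bounds = list(gop_starts) + [len(frame_scores)]
    table = []
    for start, end in zip(bounds, bounds[1:]):
        table.append((start / frame_rate, sum(frame_scores[start:end])))
    return table


scores = [1, 2, 3, -1, 0, 5, 5, 5]
table = build_point_table(scores, gop_starts=[0, 3, 6], frame_rate=2)
best_time, best_total = max(table, key=lambda entry: entry[1])
```

The entry with the largest total corresponds to point A2, and its time is used as the center of the extracted highlight scene.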
[0116]
Here, the point table will be described.
FIG. 20 is a diagram representing the point table in the form of a graph. The vertical axis represents the total point S for each GOP, and the horizontal axis represents position data t0 (here, “time”). A curve U in the graph indicates the sum of points for each GOP. One chapter is from time t1 to time t2. Point A2 is the point with the largest total of points in the chapter. The time at that time is tA2.
[0117]
The data extraction unit 78 analyzes the point table (FIG. 20) and detects the point A2 at which the total S of points for each GOP is maximum. Then, the encoded video data D before and after the time tA2 is extracted as a highlight scene so as to have the designated playback time. That is, in FIG. 20, the highlight scene is the video encoded data D between time tB2 and time tC2. Here, tC2 - tB2 = playback time, and tC2 - tA2 = Δt2 = tA2 - tB2. This highlight scene is used as the moving image thumbnail.
[0118]
The table creation unit 79 stores the created moving image thumbnail in the moving image thumbnail table L.
[0119]
The encoding unit 11, the recording date and time analysis unit 12, the control information data creation unit 14, the menu screen creation unit 15, and the write control unit 16 are the same as in the first embodiment (including the explanations of FIGS. 2 to 4 and 8 as they relate to the present embodiment), so their description is omitted.
[0120]
Next, the operation (signal processing method including authoring) of the third embodiment of the DVD apparatus (MPEG apparatus) to which the signal processing apparatus with an authoring function according to the present invention is applied will be described.
FIGS. 9 to 11, 13, and 14 as they relate to the present embodiment are the same as in the first embodiment, so their description is omitted.
[0121]
Next, the operation of creating a moving image thumbnail in step S03 will be further described.
FIG. 21 is a flowchart showing the operation of creating a moving image thumbnail in step S03.
[0122]
(1) Step S61
The data detection unit 76 of the moving image thumbnail creation unit 13b selects one chapter based on the encoded video data D and the chapter table E, and acquires the encoded video data D of the chapter.
(2) Step S62
The data detection unit 76 analyzes the code of the video encoded data D of the entire chapter frame by frame. It then detects pixels showing a skin color (a predetermined color range) in a predetermined region (specified by a mask image) as a face and scores them as points.
(3) Step S63
The data analysis unit 77 sums, for each GOP, the points obtained by the image analysis. The per-GOP point total and the position data t0 of the video encoded data D are then associated with each other and stored in a storage unit (not shown) as a point table (FIG. 20).
(4) Step S64
The data extraction unit 78 analyzes the point table (FIG. 20) and detects the point A2 having the maximum point total. It then extracts the encoded video data D before and after the time tA2 as a highlight scene so that the scene has the designated reproduction time. This highlight scene is used as the moving image thumbnail F1.
(5) Step S65
The table creation unit 79 creates a moving image thumbnail table L in which each frame of the moving image thumbnail F1 is associated with the encoded video data D.
(6) Step S66
If the table creation unit 79 has created moving image thumbnails for all chapters (Yes), the table creation unit 79 ends the process. If there is a chapter for which a moving image thumbnail has not been created (No), the process returns to step S61.
[0123]
Through such a process, it becomes possible to automatically create moving image thumbnails having the same playback time for all chapters by an optimum method.
Because this method performs only color comparison, unlike general face detection, the processing can be performed at high speed.
[0124]
In the present embodiment, a human face is detected. However, if, for example, the color of a specific animal is set, that animal can be detected and a moving image thumbnail generated. Similarly, by setting colors such as the green of plants or the blue of the sky, a natural landscape can be detected and a moving image thumbnail created. The color setting is input, for example, via the moving image condition data C.
[0125]
Also in this embodiment, the same effect as that of the first embodiment can be obtained.
[0126]
(Fourth embodiment)
A signal processing apparatus with an authoring function and a signal processing method including authoring according to a fourth embodiment of the present invention will be described with reference to the accompanying drawings.
[0127]
First, the configuration of a fourth embodiment of a DVD apparatus (MPEG apparatus) to which the signal processing apparatus with an authoring function according to the present invention is applied will be described.
FIG. 22 is a block diagram showing a configuration of a fourth embodiment of a DVD apparatus (MPEG apparatus) to which the signal processing apparatus with an authoring function according to the present invention is applied. The DVD device 1a records and stores video encoded data obtained by encoding the audio video data and other data on the DVD based on the input of the audio video data A and various conditions (B and C, which will be described later). The DVD device 1a includes a signal processing device 2a with an authoring function, a DVD drive unit 3, and a system microcomputer 4. Here, a configuration for reading DVD data is omitted.
[0128]
Under the control of the system microcomputer 4, the signal processing device 2a with an authoring function generates video encoded data D (described later), menu screen data H2 (described later), and control information data G2 (described later) based on the input audio video data A. It also performs control to record them on a predetermined storage medium. The storage medium is exemplified by a DVD.
Under the control of the system microcomputer 4, the DVD drive unit 3 records (stores) the encoded video data D, the menu screen data H2, and the control information data G2 output from the signal processing device 2a with an authoring function on the storage medium set in it.
The system microcomputer 4 controls the DVD device 1a, including the signal processing device 2a with an authoring function and the DVD drive unit 3. The system microcomputer 4 is exemplified by an MPU (microprocessor unit).
[0129]
The signal processing device with authoring function 2a includes an encoding unit 11a, a recording date and time analysis unit 12, a control information data creation unit 14a, a menu screen creation unit 15a, and a write control unit 16a. Here, the encoding unit 11a and the recording date and time analysis unit 12 are also referred to as a data preprocessing unit 2a-2. The data preprocessing unit 2a-2, the control information data creation unit 14a, and the menu screen creation unit 15a are also referred to as a data processing unit 2a-1.
[0130]
The encoding unit 11a generates encoded video data D obtained by encoding the audio video data A input from the outside. At the same time, it generates moving image data J based on the encoded data calculated in the process of encoding the audio video data A.
[0131]
Here, the encoded data is data obtained by extracting only the DC coefficients (DC components) from the results of the discrete cosine transform (also referred to as “DCT” in the present specification) used in encoding the audio video data. The moving image data J is generated by continuously arranging images (size = vertical 1/8 × horizontal 1/8 of the original frame) composed of the DC coefficients of the audio video data A. The audio video data A, the image data, the encoding of the audio video data A, and the video encoded data D are the same as those in the first embodiment.
[0132]
The moving image data J is obtained by compressing the audio video data A. Dividing it chapter by chapter yields a moving image thumbnail F2 for each chapter. That is, the moving image data J is a set of moving image thumbnails F2.
[0133]
The encoding unit 11a further creates a moving image data table K in which each frame of the moving image data J is associated with the encoded video data D. That is, a moving image data table K is created in which moving image data indicating each frame of the moving image data J is associated with the position data of the encoded video data D and the address of the VOB file. The VOB address is a data position (address from the head) after compression (MPEG) for each frame.
[0134]
When the encoded video data D is generated, the moving image data J for the moving image thumbnail F2 (described later) can be obtained at the same time by adding only a few processes. In this moving image data J, the data of one block (8 pixels × 8 pixels) subjected to the DCT calculation is represented by a single DC coefficient, so the data can be compressed to 1/8 × 1/8 = 1/64. Although a large amount of data is discarded, the resolution is sufficient for use as a moving image thumbnail on the menu screen.
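The 1/64 reduction can be illustrated with a short sketch. It relies on the fact that, for the 8 × 8 DCT normalization commonly used in JPEG and MPEG, the DC coefficient of a block equals the sum of its 64 samples divided by 8, that is, 8 times the block mean. Quantization and the actual bitstream handling of an encoder are ignored here, the frame dimensions are assumed to be multiples of 8, and the function names are hypothetical.

```python
BLOCK = 8  # DCT block size in pixels


def dc_coefficient(frame, top, left):
    """DC term of the 8x8 DCT of one block: (sum of the 64 samples) / 8."""
    total = sum(frame[top + y][left + x]
                for y in range(BLOCK) for x in range(BLOCK))
    return total / 8.0


def dc_thumbnail(frame):
    """Build a 1/8 x 1/8 image with one pixel per 8x8 block.

    Each output pixel is the DC coefficient divided by 8, i.e. the block mean.
    """
    height, width = len(frame), len(frame[0])
    return [[dc_coefficient(frame, top, left) / 8.0
             for left in range(0, width, BLOCK)]
            for top in range(0, height, BLOCK)]


# 16 x 16 luminance frame: left half 80, right half 160.
frame = [[80] * 8 + [160] * 8 for _ in range(16)]
thumb = dc_thumbnail(frame)  # 2 x 2 image, 4 values instead of 256
```

In an actual encoder the DC coefficients are already available as a by-product of the DCT, so no separate averaging pass would be needed; the sketch recomputes them only to stay self-contained.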
[0135]
FIG. 23 is a table showing the moving image data table K. For each frame, the table associates a frame number 41 (the serial number of the frame), a time 42 (the position data of the video encoded data D, that is, the time from the beginning of the audio video data), an address 43 of the VOB file (the address of the frame in the video encoded data D), and frame image data 44 (the moving image data).
[0136]
Since the recording date and time analysis unit 12 and the chapter table E are the same as those in the first embodiment, description thereof is omitted.
[0137]
Referring to FIG. 22, the menu screen creation unit 15a creates menu screen data H2 indicating a menu screen for moving images based on the moving image data table K and the chapter table E. Here, the moving image thumbnail F2 is generated by extracting, from the moving image data table K, the portion of the moving image data J that corresponds to each chapter defined in the chapter table E.
Here, the moving image menu screen displays a plurality of moving image thumbnails F2 (not shown) corresponding to a plurality of chapters at a time on one screen. For example, on the menu screen when there are four chapters, four moving image thumbnails F2 can be viewed on one screen. At this time, the moving image condition data C indicating the condition of the moving image thumbnail F2 (example: screen size, reproduction time) is referred to, and each moving image thumbnail F2 is generated so as to meet the conditions.
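As a rough illustration, the moving image data table K and the extraction of one chapter's thumbnail frames could be modeled as below. The class name `TableKRow`, the field names, and the `max_frames` parameter (standing in for the reproduction-time condition in the moving image condition data C) are assumptions for the sketch, not part of the specification.

```python
from dataclasses import dataclass


@dataclass
class TableKRow:
    frame_number: int   # frame number 41 (serial number of the frame)
    time: float         # time 42 (seconds from the start of the data)
    vob_address: int    # address 43 of the frame in the VOB file
    frame_image: bytes  # frame image data 44 (placeholder payload here)


def thumbnail_for_chapter(table_k, start_time, end_time, max_frames=None):
    """Pick the rows of table K that fall inside one chapter.

    max_frames optionally limits the thumbnail length (hypothetical stand-in
    for the reproduction-time condition).
    """
    rows = [row for row in table_k if start_time <= row.time <= end_time]
    return rows if max_frames is None else rows[:max_frames]


# Eight frames, one every 0.5 seconds, with made-up VOB addresses.
table_k = [TableKRow(i + 1, 0.5 * i, 2048 * i, b"") for i in range(8)]
chapter2 = thumbnail_for_chapter(table_k, 2.0, 3.5, max_frames=3)
first_address = chapter2[0].vob_address  # where playback of the chapter starts
```

Keeping the VOB address in every row is what lets a selected thumbnail be mapped back to a playback position in the video encoded data D.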
[0138]
The moving image thumbnail F2 can then be selected on the screen with a pointing device (example: mouse). In the menu screen data H2, each moving image thumbnail F2 is associated with the address 43 of the VOB file in the moving image data table K, so the chapter corresponding to the moving image thumbnail F2 selected on the screen can be reproduced.
[0139]
The control information data creation unit 14a creates control information data G2 (described later) including data related to a plurality of chapters based on the moving image data table K and the chapter table E. That is, the control information data G2 of the DVD is created based on the moving image data table K (and its moving image data J), and chapter data (for example, chapter number 31) indicating which chapter each program is included in is stored in the PTT 37 of the control information data G2. The control information data G2 is otherwise the same as that in the first embodiment.
[0140]
The write control unit 16a receives the encoded video data D, the menu screen data H2, and the control information data G2, and controls the output of the data to the DVD drive unit 3 so that each is recorded in a predetermined area of the DVD.
At this time, the menu screen data H2 is stored in VMGM_VOBS of the VMG 63, the control information data G2 is stored in VTSI of the VTS 67, and the video encoded data D is stored in VTSM_VOBS to VTSTT_VOBS.
[0141]
Next, the operation (signal processing method including authoring) of the fourth embodiment of the DVD apparatus (MPEG apparatus) to which the signal processing apparatus with an authoring function according to the present invention is applied will be described.
FIG. 24 is a flowchart showing the operation (signal processing method including authoring) of the fourth embodiment of the DVD apparatus (MPEG apparatus) to which the signal processing apparatus with an authoring function according to the present invention is applied.
[0142]
(1) Step S11
The encoding unit 11a generates encoded video data D obtained by encoding the audio video data A input from the outside. At the same time, it generates moving image data J based on the encoded data calculated in the process of encoding the audio video data A, and creates a moving image data table K in which each frame of the moving image data J is associated with the encoded video data D.
[0143]
(2) Step S12
The recording date and time analysis unit 12 divides the audio video data A into a plurality of chapters based on the date and time data of the audio video data A input from the outside. However, the chapters are adjusted so that the number of chapters into which the audio video data A is divided does not exceed the maximum number of chapters N indicated by the maximum number of chapters data B input from the outside. Then, a chapter table E is generated.
[0144]
(3) Step S13
Based on the moving image data table K and the chapter table E, the menu screen creating unit 15a creates menu screen data H2 indicating a moving image menu screen.
[0145]
(4) Step S14
The control information data creation unit 14a creates control information data G2 including data on a plurality of chapters based on the moving image data table K and the chapter table E. At this time, chapter data (example: chapter number 31) indicating which chapter each program is included in is stored in the PTT of the control information data G2.
[0146]
(5) Step S15
The write control unit 16a controls the output of data to the DVD drive unit 3 so as to record the menu screen data H2, the encoded video data D, and the control information data G2 in a predetermined area of the DVD. The DVD drive unit 3 writes those data on the DVD.
[0147]
Here, the operation of creating the moving image data table K in the operation of step S11 will be further described.
FIG. 25 is a flowchart showing the operation of creating the moving image data table K in step S11. These steps are performed only for I pictures.
[0148]
(1) Step S71
A DC coefficient obtained by DCT calculation performed at the time of MPEG encoding is extracted.
(2) Step S72
The DC coefficients for one frame of the audio video data A are rearranged to generate one frame image of the moving image (size = vertical 1/8 × horizontal 1/8 of the original frame).
(3) Step S73
The moving image data J is generated by continuously arranging the generated frame images.
(4) Step S74
For each frame of the moving image, a moving image data table K in which moving image data J and encoded video data D are associated is created. In other words, a moving image data table K is created in which moving image data indicating each frame of a moving image is associated with the position data of the encoded video data D and the address of the VOB file.
[0149]
The moving image data J represents one GOP (1 VOBU = normally 0.5 seconds) of the audio video data A as one image. That is, it is possible to obtain moving image data that is significantly shortened.
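As a rough worked example of this shortening, the sketch below counts the images in the moving image data J for a recording of a given length, with one image per 0.5-second GOP. The comparison frame rate of 30 frames per second is an assumption chosen for illustration, not a value stated in this specification.

```python
GOP_SECONDS = 0.5        # 1 GOP = 1 VOBU, normally 0.5 seconds
ASSUMED_SOURCE_FPS = 30  # assumption for comparison only


def images_in_moving_image_data(duration_seconds, gop_seconds=GOP_SECONDS):
    """Number of images in J: one image per GOP."""
    return int(duration_seconds / gop_seconds)


duration = 600                                    # a 10-minute recording
j_images = images_in_moving_image_data(duration)  # 1200 images
source_frames = duration * ASSUMED_SOURCE_FPS     # 18000 frames
reduction = source_frames // j_images             # 15 times fewer images
```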
[0150]
However, the operation for creating the chapter table in step S12 is the same as that in step S02 shown in FIG.
[0151]
Here, the operation of creating the menu screen data in step S13 will be further described.
FIG. 26 is a flowchart showing the operation of creating the menu screen data in step S13. Here, a case where the number of chapters is four will be described.
[0152]
(1) Step S81
The menu screen creation unit 15a sets chapter number k = 1 in order to specify the start time of each chapter (position data: time in audio video data).
(2) Step S82
The menu screen creation unit 15a obtains the start time of chapter number k from the time 42 (end time of the previous chapter) in the column immediately before the column of chapter number 41 = k in the chapter table E, as follows: (Start time) = (Time 42 in the previous column) + (Time for one frame)
For example, in the case of chapter number k = 2, the column immediately before the column of chapter number 41 = k = 2 is the column of chapter number 41 = 1. Accordingly, (start time of chapter number k = 2) = (time 42 (end time of chapter 1) in the column of chapter number k = 1) + (time for one frame). However, chapter number k = 1 is the first chapter, and there is no previous column. In this case, start time = 0.
(3) Step S83, Step S84
The menu screen creation unit 15a repeats Steps S82 to S84 until the chapter number k is equal to or greater than the maximum number of chapters N (N = 4 in this case).
Thereby, the start time of each chapter can be specified.
(4) Step S85
The menu screen creation unit 15a sets frame number m = 1 in order to create the first frame of the moving picture menu screen (menu screen data H2).
(5) Step S86
The menu screen creation unit 15a sets chapter number k = 1 in order to perform processing on chapter 1 of the first frame.
(6) Step S87
Based on the moving image data table K, the menu screen creation unit 15a acquires the frame image data 44 corresponding to the m = 1st frame in the chapter with chapter number k = 1. It then pastes that frame image data on the upper right of the m = 1st frame of the menu screen.
(7) Step S88, Step S89
The menu screen creation unit 15a repeats Steps S87 to S89 until the chapter number k is equal to or greater than the maximum number of chapters N (N = 4 in this case).
Thus, the m = 1st frame of the menu screen data H2 is completed. Here, the frame image data 44 is pasted at the upper left of the frame when k = 2, at the lower right when k = 3, and at the lower left when k = 4.
(8) Step S90, Step S91
The menu screen creation unit 15a repeats Steps S86 to S91 until the frame number m is equal to or greater than the designated frame number M (corresponding to the designated playback time).
As a result, menu screens (still images) for the designated number of frames M (designated playback time) are generated.
(9) Step S92
The menu screen creation unit 15a compresses the resulting menu screens (still images) according to the MPEG2 standard to form a VOB file, completing the menu screen data H2. Through the above process, the menu screen data H2 includes the moving image thumbnails of the four chapters.
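The start-time calculation of steps S81 to S84 can be sketched as follows. The function name, the list-based stand-in for the chapter table E, and the example value used for the time of one frame are assumptions made for illustration.

```python
def chapter_start_times(end_times, frame_time):
    """Compute the start time of each chapter from the chapter end times.

    end_times:  end time of chapters 1..N, in order (taken from the chapter
                table in the specification; a plain list here).
    frame_time: time occupied by one frame.
    Chapter 1 has no previous column, so its start time is 0.
    """
    starts = []
    for k in range(1, len(end_times) + 1):
        if k == 1:
            starts.append(0.0)
        else:
            # (start time) = (end time of previous chapter) + (one frame)
            starts.append(end_times[k - 2] + frame_time)
    return starts


end_times = [120.0, 300.0, 450.0, 600.0]   # N = 4 chapters
starts = chapter_start_times(end_times, frame_time=0.5)
```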
[0153]
The menu screen data H2 created in (1) Steps S81 to (9) Step S92 is the same as that shown in FIG.
[0154]
In this way, menu screen data including a moving image thumbnail of each chapter can be automatically created.
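The frame-by-frame assembly of steps S85 to S91 might look like the sketch below for four chapters. Frame images are represented by arbitrary Python objects, each menu frame by a 2 × 2 grid, and every chapter is assumed to supply at least the designated number of frames; the MPEG2 compression of step S92 is not shown. All names are hypothetical.

```python
# (row, column) of the quadrant used for each chapter number, following the
# layout described in the steps: k=1 upper right, k=2 upper left,
# k=3 lower right, k=4 lower left.
QUADRANT = {1: (0, 1), 2: (0, 0), 3: (1, 1), 4: (1, 0)}


def compose_menu_frames(chapter_frames, num_frames):
    """Build num_frames menu frames, each a 2x2 grid of chapter frame images.

    chapter_frames: dict mapping chapter number -> list of frame images.
    """
    menu = []
    for m in range(num_frames):              # outer loop over frame number m
        grid = [[None, None], [None, None]]
        for k, (row, col) in QUADRANT.items():   # inner loop over chapters
            grid[row][col] = chapter_frames[k][m]
        menu.append(grid)
    return menu


chapter_frames = {k: [f"ch{k}-f{m}" for m in range(1, 4)] for k in range(1, 5)}
menu = compose_menu_frames(chapter_frames, num_frames=3)
```

The nesting mirrors the flowchart: the inner loop fills one menu frame from all chapters, and the outer loop advances the frame number until the designated count is reached.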
[0155]
As described above, the present invention can obtain the same effects as those of the first embodiment.
[0156]
[Effects of the invention]
According to the present invention, when a plurality of pieces of audio video data are stored in one storage medium, it becomes possible to automatically divide the audio video data, automatically generate chapters, and automatically generate a menu screen using moving image thumbnails. This makes it possible to grasp the contents of the storage medium quickly and accurately.
[Brief description of the drawings]
FIG. 1 is a block diagram showing a configuration of first to third embodiments of a DVD apparatus to which a signal processing apparatus with an authoring function according to the present invention is applied.
FIG. 2 is a diagram illustrating a configuration of a recording date analysis unit.
FIG. 3 is a diagram illustrating date and time data and position data.
FIG. 4 is a table showing a chapter table.
FIG. 5 is a table showing a moving image thumbnail table.
FIG. 6 is a block diagram illustrating a configuration of a moving image thumbnail creation unit 13.
FIG. 7 is a diagram representing a difference value table in the form of a graph.
FIG. 8 is a diagram illustrating a structure of data stored in a DVD.
FIG. 9 is a flowchart showing the operation of the first to third embodiments of a DVD apparatus to which the signal processing apparatus with an authoring function according to the present invention is applied.
FIG. 10 is a flowchart showing an operation of creating a chapter table in step S02.
FIGS. 11A to 11C are diagrams showing a process of generating the chapter table of FIG.
FIG. 12 is a flowchart showing an operation of creating a moving image thumbnail in step S03.
FIG. 13 is a flowchart showing an operation of creating menu screen data in step S05.
FIG. 14 is a diagram showing a moving image menu screen.
FIG. 15 is a block diagram illustrating a configuration of a moving image thumbnail creating unit 13a.
FIG. 16 is a diagram representing a code amount table in the form of a graph.
FIG. 17 is a flowchart showing an operation of creating a moving image thumbnail in step S03.
FIG. 18 is a block diagram illustrating a configuration of a moving image thumbnail creation unit 13b.
FIGS. 19 (a) and 19 (b) are diagrams illustrating a method for detecting a human face.
FIG. 20 is a diagram representing a point table in the form of a graph.
FIG. 21 is a flowchart showing an operation of creating a moving image thumbnail in step S03.
FIG. 22 is a block diagram showing a configuration of a fourth embodiment of a DVD apparatus to which the signal processing apparatus with an authoring function according to the present invention is applied.
FIG. 23 is a table showing a moving image data table.
FIG. 24 is a flowchart showing the operation of the fourth embodiment of the DVD apparatus to which the signal processing apparatus with an authoring function according to the present invention is applied.
FIG. 25 is a flowchart showing an operation of creating a moving image data table in step S11.
FIG. 26 is a flowchart showing the operation of creating the menu screen data in step S13.
[Explanation of symbols]
1 (a) DVD device
2 (a) Signal processing device with authoring function
2-1, 2a-1 Data processing section
2-2 Data creation unit
2a-2 Data pre-processing unit
2-3 Data pre-processing unit
3 DVD drive
4 System microcomputer
11 (a) Encoding part
12 Recording date analysis part
13 (a, b) Movie thumbnail creation section
14 (a) Control information data creation unit
15 (a) Menu screen creation section
16 (a) Write controller
21 Chapter division
22 Chapter Restrictions
23 Table generator
25, 55, 75 Movie thumbnail creation execution unit
26 Highlight scene detector
27 Creation method selector
28 Creation method execution part
29 Table creation section
31 Chapter number
32 Chapter end position
33 Chapter recording date and time interval
34 Video title set
35 Title
36 Program Chain (PGC)
37 PTT (chapter)
38 Program (PG)
38-1 Cell
38-2 Video Object Unit (VOBU)
38-3 Pack
50 Menu screen
51-1 Movie Thumbnail of Chapter 1
51-2 Movie thumbnail of chapter 2
51-3 Movie thumbnail of chapter 3
51-4 Movie thumbnail of chapter 4
52 Menu button
56, 76 Data detector
57, 77 Data analysis section
58, 78 Data extraction unit
59, 79 Table creation section
61 Data stored on DVD
63 Video Manager (VMG)
67 Video title set (VTS)
A Audio video data
B Maximum number of chapters data
C Movie condition data
D Video encoded data
E Chapter table
F (1,2) Movie thumbnail
G (1,2) Control information data
H (1,2) Menu screen data
I (1, 2) Movie thumbnail + control information data + menu screen data
J Movie data
K Movie data table
L Movie thumbnail table

Claims

A data processing unit for creating a video thumbnail for each of a plurality of chapters generated by dividing audio video data for each chapter, and creating menu screen data including the video thumbnail;
A write control unit for controlling the recording of the menu screen data in a storage medium;
Comprising
The audio video data includes a plurality of image data, date / time data indicating a recording date / time of the image data, and position data indicating a position of the image data in the audio / video data,
The menu screen data indicates a menu screen that simultaneously displays a part or all of the video thumbnails of the plurality of chapters,
The data processing unit
Dividing the audio-video data to generate the plurality of chapters, creating the moving image thumbnail for each of the plurality of chapters, moving image thumbnail data indicating data regarding the moving image thumbnail, and control information regarding the plurality of chapters A data creation unit for creating control information data indicating
A menu screen creation unit for creating menu screen data based on the video thumbnail data and the control information data;
The data creation unit
A data pre-processing unit that generates the plurality of chapters by dividing the audio-video data, creates chapter data indicating data related to the plurality of chapters, and video encoded data obtained by encoding the audio-video data;
A video thumbnail creation unit that creates the video thumbnail data based on the video encoded data and the chapter data;
A control information data creating unit that creates the control information data based on the video encoded data and the chapter data;
The video thumbnail creation unit
A data detection unit for detecting a position of a GOP (Group Of Picture) for each chapter based on the video encoded data and the chapter data;
A data analysis unit that creates, based on the detected GOP and for each chapter, a code amount table in which a GOP unit code amount in the GOP is associated with the position data;
Based on the code amount table, for each chapter, a data extraction unit that extracts the video encoded data for a predetermined time including the GOP with the maximum code amount as the moving image thumbnail;
A table creation unit that generates the video thumbnail data based on the extracted video thumbnail,
The signal processing apparatus with an authoring function, wherein the moving image thumbnail data includes a moving image thumbnail table in which the moving image thumbnail for each chapter is associated with the position data.

A signal processing device comprising:
A data detection unit for detecting the position of the GOP for each chapter from the audio video data divided into chapters;
A data analysis unit that creates, based on the GOP detected by the data detection unit and for each chapter, a code amount table in which a GOP unit code amount in the GOP is associated with position data of the GOP; and
A data extraction unit that extracts, based on the code amount table created by the data analysis unit and for each chapter, the audio video data for a predetermined period including the GOP having the maximum code amount as the moving image thumbnail.

(A) creating a video thumbnail for each of a plurality of chapters generated by dividing audio video data into chapters, and creating menu screen data including the video thumbnails;
here,
The audio video data includes a plurality of image data, date / time data indicating a recording date / time of the image data, and position data indicating a position of the image data in the audio / video data,
The menu screen data indicates a menu screen that simultaneously displays a part or all of the video thumbnails of the plurality of chapters,
(B) recording the menu screen data in a storage medium;
Comprising
The step (a) includes:
(A1) creating video encoded data obtained by encoding the audio video data based on the audio video data;
(A2) dividing the audio video data based on the date and time data to generate the plurality of chapters, and creating chapter data indicating data related to the chapters;
(A3) creating a moving image thumbnail for each of the plurality of chapters based on the encoded video data and the chapter data, and generating moving image thumbnail data indicating data relating to the moving image thumbnail;
(A4) creating control information data indicating control information related to the plurality of chapters based on the video encoded data and the chapter data;
(A5) creating menu screen data based on the moving image thumbnail data and the control information data,
The step (a3)
(Aa6) detecting the position of the GOP for each chapter based on the video encoded data and the chapter data;
here,
The chapter data includes a chapter table associating chapter date data and chapter position data corresponding to each of the plurality of chapters,
The chapter date / time data is data based on the date / time data in each of the plurality of chapters,
The chapter position data is data based on the position data in each of the plurality of chapters,
(Aa7) creating, based on the detected GOP and for each chapter, a code amount table in which a GOP unit code amount in the GOP is associated with the position data;
(Aa8) Based on the code amount table, for each chapter, extracting the video encoded data at a continuous predetermined time including the GOP having the maximum code amount as the moving image thumbnail;
(Aa9) creating video thumbnail data based on the extracted video thumbnail; and
The signal processing method including authoring, wherein the video thumbnail data includes a video thumbnail table in which the video thumbnail for each chapter and the position data are associated with each other.

A video thumbnail creation execution method comprising:
A first step of detecting a position of a GOP for each chapter from the audio-video data divided into chapters;
A second step of creating, based on the GOP detected in the first step and for each chapter, a code amount table in which a GOP unit code amount in the GOP is associated with the position data of the GOP; and
A third step of extracting, based on the code amount table created in the second step and for each chapter, the audio video data of a continuous predetermined time including the GOP with the maximum code amount as the moving image thumbnail.

(C) creating a video thumbnail for each of a plurality of chapters generated by dividing audio video data into chapters, and creating menu screen data including the video thumbnails;
here,
The audio video data includes a plurality of image data, date / time data indicating a recording date / time of the image data, and position data indicating a position of the image data in the audio / video data,
The menu screen data indicates a menu screen that simultaneously displays a part or all of the video thumbnails of the plurality of chapters,
(D) recording the menu screen data in a storage medium;
Comprising
The step (c) includes:
(C1) creating video encoded data obtained by encoding the audio video data;
(C2) dividing the audio-video data to generate the plurality of chapters, and creating chapter data indicating data related to the chapters;
(C3) creating a video thumbnail for each of the plurality of chapters based on the video encoded data and the chapter data, and creating video thumbnail data indicating data related to the plurality of video thumbnails;
(C4) creating control information data indicating control information on the plurality of chapters based on the video encoded data and the chapter data;
(C5) creating menu screen data based on the moving image thumbnail data and the control information data,
The step (c3) includes:
(Ca6) detecting a position of a GOP for each chapter based on the video encoded data and the chapter data;
here,
The chapter data includes a chapter table associating chapter date data and chapter position data corresponding to each of the plurality of chapters,
The chapter date / time data is data based on the date / time data in each of the plurality of chapters,
The chapter position data is data based on the position data in each of the plurality of chapters,
(Ca7) Based on the detected GOP, for each chapter, creating a code amount table in which a GOP unit code amount in the GOP is associated with the position data;
(Ca8) Based on the code amount table, for each chapter, extracting the video encoded data at a continuous predetermined time including the GOP having the maximum code amount as the moving image thumbnail;
(Ca9) creating a video thumbnail data based on the extracted video thumbnail; and
The moving image thumbnail data includes a moving image thumbnail table in which the moving image thumbnail for each chapter and the position data are associated with each other.

A program for causing a computer to execute a method comprising:
A first step of detecting a position of a GOP for each chapter from the audio-video data divided into chapters;
A second step of creating, based on the GOP detected in the first step and for each chapter, a code amount table in which a GOP unit code amount in the GOP is associated with the position data of the GOP; and
A third step of extracting, based on the code amount table created in the second step and for each chapter, the audio video data of a continuous predetermined time including the GOP with the maximum code amount as the moving image thumbnail.