JP2012142645A

JP2012142645A - Audio/video reproducing apparatus, audio/video recording and reproducing apparatus, audio/video editing apparatus, audio/video reproducing method, audio/video recording and reproducing method, and audio/video editing apparatus

Info

Publication number: JP2012142645A
Application number: JP2009109309A
Authority: JP
Inventors: Yoshiaki Kusunoki (楠恵明); Masaaki Shimada (島田昌明)
Original assignee: Mitsubishi Electric Corp
Current assignee: Mitsubishi Electric Corp
Priority date: 2009-04-28
Filing date: 2009-04-28
Publication date: 2012-07-26
Also published as: WO2010125757A1

Abstract

PROBLEM TO BE SOLVED: To provide an audio/video recording and reproducing apparatus capable of selecting an appropriate image as a thumbnail.
SOLUTION: An audio/video reproducing method extracts scene boundaries from video information, generates entry information for accessing the extracted scene boundaries, selects a thumbnail image representing each scene delimited by the extracted boundaries, displays the selected thumbnail images, allows any one of the displayed thumbnails to be selected, and reproduces the scene corresponding to the selected thumbnail. A picture located a predetermined period after the scene boundary is selected as the thumbnail image.

Description

The present invention relates to an audio/video reproducing apparatus that divides recorded video into scenes and generates a thumbnail image for each scene, so that during playback the user can choose the scene to watch visually, by selecting its thumbnail.

In conventional video recording and reproducing apparatuses, when playback is started from a given position in a video, the start point is selected by title name or chapter number. The user therefore cannot tell, before playback begins, what content will be played.

To address this problem, one method generates thumbnail data from the image data before video encoding at recording time and displays that thumbnail data on the graphics screen used to select a title for playback, thereby informing the user of the content to be played (Patent Document 1). Methods have also been devised for determining the position in the video from which the thumbnail data is taken, so that the thumbnail representing a title, or a specific section within a title (a chapter or scene), better reflects its content (Patent Documents 2 and 3). In another method, thumbnail images are determined from the intervals at which scene changes occur within a scene, so that the content of a particular scene can be presented as a sequence of still images (Patent Document 4).

Patent Document 1: JP 2006-148731 A (page 14, FIG. 9)
Patent Document 2: JP 2006-229821 A (page 4, FIG. 1)
Patent Document 3: JP 2003-274361 A (page 4, FIG. 3)
Patent Document 4: JP 2001-298711 A (page 3, FIG. 25)

However, in conventional video recording and reproducing apparatuses, the start and end positions of a scene are delimited by chapters or entries, and the image at a fixed time after the start of playback, or at a fixed time after the entry position, is used as the thumbnail. As a result, the content of the selected thumbnail image can differ from the content of the video actually played back, which the user may find disconcerting. In addition, when a picture from a scene with rapid motion is selected as the thumbnail, it can be impossible to tell what the thumbnail image shows.

The present invention has been made to solve the above problems, and an object thereof is to provide an audio/video recording and reproducing apparatus that divides recorded audio/video content into scenes and can select an appropriate image as the thumbnail for each scene.

A video reproducing apparatus according to the present invention comprises:
scene boundary extraction means for extracting scene boundaries from video information;
entry generation means for generating entry information for accessing the scene boundaries extracted by the scene boundary extraction means;
thumbnail generation means for selecting a thumbnail image representing each scene delimited by the scene boundary extraction means;
thumbnail display means for displaying the thumbnail images selected by the thumbnail generation means;
thumbnail selection means for selecting an arbitrary thumbnail from the displayed thumbnails; and
means for reproducing the scene corresponding to the selected thumbnail,
wherein the thumbnail generation means selects, as the thumbnail image, a picture located a predetermined period after the scene boundary.

With the audio/video reproducing apparatus of the present invention, a picture located a predetermined period after a scene boundary is selected as the thumbnail image. Audio/video content can therefore be divided into scenes, and an appropriate image can be selected as the thumbnail for each scene.

FIG. 1 is a block diagram showing the configuration of an audio/video recording and reproducing apparatus according to Embodiment 1.
FIG. 2 shows an example of the display screen during a scene selection operation.
FIG. 3 shows the relationship between scenes and thumbnail images.
FIG. 4 shows the relationship between a scene change and a thumbnail image.
FIG. 5 shows the relationship between a scene change and a thumbnail image.
FIG. 6 shows the relationship between a scene change and a thumbnail image.
FIG. 7 is a flowchart of the entry information generation operation.
FIG. 8 is a flowchart of the thumbnail generation operation.
FIG. 9 is a flowchart of the thumbnail generation operation.
FIG. 10 is a flowchart of the thumbnail display operation.
FIG. 11 is a block diagram showing the configuration of an audio/video recording and reproducing apparatus according to Embodiment 2.
FIG. 12 shows an example of a scene boundary detection method.
FIG. 13 is a block diagram showing the configuration of an audio/video recording and reproducing apparatus according to Embodiment 3.

Embodiment 1.
FIG. 1 is a block diagram showing the configuration of the audio/video recording and reproducing apparatus 1 according to Embodiment 1 of the present invention. As interfaces to external devices, the apparatus 1 has a digital tuner 3, an analog tuner 4, an external input terminal 5, a monitor output terminal 31, and a network terminal 30. The digital tuner 3 and the analog tuner 4 are connected to an antenna 91 and receive digital and analog broadcasts. The external input terminal 5 mainly handles the so-called analog inputs: composite video input, S-video input, L/R audio input, and the like. A monitor 93 is connected to the monitor output terminal 31, so that the video and audio signals generated by the apparatus 1 can be output for display. The apparatus 1 also includes a remote control receiver 6 for remote operation; remote control codes received by infrared or similar means are sent to the system control unit 2.

The system control unit 2 controls the audio/video recording and reproducing apparatus 1 according to remote control signals issued at the user's request. When the user requests recording, the system control unit 2 starts the recording application 9 and instructs it to record. The recording application 9 controls the recording control 10, managing the start and end of recording and the recording operation in progress. When the user requests playback, the system control unit 2 starts the playback application 11 and instructs it to play. The playback application 11 controls the playback control 14 to start playing the specified content.

The digital broadcast recording function of the apparatus 1 is described with reference to FIG. 1. A digital broadcast received by the digital tuner 3 is sent to the input stream control 8 as a full TS (Transport Stream). To record the AV stream (the video/audio data) stably in the AV stream recording unit 17, the input stream control 8 temporarily buffers the data and writes it to the AV stream recording unit 17 at a steady rate. The input stream control 8 also supports partial TS, in which packets that need not be recorded have been removed from the full TS. In addition to detecting TS PIDs and section information, it detects the start positions of GOPs (Groups of Pictures) in the video stream being recorded. The AV stream recording unit 17 consists of an HDD (Hard Disk Drive) or a similar device.

When an AV stream is recorded on the HDD, an address map that associates address information of the recorded AV stream with time information is required, both to make scenes easier to locate during playback and to allow random seeks to an arbitrary time during trick play. During recording, the recording control 10 generates the address map from the GOP start positions detected by the input stream control 8 and records it in the AV management information recording unit 18. Besides the address map, the AV management information recording unit 18 records playback control information, including the title of the recorded program, recording start time, recording end time, broadcast service name, channel number, video codec information, audio codec information, and detailed program information; this information is read out to the playback control 14 as needed.
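A minimal sketch of such an address map, assuming one entry per GOP that pairs the GOP's start time with its byte offset in the stream file. The class and method names are hypothetical. A seek resolves a time to the last GOP starting at or before it, because playback can begin cleanly at a GOP head (its leading I-picture).

```python
import bisect
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class GopEntry:
    time_s: float     # presentation time at which the GOP starts
    address: int      # byte offset of the GOP start in the recorded AV stream


class AddressMap:
    """Hypothetical time-to-address map built from GOP start positions while recording."""

    def __init__(self) -> None:
        self._times: List[float] = []
        self._entries: List[GopEntry] = []

    def add_gop(self, time_s: float, address: int) -> None:
        # GOPs are detected in recording order, so times must arrive in ascending order.
        if self._times and time_s <= self._times[-1]:
            raise ValueError("GOP start times must be strictly increasing")
        self._times.append(time_s)
        self._entries.append(GopEntry(time_s, address))

    def lookup(self, time_s: float) -> GopEntry:
        """Return the GOP containing time_s, i.e. the last GOP starting at or before it."""
        i = bisect.bisect_right(self._times, time_s) - 1
        if i < 0:
            raise ValueError("time precedes the first recorded GOP")
        return self._entries[i]


# Usage: ten 0.5 s GOPs at made-up, evenly spaced addresses.
amap = AddressMap()
for k in range(10):
    amap.add_gop(k * 0.5, k * 1_000_000)
seek_target = amap.lookup(2.3)  # falls inside the GOP that starts at 2.0 s
```

Keeping the times in a separate sorted list lets `bisect` answer each seek in logarithmic time, which matters for long recordings with many GOPs.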

Next, the recording of analog broadcasts and externally input video signals is described. From the analog broadcast received by the antenna 91, the analog tuner 4 extracts only the signal of the channel designated by the user. The extracted signal is converted into a video signal, digitized by an AD/DA converter (not shown), and input to the AV encoder 7. The AV encoder 7 encodes the video signal as MPEG-2 Video and the audio signal as AAC, multiplexes the encoded video and audio into an MPEG-2 Transport Stream, and thereby generates an AV stream. The AV stream is sent to the input stream control 8 and recorded in the AV stream recording unit 17. Audio/video signals from the external input terminal 5 are likewise digitized, converted into an AV stream by the AV encoder 7, and recorded in the AV stream recording unit 17.

The playback function of the apparatus 1 is described with reference to FIG. 1. When the user presses the "Recorded Titles" button on the remote control (not shown), all or some of the recorded program titles are displayed on screen. The user moves the cursor to the desired title with the up/down/left/right cursor keys on the remote control and presses the "OK" button to select the title to be played. The selection is sent from the remote control to the remote control receiver 6 as an infrared signal, converted into a software code, and passed to the system control unit 2, which updates the graphics screen. When the apparatus is not in the playback standby state or the recording state, these codes are sent to the playback application 11, which calls the corresponding playback function and changes the playback state. A specific playback operation is described below.

When the user chooses a title, the playback application 11 instructs the playback control 14 to play that title. The playback control 14 reads the playback control information for the title from the AV management information recording unit 18 into a RAM (not shown). Using the address map in the playback control information, the playback control 14 obtains the address corresponding to the playback start time of the stream to be played next, reads the AV stream at that address from the AV stream recording unit 17, and passes it to the output stream control 16. The AV decoder 15 decodes the AV stream supplied by the output stream control 16 in the order received and outputs it to the monitor 93 through the monitor output terminal 31. The output stream control 16 reads the AV stream from the AV stream recording unit 17 at a rate that keeps the AV decoder 15 from underflowing or overflowing, so that video and audio are not interrupted. Depending on the buffer state of the AV decoder 15, the output stream control 16 transfers the temporarily stored AV stream to the AV decoder using a hardware-assisted DMA (Direct Memory Access) transfer function.

Next, the operation of the scene detection unit 50 in the apparatus 1 is described. During recording, the scene detection unit 50 detects scenes by analyzing the video and audio of the AV stream temporarily held in the input stream control 8. When the input stream control 8 detects the head of a GOP during recording, the analysis decoder 21 extracts the key frame (I-picture) at the head of the GOP and decodes it. The decoded luminance (Y) and color-difference (U, V) data are input to the frame buffer 22, which is large enough to hold at least two decoded YUV images. The scene boundary extraction 23 computes the difference between the two images in the frame buffer 22; if the difference is equal to or greater than a preset threshold, it determines that a scene change has occurred and treats that position as a scene boundary.

One way to compute the difference between two images is to build a histogram of each image and accumulate the absolute differences between corresponding histogram bins. Alternatively, the difference may be based on coding parameters in the AV stream, such as motion vectors. It may also be based on changes in the set of people on screen, captured using face data obtained by face detection and face recognition.

The position determined to be a scene boundary by the scene boundary extraction 23 becomes a scene playback start position. The entry generation 24 creates playback entry information consisting of the address and time of that scene playback start position. Such a scene is commonly called a chapter, and the entry position or its information is sometimes called a chapter mark, or simply a chapter. The entry information generated in this way is recorded in the metadata recording unit 19.
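The boundary-to-entry flow can be sketched as below, assuming one decoded I-picture per GOP and keeping only the previous picture, which mirrors the two-image frame buffer. The `difference` argument is pluggable, so a histogram, motion-vector, or face-based measure could be substituted; the mean-absolute-difference metric used here is only a placeholder, and all names are hypothetical. The start of the recording itself is not turned into an entry in this sketch.

```python
from dataclasses import dataclass
from typing import Callable, Iterable, List, Optional, Sequence


@dataclass(frozen=True)
class IPicture:
    time_s: float            # presentation time of the GOP's leading I-picture
    address: int             # byte address of the GOP start in the AV stream
    luma: Sequence[int]      # decoded Y samples (stand-in for the full YUV frame)


@dataclass(frozen=True)
class Entry:
    time_s: float            # scene playback start time
    address: int             # scene playback start address


def extract_entries(pictures: Iterable[IPicture],
                    difference: Callable[[IPicture, IPicture], float],
                    threshold: float) -> List[Entry]:
    """Emit an entry wherever the difference between consecutive I-pictures reaches the threshold."""
    entries: List[Entry] = []
    previous: Optional[IPicture] = None  # the other image held in the two-picture frame buffer
    for current in pictures:
        if previous is not None and difference(previous, current) >= threshold:
            # A scene change lies between previous and current; current starts the new scene.
            entries.append(Entry(current.time_s, current.address))
        previous = current
    return entries


def mean_abs_difference(a: IPicture, b: IPicture) -> float:
    """Placeholder metric: mean absolute luma difference scaled to [0, 1]."""
    return sum(abs(x - y) for x, y in zip(a.luma, b.luma)) / (255 * len(a.luma))


# Usage: six one-second GOPs whose content jumps from dark to bright at t = 3 s.
pics = [IPicture(float(t), t * 1000, [10] * 4 if t < 3 else [200] * 4) for t in range(6)]
entries = extract_entries(pics, mean_abs_difference, threshold=0.3)
```

Each entry stores both time and address, matching the description of the playback entry information, so a later seek needs no address-map lookup.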

When the scene boundary extraction 23 detects a scene boundary, the thumbnail generation 25 takes the corresponding image data from the frame buffer 22, converts it to bitmap format, and generates a thumbnail image. The generated thumbnail image is recorded in the metadata recording unit 19 as thumbnail information (the thumbnail selection method is described in detail later). The thumbnail may also be stored in a format other than bitmap, such as TIFF or JPEG. Instead of recording the thumbnail image data itself, the time and address in the AV stream of the picture serving as the thumbnail may be recorded in the metadata recording unit 19 as the thumbnail information. This avoids the time needed to generate actual thumbnail data, reducing system load, and it also reduces the space used in the metadata recording unit 19 as well as the time and system load required for data transfer.

Next, the method of playing an arbitrary scene in the apparatus 1 using the entry information and thumbnail information is described. FIG. 2 shows the playback video of an AV stream on the monitor 93 with a graphics image superimposed on it. To display the thumbnail images, the metadata control unit 12 reads the thumbnail information stored in the metadata recording unit 19, the thumbnail display 13 renders it as images, and the graphics superimposition 20 overlays these images on the playback video shown on the monitor 93.

When the user selects the desired title, the playback application 11 instructs the playback control 14 to play it. The playback control 14 reads the AV stream from the AV stream recording unit 17 into the output stream control 16, and the AV stream is then decoded by the AV decoder 15 and displayed on the monitor 93.

When the user presses the dedicated scene-select button on the remote control, the playback application 11 informs the metadata control unit 12 of the title being played and the elapsed time of the playback video. The metadata control unit 12 reads five thumbnails around the elapsed time from the metadata recording unit 19. The thumbnail selection 28 generates a selection frame and outputs it to the thumbnail display 13. The thumbnail display 13 expands the thumbnail metadata into images and outputs them, together with the selection frame, to the graphics superimposition 20.

As a result, as shown in FIG. 2, a total of five thumbnails are displayed: the thumbnail of the current scene (Scene 17) and those before and after it. When the thumbnails first appear, the remote control selection frame is placed on the scene currently being played; here Scene 17 is playing, so the frame is on the Scene 17 thumbnail. The user moves the frame left and right with the left/right keys on the remote control to select the desired scene, and the thumbnail selection 28 redraws the frame at its new position in step with these key presses. If the user presses a left or right cursor key again when the frame is already at the left or right edge of the screen, the whole row of thumbnails shifts in the direction opposite to the key pressed, bringing thumbnails of scenes not yet displayed into view. When the user places the cursor on the desired thumbnail and presses the "OK" key, the playback application 11 causes the playback control 14 to pause the content being played.
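The five-thumbnail strip and its selection frame can be modelled as a sliding window over the scene list. This sketch assumes the window is centred on the current scene (two thumbnails on each side) and clamped at the ends of the list; the text says only "before and after", so that split, like the class and method names, is an assumption. Shifting the strip opposite to the key press is represented by moving the window's start index, which keeps the frame at the screen edge.

```python
import bisect
from dataclasses import dataclass
from typing import List

VISIBLE = 5  # thumbnails shown at once, as in FIG. 2


@dataclass
class SceneStrip:
    entry_times: List[float]  # scene start times from the entry information, ascending
    first_visible: int        # index of the leftmost thumbnail on screen
    cursor: int               # index of the scene under the selection frame

    @classmethod
    def open_at(cls, entry_times: List[float], elapsed_s: float) -> "SceneStrip":
        # The current scene is the last one whose entry time is at or before the elapsed time.
        current = max(bisect.bisect_right(entry_times, elapsed_s) - 1, 0)
        # Centre the window on the current scene, clamped to the ends of the scene list.
        first = min(max(current - VISIBLE // 2, 0), max(len(entry_times) - VISIBLE, 0))
        return cls(list(entry_times), first, current)

    def visible(self) -> List[int]:
        return list(range(self.first_visible,
                          min(self.first_visible + VISIBLE, len(self.entry_times))))

    def move(self, step: int) -> None:
        """step is -1 for the left key and +1 for the right key."""
        self.cursor = min(max(self.cursor + step, 0), len(self.entry_times) - 1)
        # If the frame would leave the window, shift the whole strip by one thumbnail.
        if self.cursor < self.first_visible:
            self.first_visible = self.cursor
        elif self.cursor >= self.first_visible + VISIBLE:
            self.first_visible = self.cursor - VISIBLE + 1

    def confirm(self) -> float:
        """Entry time from which playback resumes when OK is pressed."""
        return self.entry_times[self.cursor]


# Usage: 30 scenes one minute apart, scene-select pressed 10 s into scene 17.
strip = SceneStrip.open_at([i * 60.0 for i in range(30)], elapsed_s=1030.0)
for _ in range(3):
    strip.move(+1)  # the third press goes past the right edge and scrolls the strip
resume_at = strip.confirm()
```

The value returned by `confirm()` is the entry time that the playback control would then convert to an address before resuming playback.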

The playback application 11 then instructs the playback control 14 to resume playback from the time specified by the entry information that indicates the start position of the scene selected by the user. Using the address map recorded in the AV management information recording unit 18, the playback control 14 converts the entry's time information into an address in the AV stream and sends it to the output stream control 16. Based on this address, the output stream control 16 reads the AV stream data of the selected scene from the AV stream recording unit 17 and outputs it to the AV decoder 15, which decodes it. The scene corresponding to the thumbnail selected by the user is thus played back.

FIG. 3 illustrates the relationship between thumbnails and entries, which mark the start positions of scenes. Here, a scene change (a change point in the video) is used as the scene boundary. Scene boundaries are not limited to scene changes, however; they may also be detected from, for example, sections with no sound, or points where the codec, format, sampling frequency, number of channels, or quantization bit depth changes. Scenes may also be formed by dividing the content at predetermined intervals such as 5, 10, or 15 minutes, at intervals specified by the user, or in response to specific events supplied from outside. For each scene divided in this way, a thumbnail image showing the scene's content is generated.
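Two of these alternative boundary sources can be sketched briefly. The fixed-interval split uses the 5-minute default named above. The silence rule assumes a per-frame loudness value in [0, 1] and places the boundary where sound resumes after a silent run of at least `min_run` frames; the loudness measure, the thresholds, and the function names are all hypothetical.

```python
import math
from typing import List, Sequence


def interval_entries(duration_s: float, interval_s: float = 300.0) -> List[float]:
    """Scene start times for fixed-length scenes (default 5 minutes)."""
    if interval_s <= 0:
        raise ValueError("interval must be positive")
    return [k * interval_s for k in range(math.ceil(duration_s / interval_s))]


def silence_entries(levels: Sequence[float], silence_level: float = 0.01,
                    min_run: int = 3) -> List[int]:
    """Frame indices where sound resumes after a silent section.

    levels[i] is a per-frame loudness in [0, 1]; a run of at least min_run
    frames at or below silence_level counts as a silent section.
    """
    entries: List[int] = []
    run = 0
    for i, level in enumerate(levels):
        if level <= silence_level:
            run += 1
        else:
            if run >= min_run:
                entries.append(i)  # the new scene starts where the sound returns
            run = 0
    return entries
```

Requiring a minimum run length keeps brief pauses in speech from splitting a scene.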

As shown in FIG. 3, this embodiment is characterized in that the thumbnail is not the first picture of a scene but a picture after it. When the video is changing continuously, it is appropriate to adopt that position as an entry marking the scene boundary, but not to adopt the image at that position as the thumbnail representing the scene. In general, a single image taken from continuously changing video often does not, on its own, show clearly what is being depicted. A frame from moving video is therefore unsuitable as the thumbnail that the user relies on to select a scene for playback; a still image, or an image close to a still image, is more appropriate. Using as the thumbnail a picture a predetermined time after the scene start, rather than the picture at the scene start, avoids this problem.

One way to take a picture delayed in time from the scene boundary as the thumbnail is to use the picture a fixed time, for example 3 seconds, after the scene change is detected (FIG. 4). Alternatively, after a scene change is detected, the picture may be selected at the point where the amount of scene change has fallen to or below a threshold and has stayed there for a certain time, for example 3 seconds (FIG. 5). When the video data carries motion vector information, the point where the amount of motion falls to or below a threshold may be selected. By using a picture a predetermined time after the scene boundary as the thumbnail in this way, the user can be offered, for scene selection, an image that matches the content of each scene.
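The motion-vector variant can be sketched as follows, assuming each picture's motion vectors are available as (dx, dy) pairs in pixels and that "amount of motion" means the mean vector magnitude; both are assumptions, as are the names. Pictures with no vectors (for example intra-coded I-pictures) are skipped rather than treated as motionless, since their lack of vectors says nothing about how still the scene is.

```python
import math
from typing import Optional, Sequence, Tuple

MotionVector = Tuple[float, float]  # (dx, dy) in pixels


def mean_motion(vectors: Sequence[MotionVector]) -> Optional[float]:
    """Average motion-vector magnitude, or None for a picture with no vectors."""
    if not vectors:
        return None
    return sum(math.hypot(dx, dy) for dx, dy in vectors) / len(vectors)


def pick_low_motion(pictures: Sequence[Sequence[MotionVector]], boundary: int,
                    motion_threshold: float) -> Optional[int]:
    """First picture at or after the boundary whose mean motion is at or below the threshold."""
    for i in range(boundary, len(pictures)):
        motion = mean_motion(pictures[i])
        if motion is not None and motion <= motion_threshold:
            return i
    return None  # the scene never settled; a caller could fall back to a fixed delay
```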

FIG. 4 illustrates one method of selecting the thumbnail image. In FIG. 4, the horizontal axis is the picture number and the vertical axis is the image difference value between two adjacent pictures. To simplify the explanation, one picture per second is analyzed. The image difference value may be obtained, for example, by generating histograms of the luminance (Y) and color-difference (U, V) signals of the two pictures and dividing the accumulated sum of the absolute differences of all histogram bins by the total number of elements. In this case, the image difference value lies between 0 and 1. When the image difference value exceeds a preset scene change determination threshold, it is determined that a scene change has occurred between the two pictures.
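The histogram measure described above can be sketched as follows, assuming 8-bit Y, U, and V planes given as flat sample lists. The text divides by "the total number of elements"; because the summed absolute difference between two histograms that each count N samples can reach 2N, this sketch divides by twice the total sample count so that the result stays within the stated range of 0 to 1. That normalization is an interpretation, not a detail confirmed by the source.

```python
from typing import List, Sequence, Tuple

Planes = Tuple[Sequence[int], Sequence[int], Sequence[int]]  # (Y, U, V) 8-bit samples


def histogram(samples: Sequence[int], bins: int = 256) -> List[int]:
    counts = [0] * bins
    for s in samples:
        counts[s] += 1
    return counts


def image_difference(a: Planes, b: Planes) -> float:
    """Histogram difference of two pictures, normalized to [0, 1]."""
    total_abs = 0
    total_samples = 0
    for plane_a, plane_b in zip(a, b):
        if len(plane_a) != len(plane_b):
            raise ValueError("pictures must have the same dimensions")
        ha, hb = histogram(plane_a), histogram(plane_b)
        total_abs += sum(abs(x - y) for x, y in zip(ha, hb))
        total_samples += len(plane_a)
    # Two histograms of N samples each differ by at most 2N, hence the factor 2.
    return total_abs / (2 * total_samples)
```

Identical pictures give 0 and pictures with no histogram bins in common give 1; a scene change is flagged when the value exceeds the chosen threshold.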

In FIG. 4, the scene change determination threshold is exceeded at picture number P52, so a scene change is determined to have occurred between pictures P51 and P52, and P52 becomes the scene boundary. Entry information with P52 as the scene start position, that is, as the entry, is therefore recorded in the metadata recording unit 19. As described above, the image of P52 is not used as the thumbnail for the scene; instead, picture P55, a predetermined time (here, 3 seconds) after P52, is used as the thumbnail image. Picture P55 is detected by the input stream control 8, decoded by the analysis decoder 21, and input to the frame buffer 22. The thumbnail generation 25 generates a thumbnail from the P55 data in the frame buffer 22 and records it in the metadata recording unit 19.
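The fixed-delay rule of FIG. 4 can be sketched as below, assuming consecutively numbered pictures analyzed once per second and difference values keyed by picture number. The difference values are invented to resemble the figure. Clamping the thumbnail to the last picture of its scene, when the next scene change arrives before the delay elapses, is an assumption not stated in the text.

```python
from typing import Dict, List, Tuple


def fixed_delay_thumbnails(diffs: Dict[int, float], scene_threshold: float,
                           delay: int = 3) -> List[Tuple[int, int]]:
    """Return (entry picture, thumbnail picture) pairs.

    diffs maps picture number n to the image difference between pictures n-1 and n.
    With one analyzed picture per second, delay=3 means 3 seconds.
    Picture numbers are assumed to be consecutive.
    """
    pictures = sorted(diffs)
    boundaries = [n for n in pictures if diffs[n] > scene_threshold]
    pairs = []
    for k, entry in enumerate(boundaries):
        # Never reach into the next scene: clamp to the last picture of this scene.
        scene_end = boundaries[k + 1] - 1 if k + 1 < len(boundaries) else pictures[-1]
        pairs.append((entry, min(entry + delay, scene_end)))
    return pairs


# Difference values shaped like FIG. 4 (made-up numbers): a cut at P52 that settles by P55.
FIG4_DIFFS = {50: 0.10, 51: 0.10, 52: 0.80, 53: 0.50, 54: 0.30,
              55: 0.10, 56: 0.05, 57: 0.05, 58: 0.05, 59: 0.05, 60: 0.05}
print(fixed_delay_thumbnails(FIG4_DIFFS, scene_threshold=0.7))  # [(52, 55)]
```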

By using as the thumbnail an image taken a predetermined time after the scene change, that is, after the entry position, images unsuitable as thumbnails, such as a camera pan, an image in the middle of a transition caused by an image effect, or a momentarily inserted image, can be prevented from being selected.

FIG. 5 is an explanatory diagram illustrating another example of a thumbnail image selection method. In the method shown in FIG. 5, a thumbnail image is selected using a thumbnail determination threshold in addition to the scene change determination threshold. The thumbnail determination threshold is used for selecting the thumbnail image: after a scene change occurs, a thumbnail image is selected when the image difference value becomes equal to or less than this threshold.

FIG. 6 is an explanatory diagram illustrating another example of a thumbnail image selection method. In the method shown in FIG. 6, picture P58, a predetermined time (here, 3 seconds) after the image difference value falls below the thumbnail determination threshold, is selected as the thumbnail image. This makes it possible to select, as the thumbnail image, an image from a period in which images with the same content are displayed continuously (in this example, from picture P55 to picture P58). As a result, an image whose content is easy to recognize can be displayed as the thumbnail, and the video of the scene and its thumbnail image become easier to associate with each other.
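The selection rules of FIGS. 5 and 6 can be sketched as a single pass over a list of image difference values, one per analyzed picture. The function name, the list representation and the threshold values in the example are hypothetical. With `delay=0` the sketch picks the first picture at or below the thumbnail determination threshold (FIG. 5); with `delay=3` it picks the picture three analyzed pictures later (three seconds at one picture per second), provided the difference stays at or below the threshold in between (FIG. 6). Unlike the FIG. 9 flow, this sketch also lets a new scene change restart the search while a thumbnail is still pending.

```python
from typing import List, Optional, Sequence, Tuple


def select_thumbnails(
    diffs: Sequence[float],
    scene_threshold: float,
    thumb_threshold: float,
    delay: int,
) -> List[Tuple[int, int]]:
    """Return (entry_index, thumbnail_index) pairs.

    diffs[i] is the image difference between picture i-1 and picture i;
    diffs[0] is ignored. `delay` is counted in analyzed pictures.
    """
    results: List[Tuple[int, int]] = []
    entry: Optional[int] = None        # current scene start; None when no search is pending
    quiet_start: Optional[int] = None  # first picture of the current run at/below thumb_threshold
    for i in range(1, len(diffs)):
        d = diffs[i]
        if d > scene_threshold:        # scene change: picture i becomes the entry
            entry, quiet_start = i, None
            continue
        if entry is None:
            continue
        if d <= thumb_threshold:
            if quiet_start is None:
                quiet_start = i
            if i - quiet_start == delay:
                results.append((entry, i))
                entry, quiet_start = None, None   # thumbnail found for this scene
        else:
            quiet_start = None         # difference rose again: restart the count
    return results


# A series shaped like FIG. 6: scene change at P52, settling from P55 on.
diffs = [0.0] * 60
diffs[52], diffs[53], diffs[54] = 0.8, 0.3, 0.2
print(select_thumbnails(diffs, 0.5, 0.1, delay=3))  # [(52, 58)]
print(select_thumbnails(diffs, 0.5, 0.1, delay=0))  # [(52, 55)]
```

Resetting the run whenever the difference rises again is what guarantees the chosen picture lies inside a stretch of visually stable content.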

In the example shown in FIG. 6, the picture a predetermined time after the image difference value falls below the thumbnail determination threshold is adopted as the thumbnail image, but once the predetermined time has elapsed, an earlier picture (picture P56 or P57) may be adopted instead.

The operation of the video/audio recording/reproducing apparatus 100 according to the present embodiment is described below with reference to flowcharts.
FIG. 7 is a diagram showing the flow of entry information generation. When recording starts, AV streams are sequentially input to the input stream control 8 (S101). The input stream control 8 detects GOP headers in the AV stream, and analyzes and extracts only the I pictures, called key frames (S102). When a key frame is extracted, the picture is decoded (S103) and the image difference value is calculated (S104). If it is determined that no scene change has occurred (No in S105), steps S102 to S104 are performed for the next picture.

On the other hand, when a scene change is detected (Yes in S105), the location where it occurred is determined to be a scene boundary, and entry information for that position is generated (S106). The generated entry information is recorded in the metadata recording unit 19 (S107). This operation continues until recording is completed and the input stream control 8 detects the end of the stream input. When the input stream control 8 detects a flag indicating the end of the stream in the AV stream (Yes in S108), the recording process ends (S109).
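A minimal sketch of the FIG. 7 loop, assuming key frames arrive as (address, picture) pairs in stream order and that some image difference function is supplied by the caller. The `Entry` record, the function name and the brightness-only pictures in the example are hypothetical; a list stands in for the metadata recording unit 19.

```python
from dataclasses import dataclass
from typing import Any, Callable, Iterable, List, Tuple


@dataclass
class Entry:
    """Hypothetical entry record: where a scene starts in the stream."""
    address: int


def generate_entries(
    key_frames: Iterable[Tuple[int, Any]],
    difference: Callable[[Any, Any], float],
    scene_threshold: float,
) -> List[Entry]:
    """Follow FIG. 7: compare each key frame with the previous one."""
    entries: List[Entry] = []            # stands in for metadata recording unit 19
    previous = None
    for address, picture in key_frames:  # S101-S103: extracted and decoded I pictures
        if previous is not None:
            d = difference(previous, picture)   # S104
            if d > scene_threshold:             # S105
                entries.append(Entry(address))  # S106, S107
        previous = picture
    return entries                        # the loop ends with the stream (S108, S109)


# Pictures reduced to a single brightness value for illustration.
frames = [(0, 10), (100, 11), (200, 90), (300, 91), (400, 5)]
found = generate_entries(frames, lambda a, b: abs(a - b) / 255, 0.25)
print([e.address for e in found])  # [200, 400]
```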

FIG. 8 is a flowchart illustrating an example of the operation up to thumbnail image generation. It shows the operation when, as in FIG. 4, the picture a predetermined time (3 seconds) after the occurrence of a scene change is used as the thumbnail image.
As described above with reference to FIG. 7, the input stream control 8 extracts an I picture, which is a key frame, from the AV stream (S102) and decodes it (S103). After decoding, it is determined whether a thumbnail generation timer is set (S104). This timer is set in the later step S114 after a scene change, that is, a scene boundary, is detected, and is thereafter decremented each time a key frame is decoded (S105). The timer counts the predetermined time (3 seconds) from scene change detection until a thumbnail image is selected. When the predetermined time has elapsed and the timer times out (S106), the thumbnail generation 25 generates a thumbnail image (S107), and the thumbnail information is recorded in the metadata recording unit 19 (S108). When the predetermined time has not yet elapsed (No in S106), or when thumbnail image generation is complete (S109), the image difference value of the decoded key frame is calculated (S110) and scene change determination is performed (S111). When a scene change is detected, entry information indicating the scene boundary is generated (S112) and recorded in the metadata recording unit 19 (S113). Once the entry information is recorded, the thumbnail generation timer is set (S114), and key frame extraction (S102) resumes for the rest of the recording stream.
The above processing is repeated until input of the recording stream ends (S115).
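The FIG. 8 timer logic can be sketched as a small streaming state machine that is called once per decoded key frame. The class and field names are hypothetical, pictures are reduced to brightness values, and the difference function is a stand-in; the timer is a key-frame count, so `delay` must be at least 1.

```python
from dataclasses import dataclass, field
from typing import Any, Callable, List, Optional


@dataclass
class TimerThumbnailSelector:
    """Streaming sketch of FIG. 8; class and field names are hypothetical."""
    difference: Callable[[Any, Any], float]
    scene_threshold: float
    delay: int = 3                    # key frames to wait (>= 1); 3 s at one key frame per second
    entries: List[int] = field(default_factory=list)     # entry positions
    thumbnails: List[int] = field(default_factory=list)  # thumbnail positions
    timer: Optional[int] = None       # None means the thumbnail generation timer is not set
    previous: Any = None

    def on_key_frame(self, index: int, picture: Any) -> None:
        if self.timer is not None:                 # S104: timer set?
            self.timer -= 1                        # S105: decrement per key frame
            if self.timer == 0:                    # S106: timeout
                self.thumbnails.append(index)      # S107, S108
                self.timer = None                  # S109
        if self.previous is not None:
            d = self.difference(self.previous, picture)  # S110
            if d > self.scene_threshold:                 # S111
                self.entries.append(index)               # S112, S113
                self.timer = self.delay                  # S114
        self.previous = picture


selector = TimerThumbnailSelector(lambda a, b: abs(a - b) / 255, 0.25)
pictures = [10] * 52 + [90] * 10          # scene change at key frame 52
for i, p in enumerate(pictures):
    selector.on_key_frame(i, p)
print(selector.entries, selector.thumbnails)  # [52] [55]
```

As in the flowchart, a scene change detected while the timer is still running simply rearms it, so only the latest scene start gets a thumbnail.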

Here the timer is decremented each time a picture is processed, but the timeout may instead be counted using a timer function of the apparatus, or using the system time superimposed on the AV stream being recorded.
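If the "system time superimposed on the AV stream" is taken to be an MPEG presentation time stamp (an assumption; the text does not name the field), the timeout can be measured in stream time instead of key-frame counts. MPEG system PTS values are 33-bit counts of a 90 kHz clock, so the sketch below computes elapsed time modulo 2^33 to survive one wraparound. The function names are hypothetical.

```python
PTS_CLOCK_HZ = 90_000   # MPEG system PTS/DTS values count a 90 kHz clock
PTS_MODULUS = 1 << 33   # PTS is a 33-bit field and wraps around


def pts_elapsed_seconds(start_pts: int, current_pts: int) -> float:
    """Seconds from start_pts to current_pts, tolerating a single wraparound."""
    return ((current_pts - start_pts) % PTS_MODULUS) / PTS_CLOCK_HZ


def thumbnail_due(entry_pts: int, current_pts: int, delay_s: float = 3.0) -> bool:
    """True once `delay_s` seconds of stream time have passed since the entry."""
    return pts_elapsed_seconds(entry_pts, current_pts) >= delay_s


print(pts_elapsed_seconds(0, 270_000))                      # 3.0
print(pts_elapsed_seconds(PTS_MODULUS - 90_000, 180_000))   # 3.0 (across the wrap)
```

Stream time keeps the delay correct even when key frames are not exactly one second apart, which a per-key-frame counter cannot guarantee.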

FIG. 9 is a flowchart illustrating another example of the operation up to thumbnail image generation. It shows the operation when, as in FIG. 6, the picture a predetermined time (3 seconds) after the image difference value falls below the thumbnail determination threshold following a scene change is used as the thumbnail image.
In FIG. 9, steps S102 to S104 are the same as those described for FIG. 7. After the image difference value is calculated in step S104, it is determined whether an entry, that is, a scene start position, has been determined (S105). This is judged by whether the scene start state has been set in the later step S116: when a scene change is detected in step S113, entry information is generated (S114) and recorded (S115), and then the scene start state is set (S116). If an entry has been determined, it is checked whether the image difference value is equal to or less than the thumbnail determination threshold (S106). If it is, the thumbnail generation count is decremented (S107). When the count times out a predetermined period after the image difference value fell below the thumbnail determination threshold (S108), thumbnail information is generated (S109) and recorded in the metadata recording unit 19 (S110). When thumbnail image generation finishes, the scene start state is cleared (S111), and detection of the next scene boundary proceeds. On the other hand, if the image difference value is greater than the thumbnail determination threshold (No in S106), the count value of the thumbnail generation counter is reset (S112), and the count is not decremented again until the image difference value returns to or below the thumbnail determination threshold.
The above processing is repeated until input of the recording stream ends (S117).

FIG. 10 is a flowchart showing the playback operation using thumbnail selection.
When the user inputs a desired operation to the video/audio recording/reproducing apparatus 100 with a remote controller or the like, the command encoded by the remote control receiving unit 6 is input to the system control unit 2 (S102). The system control unit 2 branches the processing according to the command. When a playback start instruction is input (S103), a playback start instruction is issued to the playback application 11 (S104). The AV stream then starts to be read from the AV stream recording unit 17 into the output stream control 16, and the AV decoder 15 starts decoding and outputting the picture.
Next, when the user presses the "scene select" button on the remote controller, a thumbnail display command is sent and a thumbnail display instruction is issued (S105). Thumbnail images are then read from the metadata recording unit 19 (S106) and displayed (S107).
When the user presses the left or right key on the remote controller, a selection frame movement instruction is issued (S108), and the thumbnail selection 28 moves and redraws the displayed selection frame (S109). When the user finds the desired scene and presses the "Enter" key, a confirmation instruction is sent to the system control unit 2 (S110). The playback application 11 is then instructed to extract the entry information, and the playback control 14 reads the playback control information from the AV management information recording unit 18 and seeks to the address of the designated scene (S111). If a playback end instruction is given, the process ends (S113); otherwise, the next command input is processed (S102).
With the above operation, the user can start playback from the entry position of a scene simply by designating its thumbnail on the screen.
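The selection part of FIG. 10 (steps S105 to S111) can be modeled as a cursor over the thumbnails read from metadata, where "Enter" yields the entry address to seek to. Everything here is a hypothetical sketch: the class, the key names, and the choice to clamp (rather than wrap) the cursor at either end of the list.

```python
from typing import List, Optional, Tuple


class SceneSelectMenu:
    """Hypothetical model of FIG. 10, steps S105-S111.

    `thumbnails` pairs a thumbnail id with the entry address of its scene,
    as read from the metadata recording unit.
    """

    def __init__(self, thumbnails: List[Tuple[str, int]]) -> None:
        self.thumbnails = thumbnails
        self.cursor = 0                      # position of the selection frame

    def handle_key(self, key: str) -> Optional[int]:
        """Move the selection frame, or return the seek address on 'enter'."""
        if key == "left":                    # S108, S109
            self.cursor = max(0, self.cursor - 1)
        elif key == "right":
            self.cursor = min(len(self.thumbnails) - 1, self.cursor + 1)
        elif key == "enter":                 # S110, S111
            return self.thumbnails[self.cursor][1]
        return None


menu = SceneSelectMenu([("scene1", 0), ("scene2", 5_000), ("scene3", 12_000)])
for key in ["right", "right", "right", "left"]:
    menu.handle_key(key)
print(menu.handle_key("enter"))  # 5000
```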

The video/audio recording/reproducing apparatus 100 according to the embodiment described above can adopt, as the thumbnail image, an image a predetermined time after the entry position corresponding to a scene change. This avoids the problem that arises when the image immediately after the entry, where the scene change is likely still in progress, is used as the thumbnail, so a more appropriate image can be selected. Specifically, video with intense motion or video shown only momentarily can be avoided, and an image with little motion, close to a still image, can be used as the thumbnail. In addition, because the thumbnail image is generated from a period in which highly correlated images, such as still images, continue over time, the thumbnail matches the content of the scene, and the user can find the selected thumbnail image within the video of the scene, so scene playback by thumbnail selection can be used with confidence.

Moreover, because the only condition for selecting the thumbnail is the passage of time, the load on the system for evaluating the thumbnail generation condition is small. In this embodiment, the AV stream to be played back is analyzed directly to extract entry information and select thumbnail images, so the time information held by the entry information and thumbnail images is associated with the AV stream. Playback can therefore be performed with the elapsed playback time of the AV stream matched to the times indicated by the entry information and the thumbnail images.

In this embodiment, entries (scene boundaries) and thumbnails are generated from the AV stream recorded in an AV stream recording unit such as an HDD, but the input is not limited to an encoded stream; for example, image data before encoding may be used. A hardware decoder may also be used as the analysis decoder 21, and a dedicated hardware encoder may be used for thumbnail generation. Furthermore, since the apparatus can be configured without modifying the recording-related tuners 3 and 4, the AV encoder 7, the input stream control 8, or the AV stream recording unit 17, it can easily be incorporated into a conventional video/audio recording/reproducing apparatus.

In the thumbnail generation 25, if the image about to be used for a thumbnail is unsuitable as a thumbnail image, it may be rejected and another thumbnail image generated instead. Examples of unsuitable images include single-color images such as all black or all white, images with no content, images blown out to white by a camera flash or the like, images moving so fast that the subject cannot be made out, and low-contrast images. This not only ensures that an appropriate image is selected as the thumbnail, but also prevents problems such as the system hanging because thumbnail generation was performed on inappropriate data.
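A simple suitability check could look at luminance statistics of the candidate picture. The sketch below is an assumption, not the apparatus's criterion: it treats a very small spread of 8-bit luma values as single-color, empty, or low-contrast content, and a large share of near-white samples as a flash blow-out. The function name and all thresholds are illustrative, and detecting motion blur would need more than one frame, so it is not covered.

```python
from statistics import pstdev
from typing import Sequence


def is_unsuitable_thumbnail(
    luma: Sequence[int],
    min_spread: float = 10.0,
    clip_level: int = 235,
    max_clipped_ratio: float = 0.6,
) -> bool:
    """Heuristic check on 8-bit luma samples; all thresholds are illustrative."""
    if not luma:
        return True                               # no image content at all
    if pstdev(luma) < min_spread:
        return True                               # single colour or very low contrast
    clipped = sum(1 for v in luma if v >= clip_level) / len(luma)
    return clipped > max_clipped_ratio            # mostly blown out, e.g. by a flash


print(is_unsuitable_thumbnail([16] * 100))        # True (all black)
print(is_unsuitable_thumbnail(list(range(256))))  # False (full tonal range)
```

When the check fails, the caller would move on to a later candidate picture, as the paragraph above describes.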

In this embodiment, thumbnail images are selected and generated after scene boundaries are extracted, but the order may be reversed: for example, an image suitable as a thumbnail may be detected first, and then the scene boundary preceding it in time may be extracted. In this case, scene boundary detection (for example, scene change detection) need not run continuously, so the processing load can be expected to decrease. Furthermore, since scenes are delimited only when an image suitable as a thumbnail has been detected, an unsuitable image is never selected as a thumbnail.

When scene boundaries are set at fixed values, that is, when the video is divided into scenes at fixed intervals such as every 5 minutes, the load of scene boundary detection can be reduced. Furthermore, when thumbnails are generated in parallel with recording, the scene length is known in advance, so selecting the best thumbnail image within a scene is easy to implement, and a more appropriate thumbnail image can be selected as a result.

Although the video/audio recording/reproducing apparatus 100 according to this embodiment has a playback function, it may instead be a recording apparatus with only a recording function. Such a recording apparatus can be realized by providing an interface function for outputting the recorded AV stream, playback control information, and metadata to the outside. A playback apparatus having the playback function of this embodiment may also be built using the data output from such an interface, or a medium on which that data is recorded.

Embodiment 2.
FIG. 11 is a block diagram showing the configuration of a video/audio recording/reproducing apparatus 101 according to Embodiment 2 of the present invention. The video/audio recording/reproducing apparatus 101 is characterized in that it includes an audio boundary extraction 26 in place of the scene boundary extraction 23 of the video/audio recording/reproducing apparatus 100 shown in FIG. 1.

In this embodiment, the analysis decoder 21 also decodes the encoded audio data in the AV stream. The audio boundary extraction 26 determines, for example, that a section of the audio data input via the frame buffer 22 in which silence occurs is a scene boundary. Based on this scene boundary determination, the entry generation 24 creates entry information as in Embodiment 1 and records it in the metadata recording unit 19.
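Silence detection of this kind is commonly done by thresholding the short-term level of the decoded audio. The sketch below assumes mono samples scaled to [-1, 1] and uses frame RMS in dBFS; the function names, the -50 dBFS threshold, and the 0.5 s minimum run are illustrative assumptions, not values from the text. Taking the start of each returned section as the scene boundary is one possible convention.

```python
import math
from typing import List, Sequence, Tuple


def frame_level_db(frame: Sequence[float]) -> float:
    """RMS level of one frame in dB relative to full scale (samples in [-1, 1])."""
    rms = math.sqrt(sum(x * x for x in frame) / len(frame))
    return 20 * math.log10(rms) if rms > 0 else float("-inf")


def silent_sections(
    samples: Sequence[float],
    sample_rate: int,
    frame_len: int = 1024,
    silence_db: float = -50.0,
    min_silence_s: float = 0.5,
) -> List[Tuple[float, float]]:
    """(start, end) times in seconds of runs of frames quieter than silence_db."""
    sections: List[Tuple[float, float]] = []
    n_frames = len(samples) // frame_len     # a trailing partial frame is ignored
    run_start = None
    for k in range(n_frames + 1):            # k == n_frames acts as a sentinel to close a final run
        silent = k < n_frames and frame_level_db(
            samples[k * frame_len:(k + 1) * frame_len]) < silence_db
        if silent and run_start is None:
            run_start = k
        elif not silent and run_start is not None:
            start_s = run_start * frame_len / sample_rate
            end_s = k * frame_len / sample_rate
            if end_s - start_s >= min_silence_s:
                sections.append((start_s, end_s))
            run_start = None
    return sections
```

Only a sum of squares per frame is needed, which is why the text can describe silence detection as a very lightweight analysis.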

The audio analysis method is not limited to silence detection; the audio may also be frequency-analyzed and its characteristics classified from the frequency distribution. The audio boundary extraction 26 performs frequency analysis on the audio signal of a given section in units of audio frames to obtain the distribution over frequencies. It then determines which of a set of frequency distribution patterns, classified in advance as "conversation", "music", "sports", "animal voice", "noise", and so on, each distribution matches. Further, section windows consisting of multiple audio frames are defined, and the most frequent frequency distribution pattern within each window is taken as that window's pattern. The start of a section window in which the pattern changes is then taken as a scene boundary.

FIG. 12 is an explanatory diagram showing a method of detecting scene boundaries by audio frequency analysis. As shown in FIG. 12(a), the audio signal is divided into audio frames. Frequency analysis of the audio data is performed for each audio frame, and the frequency distribution pattern closest to each frame's distribution is determined. In the example of FIG. 12(a), section windows of three audio frames each are defined, and, as shown in FIG. 12(b), the most frequent frequency distribution pattern is determined for each section window. A point where the pattern associated with the section windows changes is judged to be a scene boundary, that is, a transition between scenes, and the scenes are formed as shown in FIG. 12(c).
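The classification and windowing steps can be sketched as follows, assuming the per-frame frequency analysis has already produced a coarse band-energy vector for each audio frame. The three-band reference patterns, their values, the L1 nearest-pattern rule and the function names are all illustrative assumptions; frames that do not fill a final window are dropped.

```python
from collections import Counter
from typing import Dict, List, Sequence, Tuple

# Illustrative reference patterns over three coarse bands (low, mid, high);
# real patterns would be derived from training material.
PATTERNS: Dict[str, List[float]] = {
    "conversation": [0.2, 0.7, 0.1],
    "music": [0.4, 0.3, 0.3],
    "sports": [0.1, 0.3, 0.6],
}


def nearest_pattern(distribution: Sequence[float], patterns: Dict[str, List[float]]) -> str:
    """Label whose reference distribution is closest in L1 distance."""
    total = sum(distribution) or 1.0
    norm = [x / total for x in distribution]
    return min(patterns, key=lambda label: sum(abs(a - b) for a, b in zip(norm, patterns[label])))


def window_labels(frame_labels: Sequence[str], window: int = 3) -> List[str]:
    """Most frequent label in each non-overlapping section window."""
    labels = []
    for start in range(0, len(frame_labels) - window + 1, window):
        labels.append(Counter(frame_labels[start:start + window]).most_common(1)[0][0])
    return labels


def audio_scene_boundaries(
    frame_distributions: Sequence[Sequence[float]],
    patterns: Dict[str, List[float]],
    window: int = 3,
) -> Tuple[List[str], List[int]]:
    """Window labels, plus the first audio frame of each window whose label changed."""
    frame_labels = [nearest_pattern(d, patterns) for d in frame_distributions]
    labels = window_labels(frame_labels, window)
    boundaries = [w * window for w in range(1, len(labels)) if labels[w] != labels[w - 1]]
    return labels, boundaries
```

The majority vote inside each window is what keeps a single misclassified frame (such as a short shout during conversation) from producing a spurious boundary.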

Next, a method of selecting thumbnail images based on audio information is described. Breaks in the video and audio do not always coincide, but video changes often occur at audio breaks, so the image at a scene boundary shown in FIG. 12(c) (a boundary between section windows where the shape pattern of the frequency distribution changes) should not be used as the thumbnail image. Instead, the thumbnail is taken from a part where the same frequency distribution pattern continues across section windows. For example, among section windows N+1 to N+3, where "sports" continues, the image at AFm+7, the center audio frame of the center section window N+2, is selected as the thumbnail image.
The subsequent thumbnail generation processing is the same as in Embodiment 1.
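Given the pattern label of each section window, the "center frame of the center window" rule can be sketched as below. The function name, the requirement of at least three identical windows, and the convention of indexing audio frames from the first frame of window 0 are assumptions. With that indexing, a "sports" run over windows 1 to 3 yields frame 7, which corresponds to AFm+7 in the example when window N starts at frame AFm. For a run of even length, the window just after the midpoint is used.

```python
from typing import List, Sequence


def thumbnail_frames(labels: Sequence[str], window: int = 3, min_run: int = 3) -> List[int]:
    """Audio frame index at the center of each run of identical window labels."""
    picks: List[int] = []
    start = 0
    for i in range(1, len(labels) + 1):
        if i == len(labels) or labels[i] != labels[start]:   # the run [start, i) ends here
            run_len = i - start
            if run_len >= min_run:
                center_window = start + run_len // 2
                picks.append(center_window * window + window // 2)
            start = i
    return picks


print(thumbnail_frames(["music", "sports", "sports", "sports", "talk"]))  # [7]
```

The returned audio frame index would then be mapped to the picture with the nearest presentation time before the thumbnail is generated.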

According to this embodiment, detecting scene boundaries and selecting thumbnail images based on audio information makes the analysis lighter and faster than performing these processes on video information as in Embodiment 1. In particular, detecting silent sections in the audio signal is very lightweight, so processing completes even sooner. In addition, because audio information is used, scene division becomes more accurate for programs with distinctive audio, such as music programs and sports programs.

Further, more appropriate processing is possible by combining video and audio to detect scene boundaries and select thumbnail images. The information used is also not limited to video and audio; various other data may be used, such as caption information, EPG data superimposed on the broadcast wave, and copyright management information. Using such data enables accurate scene division.

Embodiment 3.
FIG. 13 is a block diagram showing the configuration of a video/audio recording/reproducing apparatus 102 according to Embodiment 3 of the present invention. The video/audio recording/reproducing apparatus 102 is characterized in that it reads the AV stream recorded in the AV stream recording unit 17 and then detects scene boundaries and selects thumbnail images. In the video/audio recording/reproducing apparatus 102 shown in FIG. 13, the configuration other than the dubbing stream control 27 is the same as that of the video/audio recording/reproducing apparatus 100 shown in FIG. 1.

The dubbing stream control 27 reads the AV stream from the AV stream recording unit 17 and inputs the data of the AV stream to be analyzed to the analysis decoder 21. The dubbing stream control 27 is normally used as a buffer for dubbing, but in this embodiment its configuration is reused to feed the AV stream read from the AV stream recording unit 17 to the analysis decoder 21.

When recording ends, the dubbing stream control 27 sequentially reads the AV stream recorded in the AV stream recording unit 17 and inputs it to the analysis decoder 21. When a GOP start code is detected, the dubbing stream control 27 transfers the packet containing the first I picture of the GOP to the analysis decoder 21, which decodes that I picture. The subsequent operation is the same as in Embodiment 1.
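Locating the first I picture after each GOP start code can be sketched for the case of an MPEG-2 video elementary stream that has already been extracted from its transport packets; the text does not name the codec, so this is an assumption. In MPEG-2 video the group_of_pictures header begins with the start code 00 00 01 B8 and a picture header with 00 00 01 00, followed by a 10-bit temporal_reference and a 3-bit picture_coding_type (1 = I picture). The function name is hypothetical, and the sketch returns byte offsets rather than forwarding packets.

```python
from typing import List

GROUP_START_CODE = b"\x00\x00\x01\xb8"     # MPEG-2 video group_of_pictures header
PICTURE_START_CODE = b"\x00\x00\x01\x00"   # MPEG-2 video picture header
I_PICTURE = 1                               # picture_coding_type value for an I picture


def find_gop_i_pictures(es: bytes) -> List[int]:
    """Offsets of the first picture header after each GOP header, if it is an I picture."""
    offsets: List[int] = []
    pos = es.find(GROUP_START_CODE)
    while pos != -1:
        pic = es.find(PICTURE_START_CODE, pos + 4)
        if pic != -1 and pic + 5 < len(es):
            # Byte pic+5 holds the last 2 bits of temporal_reference,
            # then the 3-bit picture_coding_type.
            coding_type = (es[pic + 5] >> 3) & 0x07
            if coding_type == I_PICTURE:
                offsets.append(pic)
        pos = es.find(GROUP_START_CODE, pos + 4)
    return offsets


# Toy stream: sequence header, then two GOPs whose first pictures are I and P.
gop_payload = b"\x00\x08\x00\x40"
es = (b"\x00\x00\x01\xb3" + b"\x11" * 8
      + GROUP_START_CODE + gop_payload + PICTURE_START_CODE + bytes([0x00, 0x08]) + b"\xff" * 4
      + GROUP_START_CODE + gop_payload + PICTURE_START_CODE + bytes([0x00, 0x10]) + b"\xff" * 4)
print(find_gop_i_pictures(es))  # [20]
```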

By detecting scene boundaries and selecting thumbnail images after recording ends, the AV stream is analyzed while the processing load is low, rather than during recording when the load is high as in Embodiment 1, so stable processing is possible. Also, when the AV stream is analyzed during recording, scene boundary detection and thumbnail generation must run at least as fast as the recording itself, whereas after recording ends such fast analysis is not required. Because processing takes place after recording, only specific periods (for example, the periods excluding commercials) can be analyzed, which reduces the total analysis load and helps reduce power consumption. Moreover, since the processing can be applied to an already recorded title, it can be used when the content of a title changes through editing or similar operations.

100, 101, 102 video/audio recording/reproducing apparatus; 2 system control unit; 3 digital tuner; 4 analog tuner; 5 external input terminal; 6 remote control receiving unit; 7 AV encoder; 8 input stream control; 9 recording application; 10 recording control; 11 playback application; 12 metadata control unit; 13 thumbnail display; 14 playback control; 15 AV decoder; 16 output stream control; 17 AV stream recording unit; 18 AV management information recording unit; 19 metadata recording unit; 20 graphic superimposition; 21 analysis decoder; 22 frame buffer; 23 scene boundary extraction; 24 entry generation; 25 thumbnail generation; 26 audio boundary extraction; 27 dubbing stream control; 28 thumbnail selection; 30 network terminal; 31 monitor output terminal; 91 antenna; 92 network; 93 monitor; 94 remote controller; 95 screen

Claims

1. A video/audio reproduction device comprising:
scene boundary extraction means for extracting a scene boundary from video information;
entry generation means for generating entry information for accessing the scene boundary extracted by the scene boundary extraction means;
thumbnail generation means for selecting a picture representing a scene delimited by the scene boundary extraction means to generate a thumbnail image;
thumbnail display means for displaying the thumbnail images generated by the thumbnail generation means;
thumbnail selection means for selecting an arbitrary thumbnail from the displayed thumbnails; and
means for playing back the scene corresponding to the selected thumbnail,
wherein the thumbnail generation means selects, as the thumbnail image, a picture a predetermined period after the scene boundary.

2. The video/audio reproduction device according to claim 1, wherein the thumbnail generation means analyzes the video information and selects a picture with little change in the video as the thumbnail image.

3. The video/audio reproduction device according to claim 1, wherein the thumbnail generation means detects difference values between the pictures constituting a scene and selects a picture whose difference value is equal to or less than a predetermined threshold as the thumbnail image.

4. The video/audio reproduction device according to claim 1, wherein the thumbnail generation means generates, as thumbnail information, address information of the picture corresponding to the generated thumbnail image, and includes metadata storage means for storing the thumbnail information.

5. The video/audio reproduction apparatus according to claim 1, wherein the scene boundary extraction means detects the scene boundary based on a scene change detection result.
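Claim 5 does not say how scene changes are detected. One common technique, shown here as an assumption rather than the patent's method, compares luminance histograms of consecutive pictures and reports a boundary when their distance exceeds a threshold. Pictures are assumed to be flat lists of 8-bit values (0 to 255); the bin count and threshold are illustrative.

```python
def histogram(picture, bins=16, max_value=256):
    """Normalized luminance histogram of a flat grayscale picture (values 0..255)."""
    counts = [0] * bins
    for v in picture:
        counts[v * bins // max_value] += 1
    n = len(picture)
    return [c / n for c in counts]


def detect_scene_changes(pictures, threshold=0.5, bins=16):
    """Return indices i where picture i starts a new scene, i.e. where the
    histogram distance to picture i-1 exceeds `threshold`.
    Index 0 is always treated as a boundary."""
    if not pictures:
        return []
    boundaries = [0]
    prev = histogram(pictures[0], bins)
    for i in range(1, len(pictures)):
        cur = histogram(pictures[i], bins)
        # Half the L1 distance between normalized histograms lies in [0, 1].
        distance = 0.5 * sum(abs(p - c) for p, c in zip(prev, cur))
        if distance > threshold:
            boundaries.append(i)
        prev = cur
    return boundaries
```

Histograms ignore where pixels are, so camera motion within a shot changes them little, while a cut to different content usually shifts them sharply.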

6. The video/audio reproduction apparatus according to claim 1, wherein the scene boundary extraction means detects the scene boundary based on a change in the frequency distribution of audio data.
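One reading of "a change in the frequency distribution of audio data" is sketched below: the audio is split into fixed-size blocks, each block's spectral energy is summarized as a normalized distribution over a few frequency bands, and a boundary is reported where consecutive distributions differ by more than a threshold. The block size, band count, threshold, and the plain DFT (used only to keep the sketch dependency-free) are all assumptions.

```python
import cmath
import math


def band_distribution(block, bands=4):
    """Normalized energy per frequency band of one audio block.

    Uses a plain O(n^2) DFT; bins 1..n/2 (DC excluded) are grouped into
    `bands` equal-width bands.
    """
    n = len(block)
    half = n // 2
    energies = [0.0] * bands
    for k in range(1, half + 1):
        s = sum(block[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n))
        energies[(k - 1) * bands // half] += abs(s) ** 2
    total = sum(energies)
    return [e / total for e in energies] if total > 0 else [0.0] * bands


def detect_audio_boundaries(samples, block_size=64, bands=4, threshold=0.5):
    """Return sample offsets where a block's band-energy distribution differs from
    the previous block's by more than `threshold` (half the L1 distance, in [0, 1])."""
    boundaries = []
    prev = None
    for start in range(0, len(samples) - block_size + 1, block_size):
        dist = band_distribution(samples[start:start + block_size], bands)
        if prev is not None and 0.5 * sum(abs(a - b) for a, b in zip(prev, dist)) > threshold:
            boundaries.append(start)
        prev = dist
    return boundaries
```

A test signal with two cycles per block for the first 128 samples, followed by 28 cycles per block, moves all energy from the lowest band to the highest, so the only reported boundary is at sample 128.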

7. The video/audio reproduction apparatus according to claim 1, wherein the scene boundary extraction means detects the scene boundary based on data superimposed on the video information.

8. The video/audio reproduction apparatus according to claim 1, wherein the scene boundary extraction means detects the scene boundary based on address information of the picture corresponding to the generated thumbnail image.

9. A video/audio editing apparatus comprising:
scene boundary extraction means for extracting a scene boundary from video information;
entry generation means for generating entry information for accessing the scene boundary extracted by the scene boundary extraction means;
thumbnail generation means for selecting a picture representing a scene delimited by the scene boundaries extracted by the scene boundary extraction means and generating a thumbnail image from it; and
metadata recording means for recording the entry information and thumbnail information, the thumbnail information being address information of the picture corresponding to the generated thumbnail image,
wherein the thumbnail generation means selects, as the thumbnail image, a picture located a predetermined period after a scene boundary.
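Claim 9 records both the entry information and the thumbnail information as addresses rather than images. A minimal sketch of such a record, assuming addresses are byte offsets into the recorded AV stream; the class and method names are hypothetical. It also shows how the reproduction side could map a selected thumbnail back to the entry address from which its scene is played.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class SceneMetadata:
    """Hypothetical metadata record: one entry address per scene boundary and one
    thumbnail address per scene (assumed to be byte offsets into the AV stream)."""
    entry_addresses: List[int] = field(default_factory=list)
    thumbnail_addresses: List[int] = field(default_factory=list)

    def add_scene(self, entry_address: int, thumbnail_address: int) -> None:
        # The thumbnail picture lies a period after the boundary, so it cannot precede it.
        if thumbnail_address < entry_address:
            raise ValueError("thumbnail picture must not precede its scene boundary")
        self.entry_addresses.append(entry_address)
        self.thumbnail_addresses.append(thumbnail_address)

    def playback_start(self, selected_thumbnail: int) -> int:
        """Address from which to start playback when thumbnail N is selected."""
        return self.entry_addresses[selected_thumbnail]
```

Keeping only addresses means the thumbnail picture can be decoded from the stream when needed, so no separate image has to be stored per scene.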

10. The video/audio editing apparatus according to claim 9, wherein the thumbnail generation means detects difference values between the pictures constituting a scene and selects, as the thumbnail image, a picture whose difference value is equal to or less than a predetermined threshold.

11. A video/audio reproduction method comprising:
extracting a scene boundary from video information;
generating entry information for accessing the extracted scene boundary;
selecting a picture representing a scene delimited by the extracted scene boundaries to generate a thumbnail image;
displaying the generated thumbnail image; and
playing the scene corresponding to a thumbnail arbitrarily selected from the displayed thumbnail images,
wherein a picture located a predetermined period after a scene boundary is selected as the thumbnail image.

12. The video/audio reproduction method according to claim 11, wherein the video information is analyzed and a picture with little change in the video is selected as the thumbnail image.

13. The video/audio reproduction method according to claim 11, wherein difference values between the pictures constituting a scene are detected, and a picture whose difference value is equal to or less than a predetermined threshold is selected as the thumbnail image.

14. The video/audio reproduction method according to claim 11, wherein address information of the picture corresponding to the generated thumbnail image is stored as thumbnail information.

15. The video/audio reproduction method according to claim 11, wherein the scene boundary is detected based on a scene change detection result.

16. The video/audio reproduction method according to claim 11, wherein the scene boundary is detected based on a change in the frequency distribution of audio data.

17. The video/audio reproduction method according to claim 11, wherein the scene boundary is detected based on data superimposed on the video information.

18. A video/audio editing method comprising:
extracting a scene boundary from video information;
generating entry information for accessing the extracted scene boundary;
selecting a picture representing a scene delimited by the extracted scene boundaries to generate a thumbnail image; and
recording the entry information and thumbnail information, the thumbnail information being address information of the picture corresponding to the generated thumbnail image,
wherein a picture located a predetermined period after a scene boundary is selected as the thumbnail image.

19. The video/audio editing method according to claim 18, wherein difference values between the pictures constituting a scene are detected, and a picture whose difference value is equal to or less than a predetermined threshold is selected as the thumbnail image.

20. A video/audio recording and reproducing apparatus comprising:
recording means for recording video information;
scene boundary extraction means for extracting a scene boundary from the video information;
entry generation means for generating entry information for accessing the scene boundary extracted by the scene boundary extraction means;
thumbnail generation means for selecting a picture representing a scene delimited by the scene boundaries extracted by the scene boundary extraction means and generating a thumbnail image from it;
thumbnail display means for displaying the thumbnail images generated by the thumbnail generation means;
thumbnail selection means for selecting an arbitrary thumbnail from the displayed thumbnails; and
means for playing the scene corresponding to the selected thumbnail,
wherein the thumbnail generation means selects, as the thumbnail image, a picture located a predetermined period after a scene boundary.

21. The video/audio recording and reproducing apparatus according to claim 20, wherein the thumbnail generation means detects difference values between the pictures constituting a scene and selects, as the thumbnail image, a picture whose difference value is equal to or less than a predetermined threshold.

22. The video/audio recording and reproducing apparatus according to claim 20, wherein the thumbnail generation means includes metadata storage means that generates, as thumbnail information, address information of the picture corresponding to the generated thumbnail image and stores the thumbnail information.

23. The video/audio recording and reproducing apparatus according to claim 20, wherein the scene boundary extraction means detects the scene boundary based on a scene change detection result.

24. The video/audio recording and reproducing apparatus according to claim 20, wherein the scene boundary extraction means detects the scene boundary based on a change in the frequency distribution of audio data.

25. The video/audio recording and reproducing apparatus according to claim 20, wherein the scene boundary extraction means detects the scene boundary based on data superimposed on the video information.

26. The video/audio recording and reproducing apparatus according to claim 20, wherein, when an inappropriate thumbnail image is generated, the thumbnail generation means generates a different thumbnail image in its place.
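Claim 26 does not define what makes a thumbnail "inappropriate". The sketch below assumes, purely for illustration, that a nearly uniform picture (for example an all-black frame) is inappropriate, measured by pixel variance below a hypothetical `min_variance`, and falls back to the nearest acceptable picture in the same scene, searching forward first.

```python
def variance(picture):
    """Pixel variance of a flat grayscale picture."""
    mean = sum(picture) / len(picture)
    return sum((v - mean) ** 2 for v in picture) / len(picture)


def choose_thumbnail(pictures, candidate, min_variance=25.0):
    """Return `candidate` if its picture is acceptable; otherwise the nearest later
    acceptable picture in the scene, then the nearest earlier one.
    If no picture is acceptable, keep `candidate`."""
    def acceptable(i):
        # Hypothetical criterion: near-uniform pictures (low variance) are rejected.
        return variance(pictures[i]) >= min_variance

    if acceptable(candidate):
        return candidate
    for i in range(candidate + 1, len(pictures)):
        if acceptable(i):
            return i
    for i in range(candidate - 1, -1, -1):
        if acceptable(i):
            return i
    return candidate
```

Searching forward first keeps the replacement at or beyond the predetermined period after the scene boundary whenever such a picture exists.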

27. A video/audio recording and reproducing method comprising:
recording video information;
extracting a scene boundary from the video information;
generating entry information for accessing the extracted scene boundary;
selecting a picture representing a scene delimited by the extracted scene boundaries to generate a thumbnail image;
displaying the generated thumbnail image; and
playing the scene corresponding to a thumbnail arbitrarily selected from the displayed thumbnail images,
wherein a picture located a predetermined period after a scene boundary is selected as the thumbnail image.

28. The video/audio recording and reproducing method according to claim 27, wherein difference values between the pictures constituting a scene are detected, and a picture whose difference value is equal to or less than a predetermined threshold is selected as the thumbnail image.

29. The video/audio recording and reproducing method according to claim 27, wherein address information of the picture corresponding to the generated thumbnail image is stored as thumbnail information.

30. The video/audio recording and reproducing method according to claim 27, wherein the scene boundary is detected based on a scene change detection result.

31. The video/audio recording and reproducing method according to claim 27, wherein the scene boundary is detected based on a change in the frequency distribution of audio data.

32. The video/audio recording and reproducing method according to claim 27, wherein the scene boundary is detected based on data superimposed on the video information.

33. The video/audio recording and reproducing method according to claim 27, wherein, when an inappropriate thumbnail image is generated, a different thumbnail image is generated in its place.